Wednesday, September 2, 2015

Hive CLI fails with error "Metastore contains multiple versions"

Env:

Hive 0.13

Symptom:

Hive CLI fails with error "Metastore contains multiple versions".

Below is a sample stacktrace:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient 
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346) 
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) 
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient 
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412) 
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62) 
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72) 
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2511) 
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2523) 
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340) 
... 7 more 
Caused by: java.lang.reflect.InvocationTargetException 
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) 
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410) 
... 12 more 
Caused by: MetaException(message:Metastore contains multiple versions) 
at org.apache.hadoop.hive.metastore.ObjectStore.getMSchemaVersion(ObjectStore.java:6373) 
at org.apache.hadoop.hive.metastore.ObjectStore.getMetaStoreSchemaVersion(ObjectStore.java:6335) 
at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6294) 
at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6282) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108) 
at com.sun.proxy.$Proxy9.verifySchema(Unknown Source) 
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:476) 
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523) 
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397) 
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356) 
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54) 
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59) 
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944) 
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171) 

Root Cause:

There are 2 scenarios where Hive metastore will modify "VERSION" table.

1. hive.metastore.schema.verification=false AND get "NULL" when fetching the schema version from backend database due to some reason.

Hive will insert a new row in VERSION table in this case:
INSERT INTO `VERSION` (`VER_ID`,`VERSION_COMMENT`,`SCHEMA_VERSION`) VALUES (16,'Set by MetaStore','0.13.0')

2. hive.metastore.schema.verification=true AND metastore schema version is the same as Hive distribution.

Hive will update the existing row in VERSION table in this case:
UPDATE `VERSION` SET `SCHEMA_VERSION`='0.13.0',`VERSION_COMMENT`='Set by MetaStore' WHERE `VER_ID`=16

Actually only scenario 1 will add more rows into VERSION table.
Setting hive.metastore.schema.verification=true can avoid it;

Scenario 2 is actually a different bug HIVE-9749, the "if-else" logic is wrong in Hive 0.13 code and it is fixed in Hive 1.2.
The correct behavior of scenario 2 should be:  Hive should update VERSION table only when hive.metastore.schema.verification=false AND metastore schema version is different than Hive distribution.
But anyway, scenario 2 will never insert duplicate rows into VERSION table.

See the Hive 0.13 code of metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  private synchronized void checkSchema() throws MetaException {
    // recheck if it got verified by another thread while we were waiting
    if (isSchemaVerified.get()) {
      return;
    }

    boolean strictValidation =
      HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION);
    // read the schema version stored in metastore db
    String schemaVer = getMetaStoreSchemaVersion();
    if (schemaVer == null) {
      // metastore has no schema version information
      if (strictValidation) {
            throw new MetaException("Version information not found in metastore. ");
          } else {
            LOG.warn("Version information not found in metastore. "
                + HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
                " is not enabled so recording the schema version " +
                MetaStoreSchemaInfo.getHiveSchemaVersion());
            setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
                "Set by MetaStore");
        }
    } else {
      // metastore schema version is different than Hive distribution needs
      if (strictValidation) {
        if (!schemaVer.equalsIgnoreCase(MetaStoreSchemaInfo.getHiveSchemaVersion())) {
          throw new MetaException("Hive Schema version "
              + MetaStoreSchemaInfo.getHiveSchemaVersion() +
              " does not match metastore's schema version " + schemaVer +
              " Metastore is not upgraded or corrupt");
        } else {
          LOG.warn("Metastore version was " + schemaVer + " " +
              HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
              " is not enabled so recording the new schema version " +
              MetaStoreSchemaInfo.getHiveSchemaVersion());
          setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
              "Set by MetaStore");
        }
      }
    }
    isSchemaVerified.set(true);
    return;
  }
See the Hive 1.2 code of metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
  private synchronized void checkSchema() throws MetaException {
    // recheck if it got verified by another thread while we were waiting
    if (isSchemaVerified.get()) {
      return;
    }

    boolean strictValidation =
      HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION);
    // read the schema version stored in metastore db
    String schemaVer = getMetaStoreSchemaVersion();
    if (schemaVer == null) {
      if (strictValidation) {
        throw new MetaException("Version information not found in metastore. ");
      } else {
        LOG.warn("Version information not found in metastore. "
            + HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
            " is not enabled so recording the schema version " +
            MetaStoreSchemaInfo.getHiveSchemaVersion());
        setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
          "Set by MetaStore " + USER + "@" + HOSTNAME);
      }
    } else {
      // metastore schema version is different than Hive distribution needs
      if (schemaVer.equalsIgnoreCase(MetaStoreSchemaInfo.getHiveSchemaVersion())) {
        LOG.debug("Found expected HMS version of " + schemaVer);
      } else {
        if (strictValidation) {
          throw new MetaException("Hive Schema version "
              + MetaStoreSchemaInfo.getHiveSchemaVersion() +
              " does not match metastore's schema version " + schemaVer +
              " Metastore is not upgraded or corrupt");
        } else {
          LOG.error("Version information found in metastore differs " + schemaVer +
              " from expected schema version " + MetaStoreSchemaInfo.getHiveSchemaVersion() +
              ". Schema verififcation is disabled " +
              HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION + " so setting version.");
          setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
            "Set by MetaStore " + USER + "@" + HOSTNAME);
        }
      }
    }
    isSchemaVerified.set(true);
    return;
  }
So from above codes, Hive 1.2 just switched the two if-clauses for scenario 2.
Although scenario 2 is also buggy, but it is not impacting this multiple versions issue.

Solution:

1. To fix this issue and bring Hive back online, we need to delete the bad rows from "VERSION" table in backend database.
For example, if current "VERSION" looks as below:
mysql> select * from version; 
+--------+----------------+------------------+ 
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT | 
+--------+----------------+------------------+ 
| 1 | 0.13.0 | Set by MetaStore | 
| 6 | 0.13.0 | Set by MetaStore | 
| 11 | 0.13.0 | Set by MetaStore | 
| 16 | 0.13.0 | Set by MetaStore | 
| 21 | 0.13.0 | Set by MetaStore | 
| 26 | 0.13.0 | Set by MetaStore | 
| 31 | 0.13.0 | Set by MetaStore | 
+--------+----------------+------------------+ 
7 rows in set (0.01 sec) 
The fix is :
delete from version where ver_id<>1; 
2. To avoid Hive metastore implicitly inserting into VERSION table, please set hive.metastore.schema.verification to true in hive-site.xml on all the nodes which will connect to this metastore database.


6 comments:

  1. Replies
    1. Good to know that this helps. Feel free to share the issue and resolution under this topic which may help others:)

      Delete
  2. Hi

    Can you throw some light on why insert overwrite runs slow, when I am loading data into a partitioned orc table from text table in hive using dynamic partitioning.

    text table: table_text
    orc table with partitions: table_orc

    Command: INSERT OVERWRITE table table_orc partitioned by (country)
    SELECT * FROM table_text;

    Number of partitions are 1000.

    The mapreduce job ran fine, but process is stucked at "Loading data to table default.table_orc (country=null)"

    I read in one of the links, that partition needs to be added to the metastore and this is an expensive operation to perform. Is it right understanding.

    Link: http://grokbase.com/t/hive/user/1549j1x41h/hive-slow-loading-data-process-with-parquet-over-30k-partitions

    ReplyDelete
  3. Thank you so much, it really worked:-)

    ReplyDelete

  4. Looking to add some magic to your apparel? disney dtf transfers are a fantastic way to achieve vibrant and long-lasting designs on various fabrics. They are easy to apply and perfect for both small projects and larger orders. Whether you're customizing clothing for kids or creating unique gifts, these transfers offer excellent quality and detail. Explore the options available and make your designs stand out with ease.

    ReplyDelete

Popular Posts