Env:
Hive 0.13Symptom:
Hive CLI fails with error "Metastore contains multiple versions".Below is a sample stacktrace:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2511) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2523) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340) ... 7 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410) ... 12 more Caused by: MetaException(message:Metastore contains multiple versions) at org.apache.hadoop.hive.metastore.ObjectStore.getMSchemaVersion(ObjectStore.java:6373) at org.apache.hadoop.hive.metastore.ObjectStore.getMetaStoreSchemaVersion(ObjectStore.java:6335) at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6294) at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6282) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108) at com.sun.proxy.$Proxy9.verifySchema(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:476) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59) at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)
Root Cause:
There are 2 scenarios where Hive metastore will modify "VERSION" table.1. hive.metastore.schema.verification=false AND get "NULL" when fetching the schema version from backend database due to some reason.
Hive will insert a new row in VERSION table in this case:INSERT INTO `VERSION` (`VER_ID`,`VERSION_COMMENT`,`SCHEMA_VERSION`) VALUES (16,'Set by MetaStore','0.13.0')
2. hive.metastore.schema.verification=true AND metastore schema version is the same as Hive distribution.
Hive will update the existing row in VERSION table in this case:UPDATE `VERSION` SET `SCHEMA_VERSION`='0.13.0',`VERSION_COMMENT`='Set by MetaStore' WHERE `VER_ID`=16
Actually only scenario 1 will add more rows into VERSION table.
Setting hive.metastore.schema.verification=true can avoid it;
Scenario 2 is actually a different bug HIVE-9749, the "if-else" logic is wrong in Hive 0.13 code and it is fixed in Hive 1.2.
The correct behavior of scenario 2 should be: Hive should update VERSION table only when hive.metastore.schema.verification=false AND metastore schema version is different than Hive distribution.
But anyway, scenario 2 will never insert duplicate rows into VERSION table.
See the Hive 0.13 code of metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  private synchronized void checkSchema() throws MetaException {
    // recheck if it got verified by another thread while we were waiting
    if (isSchemaVerified.get()) {
      return;
    }
    boolean strictValidation =
      HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION);
    // read the schema version stored in metastore db
    String schemaVer = getMetaStoreSchemaVersion();
    if (schemaVer == null) {
      // metastore has no schema version information
      if (strictValidation) {
            throw new MetaException("Version information not found in metastore. ");
          } else {
            LOG.warn("Version information not found in metastore. "
                + HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
                " is not enabled so recording the schema version " +
                MetaStoreSchemaInfo.getHiveSchemaVersion());
            setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
                "Set by MetaStore");
        }
    } else {
      // metastore schema version is different than Hive distribution needs
      if (strictValidation) {
        if (!schemaVer.equalsIgnoreCase(MetaStoreSchemaInfo.getHiveSchemaVersion())) {
          throw new MetaException("Hive Schema version "
              + MetaStoreSchemaInfo.getHiveSchemaVersion() +
              " does not match metastore's schema version " + schemaVer +
              " Metastore is not upgraded or corrupt");
        } else {
          LOG.warn("Metastore version was " + schemaVer + " " +
              HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
              " is not enabled so recording the new schema version " +
              MetaStoreSchemaInfo.getHiveSchemaVersion());
          setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
              "Set by MetaStore");
        }
      }
    }
    isSchemaVerified.set(true);
    return;
  }
See the Hive 1.2 code of metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:  private synchronized void checkSchema() throws MetaException {
    // recheck if it got verified by another thread while we were waiting
    if (isSchemaVerified.get()) {
      return;
    }
    boolean strictValidation =
      HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION);
    // read the schema version stored in metastore db
    String schemaVer = getMetaStoreSchemaVersion();
    if (schemaVer == null) {
      if (strictValidation) {
        throw new MetaException("Version information not found in metastore. ");
      } else {
        LOG.warn("Version information not found in metastore. "
            + HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
            " is not enabled so recording the schema version " +
            MetaStoreSchemaInfo.getHiveSchemaVersion());
        setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
          "Set by MetaStore " + USER + "@" + HOSTNAME);
      }
    } else {
      // metastore schema version is different than Hive distribution needs
      if (schemaVer.equalsIgnoreCase(MetaStoreSchemaInfo.getHiveSchemaVersion())) {
        LOG.debug("Found expected HMS version of " + schemaVer);
      } else {
        if (strictValidation) {
          throw new MetaException("Hive Schema version "
              + MetaStoreSchemaInfo.getHiveSchemaVersion() +
              " does not match metastore's schema version " + schemaVer +
              " Metastore is not upgraded or corrupt");
        } else {
          LOG.error("Version information found in metastore differs " + schemaVer +
              " from expected schema version " + MetaStoreSchemaInfo.getHiveSchemaVersion() +
              ". Schema verififcation is disabled " +
              HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION + " so setting version.");
          setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
            "Set by MetaStore " + USER + "@" + HOSTNAME);
        }
      }
    }
    isSchemaVerified.set(true);
    return;
  }
So from above codes, Hive 1.2 just switched the two if-clauses for scenario 2.Although scenario 2 is also buggy, but it is not impacting this multiple versions issue.
Solution:
1. To fix this issue and bring Hive back online, we need to delete the bad rows from "VERSION" table in backend database.For example, if current "VERSION" looks as below:
mysql> select * from version; +--------+----------------+------------------+ | VER_ID | SCHEMA_VERSION | VERSION_COMMENT | +--------+----------------+------------------+ | 1 | 0.13.0 | Set by MetaStore | | 6 | 0.13.0 | Set by MetaStore | | 11 | 0.13.0 | Set by MetaStore | | 16 | 0.13.0 | Set by MetaStore | | 21 | 0.13.0 | Set by MetaStore | | 26 | 0.13.0 | Set by MetaStore | | 31 | 0.13.0 | Set by MetaStore | +--------+----------------+------------------+ 7 rows in set (0.01 sec)The fix is :
delete from version where ver_id<>1;2. To avoid Hive metastore implicitly inserting into VERSION table, please set hive.metastore.schema.verification to true in hive-site.xml on all the nodes which will connect to this metastore database.
 
 
Thanks, this helped me a lot.
ReplyDeleteGood to know that this helps. Feel free to share the issue and resolution under this topic which may help others:)
DeleteThis helped me alot.
ReplyDeleteHi
ReplyDeleteCan you throw some light on why insert overwrite runs slow, when I am loading data into a partitioned orc table from text table in hive using dynamic partitioning.
text table: table_text
orc table with partitions: table_orc
Command: INSERT OVERWRITE table table_orc partitioned by (country)
SELECT * FROM table_text;
Number of partitions are 1000.
The mapreduce job ran fine, but process is stucked at "Loading data to table default.table_orc (country=null)"
I read in one of the links, that partition needs to be added to the metastore and this is an expensive operation to perform. Is it right understanding.
Link: http://grokbase.com/t/hive/user/1549j1x41h/hive-slow-loading-data-process-with-parquet-over-30k-partitions
Thank you so much, it really worked:-)
ReplyDelete