Thursday, November 27, 2014

How to upgrade Hive metastore using Hive schema tool

Starting from Hive 0.12, Hive schema tool is introduced for Hive metastore schema manipulation. It can have below 3 usage:

1. Initialize the metastore schema for the current Hive version.
2. Verify if the metastore schema version is compatible with Hive binaries.
3. Upgrade the schema from an older version to current.


Before Hive 0.12, we have to manually run the upgrade DDL SQL files in the metastore database.
This article will go through the steps to upgrade metastore schema using Hive schema tool, taking PostgreSQL for example.

Hive 0.12 introduced a new table "VERSION" in Hive metastore, that is why Hive can recognize the metastore version.
So upgrading from 0.12 to newer version, Hive schema tool knows which upgrade SQL files to execute.
If we plan to upgrade from an older version, we must know what is current version of metastore because Hive schema tool can never know by itself.
Here are 2 examples, one is upgrading metastore from 0.12 from current 0.13, and the other one is upgrading from 0.8 to 0.13.

Upgrading metastore schema from 0.12 to 0.13.

1. Verify current versions of Hive binary and Hive metastore.

[root@n1a conf]#   schematool -dbType postgres -info   
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore2
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Hive distribution version:  0.13.0
Metastore schema version:  0.12.0
schemaTool completeted

2. Dry run can let us know what SQL file to execute in advance.

[root@n1a conf]# schematool -dbType postgres -upgradeSchema -dryRun
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore2
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Starting upgrade metastore schema from version 0.12.0 to 0.13.0
Upgrade script upgrade-0.12.0-to-0.13.0.postgres.sql
schemaTool completeted 

3. Execute the upgrade.

[root@n1a conf]# schematool -dbType postgres -upgradeSchema   
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore2
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Starting upgrade metastore schema from version 0.12.0 to 0.13.0
Upgrade script upgrade-0.12.0-to-0.13.0.postgres.sql
Completed upgrade-0.12.0-to-0.13.0.postgres.sql
schemaTool completeted

4. Verify.

[root@n1a conf]# schematool -dbType postgres -info  
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore2
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Hive distribution version:  0.13.0
Metastore schema version:  0.13.0
schemaTool completeted

Upgrading metastore schema from 0.8 to 0.13.

1. Schema tool can not get the current metastore version.

[root@n1a conf]#  schematool -dbType postgres -info   
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Hive distribution version:  0.13.0
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***

2. Dry run can let us know what SQL file to execute in advance.

Here we must know what is current version of metastore, for example, 0.8 in this case.
[root@n1a conf]# schematool -dbType postgres -upgradeSchemaFrom 0.8.0 -dryRun
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Starting upgrade metastore schema from version 0.8.0 to 0.13.0
Upgrade script upgrade-0.8.0-to-0.9.0.postgres.sql
Upgrade script upgrade-0.9.0-to-0.10.0.postgres.sql
Upgrade script upgrade-0.10.0-to-0.11.0.postgres.sql
Upgrade script upgrade-0.11.0-to-0.12.0.postgres.sql
Upgrade script upgrade-0.12.0-to-0.13.0.postgres.sql
schemaTool completeted

3. Execute the upgrade.

[root@n1a conf]# schematool -dbType postgres -upgradeSchemaFrom 0.8.0  
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Starting upgrade metastore schema from version 0.8.0 to 0.13.0
Upgrade script upgrade-0.8.0-to-0.9.0.postgres.sql
Completed upgrade-0.8.0-to-0.9.0.postgres.sql
Upgrade script upgrade-0.9.0-to-0.10.0.postgres.sql
Completed upgrade-0.9.0-to-0.10.0.postgres.sql
Upgrade script upgrade-0.10.0-to-0.11.0.postgres.sql
Completed upgrade-0.10.0-to-0.11.0.postgres.sql
Upgrade script upgrade-0.11.0-to-0.12.0.postgres.sql
Completed upgrade-0.11.0-to-0.12.0.postgres.sql
Upgrade script upgrade-0.12.0-to-0.13.0.postgres.sql
Completed upgrade-0.12.0-to-0.13.0.postgres.sql
schemaTool completeted

4. Verify.

[root@n1a conf]# schematool -dbType postgres -info   
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Hive distribution version:  0.13.0
Metastore schema version:  0.13.0
schemaTool completeted

Possible issues and solutions.

1. If you are using PostgreSQL as the metastore, the language "plpgsql" may not be created in the database before.
Then upgrade will fail as below.
[root@n1a conf]# schematool -dbType postgres -upgradeSchema 
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore2
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Starting upgrade metastore schema from version 0.12.0 to 0.13.0
Upgrade script upgrade-0.12.0-to-0.13.0.postgres.sql
Error: ERROR: language "plpgsql" does not exist
  Hint: Use CREATE LANGUAGE to load the language into the database. (state=42704,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***
The solution is to logon to metastore database using superuser and create language "plpgsql" manually.
oldmetastore2=# create language "plpgsql";
CREATE LANGUAGE
After that, re-run the upgrade step.
2. The Hive user to connect metastore is not the owner of the tables, or it is not superuser.
This may happen if you only granted the permissions on tables to Hive user, but not set Hive user as the owner of tables, when you initialized the metastore.
[root@n1a conf]# schematool -dbType postgres -upgradeSchema  
Metastore connection URL:  jdbc:postgresql://n1a/oldmetastore2
Metastore Connection Driver :  org.postgresql.Driver
Metastore connection User:  hiveuser
Starting upgrade metastore schema from version 0.12.0 to 0.13.0
Upgrade script upgrade-0.12.0-to-0.13.0.postgres.sql
Error: ERROR: must be owner of relation DBS (state=42501,code=0)
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***
The solution is to change the owner of the tables to Hive user. In this case, "hiveuser" is the one.
Take PostgreSQL for example.
Firstly generate a SQL file "/tmp/changeowner.sql" using below SQL.
oldmetastore2=#  \o /tmp/changeowner.sql
oldmetastore2=#  SELECT 'alter table "'  || schemaname || '"."' || tablename || '" OWNER TO hiveuser ;'
oldmetastore2-#  FROM pg_tables
oldmetastore2-#  WHERE tableowner = CURRENT_USER and schemaname = 'public';
Then edit "/tmp/changeowner.sql" to remove some junk lines in the beginning or end of the file.
Execute "/tmp/changeowner.sql" in the metastore database to change the owner of tables to Hive user.
oldmetastore2=# \i /tmp/changeowner.sql
After that, re-run the upgrade script.

No comments:

Post a Comment

Popular Posts