Wednesday, February 6, 2019

"msck repair" of a partition table fails with error "Expecting only one partition but more than one partitions are found.".

Symptom:

In spark-shell, running "MSCK REPAIR TABLE database_name.table_name" on a partitioned table fails with the error "Expecting only one partition but more than one partitions are found.".

Env:

Hive 2.1 (with MySQL as the backend database for the Hive Metastore)
Spark 2.2.1

Troubleshooting:

This error message is thrown from metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
      query =
          pm.newQuery(MPartition.class,
              "table.tableName == t1 && table.database.name == t2 && partitionName == t3");
      query.declareParameters("java.lang.String t1, java.lang.String t2, java.lang.String t3");
      mparts = (List<MPartition>) query.execute(tableName, dbName, name);
      pm.retrieveAll(mparts);
      commited = commitTransaction();
      // We need to compare partition name with requested name since some DBs
      // (like MySQL, Derby) considers 'a' = 'a ' whereas others like (Postgres,
      // Oracle) doesn't exhibit this problem.
      if (mparts != null && mparts.size() > 0) {
        if (mparts.size() > 1) {
          throw new MetaException(
              "Expecting only one partition but more than one partitions are found.");
        }

Basically, Hive generates a query against MySQL (the Hive Metastore backend database) to check for duplicate entries by table name, database name, and partition name.
We can then run the query below in MySQL to find the duplicate entries in the PARTITIONS table for that specific partitioned Hive table, database_name.table_name:
select p.PART_NAME,count(*)
from TBLS t, PARTITIONS p, DBS d
where d.DB_ID=t.DB_ID
and p.TBL_ID=t.TBL_ID
and t.TBL_NAME='table_name' and d.NAME='database_name'
group by p.PART_NAME
having count(*)>1;
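To make the duplicate check concrete, here is a small self-contained sketch in Python that rebuilds heavily simplified stand-ins for the DBS, TBLS, and PARTITIONS tables in an in-memory SQLite database (the real MySQL metastore schema has many more columns and constraints) and runs the same GROUP BY / HAVING query:

```python
import sqlite3

# Simplified, hypothetical stand-ins for the Hive Metastore tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE DBS  (DB_ID INTEGER, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE PARTITIONS (PART_ID INTEGER, PART_NAME TEXT, TBL_ID INTEGER);

INSERT INTO DBS  VALUES (1, 'database_name');
INSERT INTO TBLS VALUES (7, 1, 'table_name');
-- duplicate rows for partition 'key=0', as seen in the incident above
INSERT INTO PARTITIONS VALUES (255, 'key=0', 7);
INSERT INTO PARTITIONS VALUES (256, 'key=0', 7);
INSERT INTO PARTITIONS VALUES (257, 'key=1', 7);
""")

# Same duplicate-detection query as in the article.
dups = cur.execute("""
    SELECT p.PART_NAME, COUNT(*)
    FROM TBLS t, PARTITIONS p, DBS d
    WHERE d.DB_ID = t.DB_ID
      AND p.TBL_ID = t.TBL_ID
      AND t.TBL_NAME = 'table_name' AND d.NAME = 'database_name'
    GROUP BY p.PART_NAME
    HAVING COUNT(*) > 1
""").fetchall()
print(dups)  # [('key=0', 2)]
```

Only 'key=0' is reported, because it is the only partition name with more than one row.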

If the above query returns a partition name such as "key=0", the PARTITIONS table contains two or more entries for that specific partition "key=0".
Next, list all the rows for that partition:
select p.*
from TBLS t, PARTITIONS p, DBS d
where d.DB_ID=t.DB_ID
and p.TBL_ID=t.TBL_ID
and t.TBL_NAME='table_name' and d.NAME='database_name'
and p.PART_NAME='key=0';
For example, the above query may return two entries like these:
+---------+-------------+------------------+-----------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------+-------+--------+
|     255 |  1536697318 |                0 | key=0     |   262 |      7 |
|     256 |  1536697319 |                0 | key=0     |   263 |      7 |
+---------+-------------+------------------+-----------+-------+--------+
Altogether, the results above show that the Hive Metastore backend database is already inconsistent: it holds duplicate entries for the same partition.

Root Cause:

For this specific case, we found that the Hive Metastore backend MySQL tables had been created without their constraints.
Taking the PARTITIONS table as an example, we can run "show create table PARTITIONS;" in MySQL and check whether the two FOREIGN KEY constraints below exist:
  CONSTRAINT `PARTITIONS_FK1` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`),
  CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
The standard Hive MySQL schema also defines a unique key, UNIQUEPARTITION (`PART_NAME`,`TBL_ID`), on PARTITIONS; without it, nothing in the database prevents the duplicate partition entries seen above.
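The foreign keys enforce referential integrity, but the constraint that directly blocks duplicate partition names is the unique key the standard Hive MySQL schema places on PARTITIONS (UNIQUEPARTITION on PART_NAME, TBL_ID). Here is a minimal sketch, using SQLite and a simplified table definition, of how that constraint rejects a second 'key=0' row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified PARTITIONS table; the real schema has more columns plus
# foreign keys to TBLS and SDS. The UNIQUE clause mirrors the
# UNIQUEPARTITION key from the standard Hive MySQL schema.
cur.execute("""
    CREATE TABLE PARTITIONS (
        PART_ID   INTEGER PRIMARY KEY,
        PART_NAME TEXT,
        TBL_ID    INTEGER,
        UNIQUE (PART_NAME, TBL_ID)
    )
""")
cur.execute("INSERT INTO PARTITIONS VALUES (255, 'key=0', 7)")
try:
    cur.execute("INSERT INTO PARTITIONS VALUES (256, 'key=0', 7)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True   # duplicate (PART_NAME, TBL_ID) is refused
print(rejected)  # True
```

With the constraint missing, the second INSERT would succeed silently, which is exactly how the duplicate entries accumulated.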

Solution:

1. Short term fix

The safest way to clean up the duplicate entries is to use a Hive command rather than deleting rows from MySQL manually.
We just need to drop and re-create the problematic Hive partition; dropping it removes all of the duplicate metastore entries.
For example:
ALTER TABLE database_name.table_name DROP PARTITION (key=0);
After that, confirm in MySQL that all the duplicate entries are gone (the query below should return no rows):
select p.PART_NAME,count(*)
from TBLS t, PARTITIONS p, DBS d
where d.DB_ID=t.DB_ID
and p.TBL_ID=t.TBL_ID
and t.TBL_NAME='table_name' and d.NAME='database_name'
group by p.PART_NAME
having count(*)>1;
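As a quick illustration of why dropping the partition clears the check, the sketch below (SQLite, with a simplified hypothetical PARTITIONS table) models DROP PARTITION as a deletion of every metastore row for 'key=0', duplicates included, after which the duplicate query returns nothing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified PARTITIONS table seeded with the duplicate rows.
cur.executescript("""
CREATE TABLE PARTITIONS (PART_ID INTEGER, PART_NAME TEXT, TBL_ID INTEGER);
INSERT INTO PARTITIONS VALUES (255, 'key=0', 7);
INSERT INTO PARTITIONS VALUES (256, 'key=0', 7);
""")
# ALTER TABLE ... DROP PARTITION removes all metastore rows for the
# partition; modeled here with a plain DELETE for illustration only
# (do NOT delete metastore rows by hand in production).
cur.execute("DELETE FROM PARTITIONS WHERE PART_NAME = 'key=0' AND TBL_ID = 7")

dups = cur.execute("""
    SELECT PART_NAME, COUNT(*) FROM PARTITIONS
    GROUP BY PART_NAME HAVING COUNT(*) > 1
""").fetchall()
print(dups)  # []
```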

2. Long term fix

Since this issue is caused by incorrect DDLs in the Hive Metastore backend database, we suggest backing up the MySQL database and re-creating the metastore schema from scratch with the Hive schema tool (schematool -dbType mysql -initSchema).

