Tuesday, November 11, 2014

Hive Locks -- Table/Partition Level

Hive supports concurrency and table/partition level locks.
Currently only "Share" and "Exclusive" locks are introduced.
Share lock is for read , and anything else requires Exclusive lock.

Below tips can help you hands on this feature.

1. To enable the locking feature, hive.zookeeper.quorum and hive.support.concurrency need to be set.

<property>
  <name>hive.zookeeper.quorum</name>
  <value>n1a,n2a,n3a</value>
</property>

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>

2.  How to manually lock and unlock table or partitions in hive session.

Prepare the table and partition table:
create table passwords (col0 string, col1 string, col2 string, col3 string, col4 string,
col5 string, col6 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ":";

load data local inpath "/etc/passwd" into table passwords;

create table testpart (
col0 string, col1 string, col2 string, 
col3 string, col4 string,col5 string, col6 string)
PARTITIONED BY (one STRING, two STRING) ;

INSERT OVERWRITE TABLE testpart
PARTITION(one = 'a', two = 'b')
SELECT  * FROM passwords;
For Table:
0: jdbc:hive2://localhost:10000/default> lock table passwords shared;
No rows affected (0.226 seconds)
0: jdbc:hive2://localhost:10000/default> show locks extended;
+-----------------------------------------------+---------+
|                   tab_name                    |  mode   |
+-----------------------------------------------+---------+
| default@passwords                             | SHARED  |
| LOCK_QUERYID:                                 | NULL    |
| LOCK_TIME:1415758308800                       | NULL    |
| LOCK_MODE:EXPLICIT                            | NULL    |
| LOCK_QUERYSTRING:lock table passwords shared  | NULL    |
+-----------------------------------------------+---------+
5 rows selected (0.123 seconds)
0: jdbc:hive2://localhost:10000/default> unlock table passwords;
No rows affected (0.177 seconds)
0: jdbc:hive2://localhost:10000/default> show locks extended;
+-----------+-------+
| tab_name  | mode  |
+-----------+-------+
+-----------+-------+
No rows selected (0.105 seconds)
For partition:
0: jdbc:hive2://localhost:10000/default> lock table testpart PARTITION(one = 'a', two = 'b') exclusive;
No rows affected (0.205 seconds)
0: jdbc:hive2://localhost:10000/default> show locks extended;                                          
+---------------------------------------------------------------------------------+------------+
|                                    tab_name                                     |    mode    |
+---------------------------------------------------------------------------------+------------+
| default@testpart@one=a/two=b                                                    | EXCLUSIVE  |
| LOCK_QUERYID:                                                                   | NULL       |
| LOCK_TIME:1415758674295                                                         | NULL       |
| LOCK_MODE:EXPLICIT                                                              | NULL       |
| LOCK_QUERYSTRING:lock table testpart PARTITION(one = 'a', two = 'b') exclusive  | NULL       |
+---------------------------------------------------------------------------------+------------+
5 rows selected (0.104 seconds)
0: jdbc:hive2://localhost:10000/default> unlock table testpart PARTITION(one = 'a', two = 'b');
No rows affected (0.209 seconds)
0: jdbc:hive2://localhost:10000/default> show locks extended;
+-----------+-------+
| tab_name  | mode  |
+-----------+-------+
+-----------+-------+
No rows selected (0.174 seconds)

3.  We can get the znode information from zookeeper when locking happens.

For example, if we lock table "passwords" in shared mode:
lock table passwords shared;
Then logon zookeeper:
/opt/mapr/zookeeper/zookeeper-3.4.5/bin/zkCli.sh -server <IP of zookeeper>:<port of zookeeper>

[zk: xx.xx.xx.xx:5181(CONNECTED) 39] ls /hive_zookeeper_namespace/default/passwords           
[LOCK-SHARED-0000000000]
[zk: xx.xx.xx.xx:5181(CONNECTED) 40] get /hive_zookeeper_namespace/default/passwords
cZxid = 0x600001eeb
ctime = Tue Nov 11 18:21:04 GMT-08:00 2014
mZxid = 0x600001eeb
mtime = Tue Nov 11 18:21:04 GMT-08:00 2014
pZxid = 0x600001eec
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1

4. If DML/DDL only happens on leaf partition, then only leaf partition will be locked in Exclusive mode. Parent partitions and parent table will be locked in Shared mode.

For example: If inserting to the leaf partition of table "testpart":
INSERT OVERWRITE TABLE testpart
PARTITION(one = 'a', two = 'b')
SELECT  * FROM passwords;
Here is locking status from hive:
0: jdbc:hive2://localhost:10000/default> show locks extended;
+--------------------------------------------------------------------------------------------------------------------+------------+
|                                                      tab_name                                                      |    mode    |
+--------------------------------------------------------------------------------------------------------------------+------------+
| default@passwords                                                                                                  | SHARED     |
| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9                                              | NULL       |
| LOCK_TIME:1415756712558                                                                                            | NULL       |
| LOCK_MODE:IMPLICIT                                                                                                 | NULL       |
| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT  * FROM passwords limit 1  | NULL       |
| default@testpart                                                                                                   | SHARED     |
| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9                                              | NULL       |
| LOCK_TIME:1415756712558                                                                                            | NULL       |
| LOCK_MODE:IMPLICIT                                                                                                 | NULL       |
| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT  * FROM passwords limit 1  | NULL       |
| default@testpart@one=a                                                                                             | SHARED     |
| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9                                              | NULL       |
| LOCK_TIME:1415756712558                                                                                            | NULL       |
| LOCK_MODE:IMPLICIT                                                                                                 | NULL       |
| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT  * FROM passwords limit 1  | NULL       |
| default@testpart@one=a/two=b                                                                                       | EXCLUSIVE  |
| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9                                              | NULL       |
| LOCK_TIME:1415756712558                                                                                            | NULL       |
| LOCK_MODE:IMPLICIT                                                                                                 | NULL       |
| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT  * FROM passwords limit 1  | NULL       |
+--------------------------------------------------------------------------------------------------------------------+------------+
Here is zookeeper znodes information:
[zk: xx.xx.xx.xx:5181(CONNECTED) 35] ls /hive_zookeeper_namespace/default/testpart
[LOCK-SHARED-0000000000, one=a]
[zk: xx.xx.xx.xx:5181(CONNECTED) 37] ls /hive_zookeeper_namespace/default/testpart/one=a
[LOCK-SHARED-0000000000, two=b]
[zk: xx.xx.xx.xx:5181(CONNECTED) 38] ls /hive_zookeeper_namespace/default/testpart/one=a/two=b
[LOCK-EXCLUSIVE-0000000000] 

5. Lock holder and lock waiter behavior.

When lock waiter is asking for Share or Exclusive lock while lock holder is holding Exclusive lock, OR lock waiter is asking for Exclusive lock while lock holder is holding Share lock,  waiter will retry for "hive.lock.numretries" times(default 100 times), and each retry interval is "hive.lock.sleep.between.retries" seconds(default 60 seconds).
If waiter still can not get the lock after  "hive.lock.numretries" retries, waiter will fail.

The 2 parameters can be set in hive-site.xml for system level or at session level.
For example, HiveServer1:
hive> set hive.lock.numretries=5;
hive> set hive.lock.sleep.between.retries=3;
hive> select count(*) from passwords;       
conflicting lock present for default@passwords mode SHARED
conflicting lock present for default@passwords mode SHARED
conflicting lock present for default@passwords mode SHARED
conflicting lock present for default@passwords mode SHARED
FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time
HiveSever2:
0: jdbc:hive2://n1a:10000/default> set hive.lock.numretries=5;
No rows affected (0.005 seconds)
0: jdbc:hive2://n1a:10000/default> set hive.lock.sleep.between.retries=3;
No rows affected (0.003 seconds)
0: jdbc:hive2://n1a:10000/default> select count(*) from passwords; 
Error: Error while processing statement: FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time (state=42000,code=10)

6. What is the log message when locking happens?

In /tmp/mapr/hive.log, eg:
ERROR [pool-1-thread-88]: ZooKeeperHiveLockManager (SessionState.java:printError(545)) - conflicting lock present for default@passwords mode SHARED


==

3 comments:

Popular Posts