Monday, November 13, 2017
Monday, November 6, 2017
How to install and configure MapR Hive ODBC driver on Linux
Goal:
How to install and configure the MapR Hive ODBC driver on Linux. This article gives detailed step-by-step instructions as a supplement to the MapR documentation.
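After installing the driver package, the DSN is typically defined in ~/.odbc.ini. The sketch below is illustrative only: the DSN name, driver library path, host, and AuthMech value are assumptions to adapt to your driver version and HiveServer2 authentication setup, not values from this article.

```ini
[ODBC Data Sources]
MapRHive=MapR Hive ODBC Connector

[MapRHive]
; Path below assumes a default 64-bit driver install location
Driver=/opt/mapr/hiveodbc/lib/64/libmaprhiveodbc64.so
HOST=<hiveserver2-host>
PORT=10000
; HiveServerType=2 targets HiveServer2 (not the old HiveServer1)
HiveServerType=2
; AuthMech=2 means "User Name" in Simba-based drivers; pick the
; mechanism matching your HS2 authentication configuration
AuthMech=2
UID=mapr
```

Test the DSN with `isql -v MapRHive` from unixODBC before pointing applications at it.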
Monday, October 30, 2017
How to modify HBase Thrift client code when the HBase Thrift service enables framed transport and compact protocol
Goal:
How to modify HBase Thrift client code when the HBase Thrift service enables framed transport and compact protocol. The background is:
To avoid the Thrift service crash issue described in HBASE-11052, we need to enable framed transport and compact protocol in hbase-site.xml and restart the HBase Thrift service as below:
<property>
  <name>hbase.regionserver.thrift.framed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.thrift.framed.max_frame_size_in_mb</name>
  <value>2</value>
</property>
<property>
  <name>hbase.regionserver.thrift.compact</name>
  <value>true</value>
</property>
After that, the old HBase Thrift client code needs to be modified as well, otherwise it will fail with the error below:
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
This article explains what to modify in the HBase Thrift client code to make the job compatible with framed transport and compact protocol.
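The client-side change can be sketched in Python as follows. This is a minimal sketch, assuming the Thrift-generated `hbase` module (from `thrift --gen py Hbase.thrift`) and the `thrift` Python package are on PYTHONPATH; the function name is illustrative. The key point is that the client transport/protocol pair must mirror the server settings.

```python
def open_hbase_transport(host, port, framed=True, compact=True):
    """Return (client, transport) whose transport/protocol match the server.

    When hbase.regionserver.thrift.framed / .compact are true on the server,
    the client must use TFramedTransport and TCompactProtocol; a plain
    buffered/binary client fails with "TSocket read 0 bytes".
    """
    # Imports are deferred so this sketch can be read without thrift installed
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol, TCompactProtocol
    from hbase import Hbase  # generated by `thrift --gen py Hbase.thrift`

    sock = TSocket.TSocket(host, port)
    # Framed transport: each message is length-prefixed, matching the server
    transport = (TTransport.TFramedTransport(sock) if framed
                 else TTransport.TBufferedTransport(sock))
    # Compact protocol: denser wire encoding, must also match the server
    protocol = (TCompactProtocol.TCompactProtocol(transport) if compact
                else TBinaryProtocol.TBinaryProtocol(transport))
    client = Hbase.Client(protocol)
    transport.open()
    return client, transport
```

Compared with the default client, only the transport and protocol constructors change; all table calls on `client` stay the same.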
Friday, October 27, 2017
How to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster
Goal:
How to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster. The example job is written in Python, and it simply scans a MapR-DB table.
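A minimal sketch of such a scan job, assuming the Thrift-generated `hbase` module and the `thrift` package are installed, and a default gateway configuration (buffered transport, binary protocol). The function name and the MapR-DB table path are illustrative.

```python
def scan_table(host, port, table, num_rows=10):
    """Scan the first rows of a table via the HBase Thrift (v1) gateway.

    `table` can be a MapR-DB table path such as b"/user/mapr/test_table".
    """
    # Deferred imports: thrift + the module generated by
    # `thrift --gen py Hbase.thrift` must be on PYTHONPATH
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase

    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    transport.open()
    try:
        # Open a scanner from the first row, all columns, no attributes
        scanner = client.scannerOpen(table, b"", [], {})
        for row in client.scannerGetList(scanner, num_rows):
            # Each result carries the row key and a {column: TCell} map
            print(row.row, {c: cell.value for c, cell in row.columns.items()})
        client.scannerClose(scanner)
    finally:
        transport.close()
```

If the gateway has framed transport or compact protocol enabled (see the October 30 post above), swap in TFramedTransport/TCompactProtocol accordingly.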
Friday, September 22, 2017
How to modify max heap size for Impalad embedded JVM
Goal:
The impalad is a "native" process with an embedded JVM; the JVM is started from within C++ code. The -mem_limit startup option sets an overall memory limit for the impalad process (which handles multiple queries concurrently).
However, the max heap size of the impalad embedded JVM is much smaller than that limit.
Some Impala queries may use up the entire embedded JVM heap before reaching the limit set by the -mem_limit startup option, causing impalad errors such as "OutOfMemoryError: Java heap space", or the process may simply hang. In that situation, we need to increase the JVM max heap size.
This article shows how to check and modify the max heap size for impalad embedded JVM.
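One common way to raise an embedded JVM's heap is the JAVA_TOOL_OPTIONS environment variable, which any JVM picks up at startup. The env file path below is an assumption for a MapR Impala install, and the 8g value is illustrative; verify both against your deployment before relying on this.

```shell
# Illustrative path: /opt/mapr/impala/impala-<version>/conf/env.sh
# JAVA_TOOL_OPTIONS is honored by the JVM embedded in impalad at startup;
# restart impalad after changing it, then confirm with `jmap -heap <impalad-pid>`
export JAVA_TOOL_OPTIONS="-Xmx8g"
```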
Tuesday, September 19, 2017
Hue could not show Hive tables after Hive enables PAM authentication
Symptom:
a. Hue could not show Hive tables after Hive enabled PAM authentication (see the screenshot below).
b. In /opt/mapr/hue/hue-<version>/logs/runcpserver.log, the following error messages show up:
[19/Sep/2017 15:55:07 -0700] dbms DEBUG Query Server: {'server_name': 'beeswax', 'transport_mode': 'socket', 'server_host': 's4.poc.com', 'server_port': 10000, 'auth_password_used': False, 'http_url': 'http://s4.poc.com:10001/cliservice', 'auth_username': 'hue', 'principal': None}
[19/Sep/2017 15:55:10 -0700] thrift_util INFO Thrift saw a transport exception: Bad status: 3 (Error validating the login)
c. In the HiveServer2 log /opt/mapr/hive/hive-<version>/logs/mapr/hive.log, the following stacktrace shows up:
2017-09-19T15:57:11,046 ERROR [HiveServer2-Handler-Pool: Thread-60] transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: Error validating the login [Caused by javax.security.sasl.AuthenticationException: Error authenticating with the PAM service: login [Caused by javax.security.sasl.AuthenticationException: Error authenticating with the PAM service: login]]
    at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:110)
    at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: javax.security.sasl.AuthenticationException: Error authenticating with the PAM service: login [Caused by javax.security.sasl.AuthenticationException: Error authenticating with the PAM service: login]
    at org.apache.hive.service.auth.PamAuthenticationProviderImpl.Authenticate(PamAuthenticationProviderImpl.java:54)
    at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:119)
    at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:103)
    ... 8 more
Caused by: javax.security.sasl.AuthenticationException: Error authenticating with the PAM service: login
    at org.apache.hive.service.auth.PamAuthenticationProviderImpl.Authenticate(PamAuthenticationProviderImpl.java:48)
    ... 10 more
2017-09-19T15:57:11,046 ERROR [HiveServer2-Handler-Pool: Thread-60] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Error validating the login
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Error validating the login
    at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 4 more
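For context, PAM authentication in HiveServer2 is enabled with settings like the following in hive-site.xml (the "login" service name matches the one failing in the stacktrace above). A common cause of "Error authenticating with the PAM service" is that the user running HiveServer2 cannot read the files the PAM service checks (e.g. /etc/shadow); treat that as a hypothesis to verify, since the article excerpt does not state the root cause.

```xml
<property>
  <name>hive.server2.authentication</name>
  <value>PAM</value>
</property>
<property>
  <!-- Comma-separated list of PAM service files under /etc/pam.d -->
  <name>hive.server2.authentication.pam.services</name>
  <value>login</value>
</property>
```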
Friday, September 15, 2017
Hive 2.x queries got stuck when waiting for "tryAcquireCompileLock" in HS2 stacktrace
Env:
Hive 2.x
Symptom:
Hive 2.x queries get stuck waiting for "tryAcquireCompileLock" in the HS2 stacktrace. When the issue happens, connecting to HS2 using beeline still works.
However, any query, e.g. "show databases", will hang.
Below is one example of the jstack output on HS2 process:
"7d88a5ad-cd2c-4c37-9025-8372164524fd HiveServer2-Handler-Pool: Thread-214" #214 prio=5 os_prio=0 tid=0x00007fdacc2d1800 nid=0x7af0 waiting on condition [0x00007fda9b6fa000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000005c1fb7f28> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) at org.apache.hadoop.hive.ql.Driver.tryAcquireCompileLock(Driver.java:1324) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1236) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1230) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:486) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) at com.sun.proxy.$Proxy25.executeStatementAsync(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:505) at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)
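The threads above are all queued on Hive's compile lock: by default Hive 2.x serializes query compilation behind a single ReentrantLock, so one long compilation (e.g. a slow metastore call) blocks every other query. Two related hive-site.xml settings are sketched below; both parameter names exist in Hive 2.x, but verify the exact behavior against your version before applying them.

```xml
<property>
  <!-- Allow sessions to compile queries in parallel instead of
       serializing all compilation behind one global lock -->
  <name>hive.driver.parallel.compilation</name>
  <value>true</value>
</property>
<property>
  <!-- Fail a query after waiting this long for the compile lock,
       instead of hanging indefinitely (default 0s = wait forever) -->
  <name>hive.server2.compile.lock.timeout</name>
  <value>600s</value>
</property>
```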
How to limit the Hive log size using RFA instead of default DRFA in Hive 2.x using log4j2
Env:
Hive 2.x on MapR
Goal:
By default, MapR Hive uses DRFA (Daily Rolling File Appender) for log4j2. The template for the DRFA settings is in /opt/mapr/hive/hive-<version>/conf/hive-log4j2.properties.template. Administrators can copy hive-log4j2.properties.template to hive-log4j2.properties in the "conf" directory and make whatever changes they want.
However, if the daily Hive log is too large and may potentially fill up all the disk space, we can use RFA (Rolling File Appender) instead, which caps the size of each log file and the number of log files kept.
Note: per HIVE-11304, Hive upgraded from log4j 1.x to log4j2, so the previous article applies only to Hive 1.x.
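A minimal sketch of the size-based appender in hive-log4j2.properties; it reuses the template's DRFA appender name and the hive.log.dir/hive.log.file system properties, while the 256MB size and 10-file count are illustrative values to tune for your disk budget.

```properties
# Replace the daily rollover with a size-capped RollingFile appender
appender.DRFA.type = RollingFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%i
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
# Roll when the current file reaches 256MB
appender.DRFA.policies.type = Policies
appender.DRFA.policies.size.type = SizeBasedTriggeringPolicy
appender.DRFA.policies.size.size = 256MB
# Keep at most 10 rolled files, so disk usage is bounded at ~2.5GB
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.max = 10
```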
Wednesday, May 24, 2017
Hive on Tez : How to control the number of Mappers and Reducers
Goal:
How to control the number of Mappers and Reducers in Hive on Tez.
Tuesday, May 23, 2017
Hive on Tez : How to identify which DAG ID for which Hive query in the same DAG Application Master
Goal:
Hive on Tez : How to identify which DAG ID for which Hive query in the same DAG Application Master