Tuesday, September 11, 2018

How to start KSQL server in server-client mode on MapR platform

Goal:

KSQL can run in local (standalone) mode using the CLI, and it can also run in server-client mode as shown below.

This article shows the exact commands on the MapR 6.1 platform.
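For reference, the server-client startup looks roughly like this. The install path and port are assumptions based on a typical MapR 6.1 KSQL layout; the guard keeps the snippet a no-op on machines without KSQL installed.

```shell
# Assumed KSQL install location on MapR 6.1; adjust to your cluster.
KSQL_HOME=/opt/mapr/ksql/ksql-4.1.1

if [ -x "$KSQL_HOME/bin/ksql-server-start" ]; then
    # Start the KSQL server with its properties file:
    "$KSQL_HOME/bin/ksql-server-start" "$KSQL_HOME/etc/ksql/ksql-server.properties" &
    # From any node, point a CLI at the running server (default port 8088):
    "$KSQL_HOME/bin/ksql" "http://localhost:8088"
fi
```

The same server can serve many CLI sessions at once, which is the point of server-client mode versus standalone.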

Friday, September 7, 2018

How to dig into the Kafka Streams/ksql's statestore in RocksDB

Goal:

By default, Kafka Streams and ksql use RocksDB as the internal state store.
Each state store is backed by:
  1. an internally created and compacted changelog topic (for fault tolerance)
  2. one or more RocksDB instances (for cached key-value lookups)
Thus, when applications are started, stopped, rewound, or reprocessed, this internal data needs to be managed correctly.
More design details can be found in Kafka Streams Internal Data Management.
This article explains, step by step, how to use the "ldb" and "sst_dump" tools to dig deeper into the RocksDB state stores.
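As a preview of the approach, both tools are pointed at the application's state directory. The path below uses the Streams default state.dir of /tmp/kafka-streams; the application id, task directory, store name, and SST file name are made-up examples.

```shell
# Default Streams state.dir is /tmp/kafka-streams; the app id, task id (0_0),
# store name, and SST file name below are illustrative only.
STATE_DIR=/tmp/kafka-streams/my-app-id/0_0/rocksdb/my-store

if command -v ldb >/dev/null 2>&1; then
    # Dump every key-value pair currently in the store:
    ldb --db="$STATE_DIR" scan --hex
fi
if command -v sst_dump >/dev/null 2>&1; then
    # Inspect a single SST file's data and table properties:
    sst_dump --file="$STATE_DIR/000006.sst" --command=scan --show_properties
fi
```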

How to use ksql on MapR Streams (MapR-ES)

Goal:

KSQL is a streaming SQL engine that enables stream processing against Apache Kafka and MapR Streams (aka MapR-ES). KSQL interacts directly with the Kafka Streams API, removing the need to build a Java application.
This article provides an example to help you quickly get hands-on with ksql on MapR Streams, following this quick start.
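The key MapR-ES difference is topic naming: ksql addresses a topic as "<stream path>:<topic name>". A sketch, where the stream path, topic, and columns are made up for illustration:

```shell
# MapR-ES topics are addressed as "<stream path>:<topic name>"; both the
# stream path and topic below are made-up examples.
STREAM_TOPIC=/sample-stream:pageviews

if command -v maprcli >/dev/null 2>&1; then
    # Create the stream with open produce/consume permissions (demo only):
    maprcli stream create -path /sample-stream \
            -produceperm p -consumeperm p -topicperm p
fi

# Inside the ksql CLI, the stream-qualified topic goes into the WITH clause:
#   CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR)
#     WITH (kafka_topic='/sample-stream:pageviews', value_format='DELIMITED');
```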

Thursday, July 19, 2018

Drill query fails with "AGGR OOM at First Phase" when doing Hash Aggregate

Symptom:

Drill query fails with "AGGR OOM at First Phase" when doing Hash Aggregate.
Sample error message or stacktrace is:
2018-01-01 11:11:11,111 [xxx:frag:5:6] INFO  o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes ran out of memory while executing the query. (AGGR OOM at First Phase. Partitions: 1. Estimated batch size: 57655296. values size: 65536. Output alloc size: 65536 Memory limit: 41943040 so far allocated: 262144. )
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.

AGGR OOM at First Phase. Partitions: 1. Estimated batch size: 57655296. values size: 65536. Output alloc size: 65536 Memory limit: 41943040 so far allocated: 262144.

[Error Id: yyy ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:243) [drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.13.0-mapr.jar:1.13.0-mapr]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: AGGR OOM at First Phase. Partitions: 1. Estimated batch size: 57655296. values size: 65536. Output alloc size: 65536 Memory limit: 41943040 so far allocated: 262144.
        at org.apache.drill.exec.test.generated.HashAggregatorGen5.spillIfNeeded(HashAggTemplate.java:1419) ~[na:na]
        at org.apache.drill.exec.test.generated.HashAggregatorGen5.doSpill(HashAggTemplate.java:1381) ~[na:na]
        at org.apache.drill.exec.test.generated.HashAggregatorGen5.checkGroupAndAggrValues(HashAggTemplate.java:1281) ~[na:na]
        at org.apache.drill.exec.test.generated.HashAggregatorGen5.doWork(HashAggTemplate.java:592) ~[na:na]
        at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:176) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:233) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226) ~[drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
        at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_171]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_171]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1633) ~[hadoop-common-2.7.0-mapr-1710.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226) [drill-java-exec-1.13.0-mapr.jar:1.13.0-mapr]
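In this message, the estimated hash-aggregate batch (57,655,296 bytes, about 55 MB) already exceeds the operator's memory limit (41,943,040 bytes, 40 MB), so the first-phase aggregation cannot proceed. Two common workarounds use real Drill session options; the memory value, install path, and ZooKeeper address below are examples:

```shell
DRILL_HOME=/opt/mapr/drill/drill-1.13.0   # assumed MapR install path

if [ -x "$DRILL_HOME/bin/sqlline" ]; then
    "$DRILL_HOME/bin/sqlline" -u "jdbc:drill:zk=localhost:5181" <<'EOF'
-- Give each query more memory per node (bytes; 4 GB here is an example):
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 4294967296;
-- Or fall back to streaming aggregate by disabling hash aggregation:
ALTER SESSION SET `planner.enable_hashagg` = false;
EOF
fi
```

Raising the memory limit keeps the (usually faster) hash aggregate; disabling it trades speed for a lower memory footprint.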

Tuesday, July 10, 2018

Hive: Different lock behaviors between DummyTxnManager and DbTxnManager

Goal

Understand the different lock behaviors between DummyTxnManager and DbTxnManager in Hive. The example query is an INSERT into a partitioned table.
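The manager is chosen per session via hive.txn.manager, and SHOW LOCKS then reveals where each one records its locks. A minimal sketch, assuming the Hive CLI is available (DbTxnManager additionally requires the transactional metastore settings covered in the article):

```shell
DUMMY=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
DBTXN=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

if command -v hive >/dev/null 2>&1; then
    # DummyTxnManager (the default) takes ZooKeeper-based locks:
    hive -e "SET hive.support.concurrency=true; SET hive.txn.manager=$DUMMY; SHOW LOCKS;"
    # DbTxnManager records its locks in the metastore instead:
    hive -e "SET hive.support.concurrency=true; SET hive.txn.manager=$DBTXN; SHOW LOCKS;"
fi
```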

Friday, June 8, 2018

How to control the parallelism of Spark job

Goal:

This article explains how to control the parallelism of a Spark job, using a Spark on YARN job as the demonstration.
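As a quick illustration of the knobs involved (the pi.py example script ships with Spark; the executor counts and parallelism value here are arbitrary choices, not recommendations):

```shell
# num-executors x executor-cores = tasks that can run concurrently:
TOTAL_SLOTS=$((4 * 2))

if command -v spark-submit >/dev/null 2>&1; then
    # 4 executors x 2 cores give 8 concurrent task slots; RDD shuffles
    # default to spark.default.parallelism partitions (16 is an example).
    spark-submit \
        --master yarn \
        --num-executors 4 \
        --executor-cores 2 \
        --conf spark.default.parallelism=16 \
        "$SPARK_HOME"/examples/src/main/python/pi.py 100
fi
```

A common starting point is setting the partition count to a small multiple of the total core count so every slot stays busy.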

Monday, November 13, 2017

How to configure LDAP client by using SSSD for authentication on CentOS

Goal:

How to configure an LDAP client using SSSD (System Security Services Daemon) for authentication on CentOS.
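On CentOS the client-side setup can be driven by authconfig, which writes /etc/sssd/sssd.conf and switches NSS/PAM to sss. The LDAP URI and base DN below are placeholders for your environment:

```shell
LDAP_URI=ldap://ldap.example.com        # placeholder LDAP server
BASE_DN="dc=example,dc=com"             # placeholder search base

if command -v authconfig >/dev/null 2>&1; then
    authconfig --enablesssd --enablesssdauth \
               --enableldap --enableldapauth \
               --ldapserver="$LDAP_URI" \
               --ldapbasedn="$BASE_DN" \
               --enablemkhomedir --update
    # Pick up the new configuration:
    systemctl restart sssd
fi
```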

Monday, November 6, 2017

How to install and configure MapR Hive ODBC driver on Linux

Goal:

How to install and configure the MapR Hive ODBC driver on Linux.
This article gives detailed step-by-step instructions as a supplement to this MapR Documentation.
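At a high level the install reduces to one rpm plus a DSN entry. The package file name, driver library path, and host below are assumptions; take the real values from MapR's download page and documentation:

```shell
# Install the driver rpm downloaded from MapR (file name is an example):
if [ -f MapRHiveODBC-2.1.8.x86_64.rpm ] && command -v rpm >/dev/null 2>&1; then
    rpm -ivh MapRHiveODBC-2.1.8.x86_64.rpm
fi

# Write a minimal DSN (normally this content goes into ~/.odbc.ini):
cat > /tmp/odbc.ini.sample <<'EOF'
[MapRHive]
Driver=/opt/mapr/hiveodbc/lib/64/libmaprhiveodbc64.so
HOST=hiveserver2-host
PORT=10000
HiveServerType=2
AuthMech=2
EOF
```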

Monday, October 30, 2017

How to modify hbase thrift client code if Hbase Thrift Service enables framed transport and compact protocol

Goal:

How to modify HBase thrift client code if the HBase Thrift service enables framed transport and the compact protocol.
The background is:
To avoid the thrift service crash issue mentioned in HBASE-11052, we need to enable framed transport and the compact protocol in hbase-site.xml and restart the HBase Thrift service as below:
<property> 
  <name>hbase.regionserver.thrift.framed</name> 
  <value>true</value> 
</property> 
<property> 
  <name>hbase.regionserver.thrift.framed.max_frame_size_in_mb</name> 
  <value>2</value> 
</property> 
<property> 
  <name>hbase.regionserver.thrift.compact</name> 
  <value>true</value> 
</property>
After that, the old HBase thrift client code needs to be modified; otherwise it will fail with the error below:
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
This article explains what to modify in the HBase thrift client code to make it compatible with framed transport and the compact protocol.
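For a quick test, the same two settings can also be passed as flags when starting the Thrift service from the command line; the client side then has to wrap its socket in TFramedTransport and switch to TCompactProtocol (instead of the buffered transport and binary protocol), or it hits the "TSocket read 0 bytes" error. A sketch:

```shell
THRIFT_PORT=9090   # default HBase Thrift port

if command -v hbase >/dev/null 2>&1; then
    # -f/--framed and -c/--compact mirror the hbase-site.xml settings above:
    hbase thrift start -p "$THRIFT_PORT" -f -c &
fi
# Client side: replace TBufferedTransport with TFramedTransport and
# TBinaryProtocol with TCompactProtocol in the thrift client code.
```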
