Thursday, July 31, 2014

Hands on Apache Drill

This article follows "Apache Drill in 10 Minutes" and shows the steps to install and test Apache Drill on a CDH5 cluster.

1. Download Drill binary

wget https://builds.apache.org/job/drill-scm/70/artifact/distribution/target/apache-drill-1.0.0-m2-incubating-SNAPSHOT-binary-release.tar.gz
Note: the link above points to a specific build and may change, so check for the latest binary link first.

2. Install Drill

mkdir -p /usr/local/drill
tar xzf apache-drill-1.0.0-m2-incubating-SNAPSHOT-binary-release.tar.gz --strip-components=1 -C /usr/local/drill

3. Edit conf/drill-env.sh to add HADOOP_HOME

/usr/local/drill/conf/drill-env.sh
HADOOP_HOME=/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop

4. Edit conf/drill-override.conf to add zookeeper quorum info

/usr/local/drill/conf/drill-override.conf
drill.exec: {
  cluster-id: "mydrill",
  zk.connect: "admin.xxx.com:2181,hdw1.xxx.com:2181,hdw3.xxx.com:2181"
}
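
The same ZooKeeper quorum string is needed again in the JDBC URL in step 6, so it can help to build it once and reuse it. This is a minimal shell sketch; the function name zk_connect and the variable ZK_QUORUM are hypothetical helpers, not part of Drill.

```shell
#!/bin/sh
# Hypothetical helper: join ZooKeeper hosts into "host:port,host:port,...".
# Usage: zk_connect PORT HOST [HOST...]
# The body runs in a subshell, so its variables do not leak into the caller.
zk_connect() (
  port="$1"
  shift
  out=""
  for host in "$@"; do
    # Prefix a comma only when something has already been appended.
    out="${out:+${out},}${host}:${port}"
  done
  printf '%s\n' "$out"
)

ZK_QUORUM="$(zk_connect 2181 admin.xxx.com hdw1.xxx.com hdw3.xxx.com)"

# Value for zk.connect in drill-override.conf
echo "zk.connect: \"${ZK_QUORUM}\""
# JDBC URL for sqlline
echo "jdbc:drill:zk=${ZK_QUORUM}"
```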

5. Start Drill on each node

/usr/local/drill/bin/drillbit.sh start
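
Instead of logging in to every node by hand, the start command can be looped over ssh. This sketch assumes passwordless ssh and that Drill is installed under /usr/local/drill on every node. The node list is the set of hosts in this cluster, and start_drillbits and DRY_RUN are hypothetical names. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Assumed node list and install path; adjust to your cluster.
DRILL_NODES="hdm.xxx.com hdw1.xxx.com hdw2.xxx.com hdw3.xxx.com"
DRILL_HOME=/usr/local/drill

# Hypothetical helper: print (DRY_RUN=1, the default) or run the
# drillbit start command on every node in DRILL_NODES.
start_drillbits() (
  for node in $DRILL_NODES; do
    cmd="ssh $node $DRILL_HOME/bin/drillbit.sh start"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    else
      $cmd
    fi
  done
)

start_drillbits
```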

6. Connect with SQLLine

/usr/local/drill/bin/sqlline -u jdbc:drill:zk=admin.xxx.com:2181,hdw1.xxx.com:2181,hdw3.xxx.com:2181 -n admin -p admin

7. Check the status of all nodes

0: jdbc:drill:zk=admin.xxx.com:2181,hdw1.v> select * from sys.drillbits;
+--------------+------------+--------------+------------+
| host         | user_port  | control_port | data_port  |
+--------------+------------+--------------+------------+
| hdw1.xxx.com | 31010      | 31011        | 31012      |
| hdm.xxx.com  | 31010      | 31011        | 31012      |
| hdw2.xxx.com | 31010      | 31011        | 31012      |
| hdw3.xxx.com | 31010      | 31011        | 31012      |
+--------------+------------+--------------+------------+
4 rows selected (0.135 seconds)

8. Show current schema metadata

0: jdbc:drill:zk=admin.xxx.com:2181,hdw1.v> SELECT SCHEMA_NAME AS Database
. . . . . . . . . . . . . . . . . . . . . . .> FROM INFORMATION_SCHEMA.SCHEMATA;
+--------------------+
| SCHEMA_NAME        |
+--------------------+
| dfs.default        |
| dfs.root           |
| dfs.tmp            |
| cp.default         |
| sys                |
| INFORMATION_SCHEMA |
+--------------------+

9. Configure or check the storage plugin instances in the web UI

http://<IP>:8047/storage
For example, this page lists the default "dfs" storage plugin instance.

10. Test query

0: jdbc:drill:zk=admin.xxx.com:2181,hdw1.v> SELECT * FROM dfs.`/usr/local/drill/sample-data/region.parquet`;
+-------------+-------------+-------------+
| R_REGIONKEY | R_NAME      | R_COMMENT   |
+-------------+-------------+-------------+
| 0           | [B@89a39b8  | [B@45436371 |
| 1           | [B@4b35b33d | [B@328a905e |
| 2           | [B@270c253e | [B@e611d79  |
| 3           | [B@9d5fa4f  | [B@3f245d94 |
| 4           | [B@6c7bbfee | [B@3075d9e6 |
+-------------+-------------+-------------+
5 rows selected (0.317 seconds)
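
The [B@... values are Java's default string form of a byte array: R_NAME and R_COMMENT come back as raw binary and are printed without decoding. A likely fix is to decode them in the query with CONVERT_FROM and the 'UTF8' encoding, assuming that function is available in the Drill build you installed. The sketch below only writes the query to a file; the path /tmp/region_utf8.sql and the SQL_FILE variable are arbitrary choices. It can then be run from the sqlline prompt with !run /tmp/region_utf8.sql, or pasted in directly.

```shell
#!/bin/sh
# Write a query that decodes the binary Parquet columns as UTF-8 text.
# Assumption: CONVERT_FROM(..., 'UTF8') is supported by the installed Drill build.
SQL_FILE="${SQL_FILE:-/tmp/region_utf8.sql}"

# Quoted 'EOF' keeps the backticks and quotes in the query literal.
cat > "$SQL_FILE" <<'EOF'
SELECT R_REGIONKEY, CONVERT_FROM(R_NAME, 'UTF8') AS R_NAME, CONVERT_FROM(R_COMMENT, 'UTF8') AS R_COMMENT FROM dfs.`/usr/local/drill/sample-data/region.parquet`;
EOF

cat "$SQL_FILE"
```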
