Friday, October 7, 2016

How to use Drill to parse ResourceManager Rest API results

Goal:

ResourceManager REST API provides all detailed information regarding YARN applications, metrics of YARN cluster, etc.
This article provides a simple demo on how to use Drill to query the result of the REST APIs.
One example use case is to show the largest YARN applications which are currently running.

Env:

Drill 1.8
Hadoop 2.7.0

Solution:

1. Save the current output of RM REST API as a json file on MFS(or HDFS).

curl -v -X GET -H "Content-Type: application/json" http://s1.poc.com:8088/ws/v1/cluster/apps > /mapr/mysuper.cluster.com/tmp/restapi/data.json
Here "s1.poc.com" is the RM node.

2. Use Drill to parse the json file to answer the question Cluster Admin want to ask.

For example, to show the largest YARN applications which are currently running:
with tmp as
(
select flatten(t.apps.app) as col from dfs.tmp.`restapi/data.json` t
)
select tmp.col.id,tmp.col.`user` as `user`,tmp.col.runningContainers as `runningContainers`,tmp.col.allocatedMB as `allocatedMB`,tmp.col.allocatedVCores as `allocatedVCores` from tmp
where tmp.col.state='RUNNING'
order by tmp.col.runningContainers desc;
The result is:
+---------------------------------+-------+--------------------+--------------+------------------+
|             EXPR$0              | user  | runningContainers  | allocatedMB  | allocatedVCores  |
+---------------------------------+-------+--------------------+--------------+------------------+
| application_1475192050844_0003  | mapr  | 4                  | 16384        | 4                |
| application_1475192050844_0004  | mapr  | 1                  | 2048         | 1                |
+---------------------------------+-------+--------------------+--------------+------------------+
2 rows selected (0.525 seconds)


1 comment:

  1. Some reader is asking how to parse the job configuration. Actually that is related to Job History Server Rest API instead of RM rest API. Please refer to below Doc:
    https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html#Job_Conf_API

    One example is:
    1. Download the json file which contains the configuration for specific job ID:
    curl -v -X GET -H "Content-Type: application/json" http://v7.poc.com:19888/ws/v1/history/mapreduce/jobs/job_1473379941569_0096/conf > /mapr/my2.cluster.com/tmp/restapi/hs_data.json


    2.Using Drill to query it to get what is the value for parameter "hive.query.string":
    with tmp as
    (
    select flatten(t.conf.property) as col from dfs.tmp.`restapi/hs_data.json` t
    )
    select tmp.col.`value` from tmp where tmp.col.`name`='hive.query.string';

    +---------------------------------+
    | EXPR$0 |
    +---------------------------------+
    | select count(*) from passwords |
    +---------------------------------+
    1 row selected (0.232 seconds)

    ReplyDelete

Popular Posts