Tuesday, December 23, 2014

How to troubleshoot Hue issues using RESTful API calls to httpfs

Goal:

Hue operates on files and directories in MapR-FS by sending RESTful API calls to httpfs.
This article shows how to troubleshoot Hue issues by issuing the same RESTful API calls manually.

Solution:

1. Enable DEBUG logging for runcpserver.log.

Runcpserver is a web server that provides the core web functionality of Hue.
To enable DEBUG-level logging, edit /opt/mapr/hue/hue-<version>/desktop/conf/log.conf as follows:
[handler_logfile]
class=handlers.RotatingFileHandler
# Choices are DEBUG, INFO, WARNING, ERROR, CRITICAL
level=DEBUG
formatter=default
args=('%LOG_DIR%/%PROC_NAME%.log', 'a', 1000000, 3)
After that, restart Hue (replace "hostname" with the node that runs Hue):
maprcli node services -name hue -action stop -nodes hostname
maprcli node services -name hue -action start -nodes hostname
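
Before restarting, it can help to confirm that the change landed in the right section, since a log.conf may carry level= entries in more than one section. The following is a minimal sketch of such a check. The function name logfile_level is hypothetical, and it assumes plain key=value lines as in the excerpt above.

```shell
# logfile_level FILE
# Print the level configured in the [handler_logfile] section of FILE.
logfile_level() {
    awk -F= '
        # Track whether the current line is inside [handler_logfile].
        /^\[/ { in_section = ($0 == "[handler_logfile]") }
        # Inside that section, print the value of the "level" key and stop.
        in_section && $1 ~ /^[ \t]*level[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            print $2
            exit
        }
    ' "$1"
}

# Example (path taken from this article; substitute your Hue version):
# logfile_level /opt/mapr/hue/hue-<version>/desktop/conf/log.conf
```

If the function prints anything other than DEBUG, the edit was made in a different section or the file was not saved.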

2. Make sure the httpfs process is running.

Use the command below to identify which server in the Hadoop cluster is running httpfs:
maprcli node list -columns service
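Log in to that server and check that a process is listening on the httpfs port (default is 14000). Note that lsof shows port 14000 under its service name, scotty-ft, unless the -P option is given: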
[root]# lsof -i:14000
COMMAND    PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
python2.6 7048 mapr   13u  IPv4 35689564      0t0  TCP mapr4-3:48061->mapr4-3:scotty-ft (CLOSE_WAIT)
java      7848 mapr  134u  IPv6 35688826      0t0  TCP *:scotty-ft (LISTEN)

3. Verify that hue.ini points to the correct httpfs host and port.

In the "[[hdfs_clusters]]" section of hue.ini, for example:
      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://mapr4-3:14000/webhdfs/v1
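
To check which URL Hue is actually configured with, the value can be read straight from the file. This is a rough sketch only. The function name get_webhdfs_url is hypothetical, and it assumes the setting appears on a single uncommented line as in the example above.

```shell
# get_webhdfs_url FILE
# Print the first uncommented webhdfs_url value found in FILE (a hue.ini).
get_webhdfs_url() {
    # Lines starting with "#" do not match, so commented-out examples are skipped.
    sed -n 's/^[[:space:]]*webhdfs_url[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Example: request the status of "/" through the configured URL.
# (hue.ini is assumed to be in the current directory here.)
# curl "$(get_webhdfs_url hue.ini)/?op=GETFILESTATUS&user.name=mapr"
```

If the curl request fails to connect, compare the host and port printed by the function with the server and port found in step 2.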

4. Troubleshoot Hue issues by monitoring runcpserver.log to capture the RESTful API calls to httpfs.

For example, if we copy the file /tmp/mapr/Master.csv to /tmp/mapr/Master.csv.2 using the Hue file browser, the log captures calls such as the following.
a. Get metadata of source file.
From runcpserver.log:
GET /webhdfs/v1/tmp/mapr/Master.csv?op=GETFILESTATUS&user.name=mapr&doas=mapr HTTP/1.1
Then we can use the curl command below to run the same check manually (Note: "mapr4-3" is the hostname of the httpfs server):
# curl "http://mapr4-3:14000/webhdfs/v1/tmp/mapr/Master.csv?op=GETFILESTATUS&user.name=mapr"
{"FileStatus":{"pathSuffix":"","type":"FILE","length":6049426,"owner":"mapr","group":"mapr","permission":"755","accessTime":1419263547000,"modificationTime":1419263556835,"blockSize":268435456,"replication":3}}
b. Open and read source file.
From runcpserver.log:
GET /webhdfs/v1/tmp/mapr/Master.csv?length=67108864&op=OPEN&user.name=mapr&offset=0&doas=mapr HTTP/1.1
We can use the curl command below to issue the same request:
curl -X GET -L "http://mapr4-3:14000/webhdfs/v1/tmp/mapr/Master.csv?length=67108864&op=OPEN&user.name=mapr&offset=0&doas=mapr"
Refer to the WebHDFS REST API documentation for more details on the syntax.
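
The step from a captured log line to a curl command is mechanical: prepend the httpfs address to the request path. One way to sketch it is shown here. The function name log_to_url is hypothetical, and it assumes each log line contains a single /webhdfs/v1 request path followed by a space, as in the excerpts above.

```shell
# log_to_url BASE LINE
# BASE is the httpfs address, for example http://mapr4-3:14000.
# LINE is a line from runcpserver.log containing a /webhdfs/v1 request.
log_to_url() {
    # Keep only the request path and query string, dropping the HTTP
    # method in front and the protocol version behind it.
    path=$(printf '%s\n' "$2" | sed -n 's|.*\(/webhdfs/v1[^ ]*\).*|\1|p')
    # Print nothing and fail if the line holds no WebHDFS request.
    [ -n "$path" ] || return 1
    printf '%s%s\n' "$1" "$path"
}

# Example:
# line='GET /webhdfs/v1/tmp/mapr/Master.csv?op=GETFILESTATUS&user.name=mapr&doas=mapr HTTP/1.1'
# curl -L "$(log_to_url http://mapr4-3:14000 "$line")"
```

This covers GET requests only. For other operations, the HTTP method from the log line has to be passed to curl with -X as well.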

Note: "&user.name=mapr" is added to the request to avoid the error below:
HTTP Status 403 - Anonymous requests are disallowed
If the RESTful API calls do not return the expected results, the issue is likely on the httpfs side rather than in Hue.
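
When building test URLs by hand it is easy to forget the user.name parameter. A small helper along these lines can append it when it is missing. The function name ensure_user_name is hypothetical, and the sketch does no URL encoding of the user name.

```shell
# ensure_user_name URL USER
# Print URL unchanged if it already carries user.name, otherwise append it.
ensure_user_name() {
    case "$1" in
        # Already present as the first or a later query parameter.
        *"?user.name="*|*"&user.name="*) printf '%s\n' "$1" ;;
        # A query string exists, so add the parameter with "&".
        *"?"*) printf '%s&user.name=%s\n' "$1" "$2" ;;
        # No query string yet, so start one with "?".
        *) printf '%s?user.name=%s\n' "$1" "$2" ;;
    esac
}

# Example:
# curl "$(ensure_user_name 'http://mapr4-3:14000/webhdfs/v1/tmp/mapr?op=LISTSTATUS' mapr)"
```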

