Friday, October 27, 2017

How to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster


This article shows how to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster.
The example job is written in Python and simply scans a MapR-DB table.


HBase 1.1.1
MapR 5.2
CentOS 6.5


Before following this article, please follow the MapR documentation to install and start the HBase Thrift service.

1. Install Thrift

Apache Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.

Please follow this link to download and install Thrift on CentOS.
After installation, please identify the location of the Thrift source code. For example:

2. Download the HBase source code from GitHub based on your HBase version

My HBase version is 1.1.1, so I check out branch-1.1.
git clone
cd hbase
git checkout remotes/origin/branch-1.1

After downloading the HBase source code, please identify the location of the Hbase.thrift file:

3. Generate the bindings for Python

Copy the Hbase.thrift file identified in step 2 to the current working directory and run the commands below:
thrift -gen py ./Hbase.thrift
mv gen-py/* .
rm -rf gen-py/

Copy the Thrift Python source code identified in step 1 here:
mkdir thrift
cp -rp /home/mapr/hao/thrift/thrift/lib/py/src/* ./thrift/
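Before moving on, it may help to confirm that the working directory now holds both the generated "hbase" package and the copied "thrift" package. The sketch below is only a sanity check under that assumption; the helper name missing_files and the EXPECTED_FILES list are hypothetical and cover just the modules the sample job imports.

```python
import os

# Files the sample job imports, relative to the working directory.
# Assumed layout: "hbase/" comes from `thrift -gen py`,
# "thrift/" comes from the Thrift lib/py/src directory.
EXPECTED_FILES = [
    os.path.join("hbase", "__init__.py"),
    os.path.join("hbase", "Hbase.py"),
    os.path.join("hbase", "ttypes.py"),
    os.path.join("thrift", "__init__.py"),
    os.path.join("thrift", "transport", "TSocket.py"),
    os.path.join("thrift", "transport", "TTransport.py"),
    os.path.join("thrift", "protocol", "TBinaryProtocol.py"),
]


def missing_files(base_dir="."):
    """Return the expected files that do not exist under base_dir."""
    return [p for p in EXPECTED_FILES
            if not os.path.isfile(os.path.join(base_dir, p))]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All expected binding files are present.")
```

If anything is reported missing, re-check the two copy steps above before continuing.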

4. Install the needed Python library

You may skip this step if the needed Python library is already installed.
yum install python-pip
pip install six
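To see whether this step can be skipped, you can try importing the library first. The snippet below is a minimal sketch; the helper name missing_modules is hypothetical and it only reports which of the given module names fail to import.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "six" is the library installed in this step.
    missing = missing_modules(["six"])
    print("Missing modules: " + (", ".join(missing) or "none"))
```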

5. Create a sample HBase Thrift job named "" in Python

from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase

# host is where the HBase Thrift service is running.
host = "localhost"
# port is the HBase Thrift service default port, 9090.
port = 9090
# tablename is the MapR-DB table which this sample job scans.
tablename = "/user/mapr/maprdb_sample_table"
# numRows is the number of rows that "scannerGetList" retrieves from the scanner at once.
numRows = 5
# columnName is the column which will be printed out later.
columnName = "cf:mycolumn"

# Connect to HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)

# Create the client and open the connection
client = Hbase.Client(protocol)
transport.open()

# Scan the MapR-DB table
scan = Hbase.TScan(startRow="111111", stopRow="22222")
scannerId = client.scannerOpenWithScan(tablename, scan, None)
# Fetch numRows rows at a time until the scanner returns an empty list
rowList = client.scannerGetList(scannerId, numRows)
while rowList:
    for row in rowList:
        rowKey = row.row
        # get() returns None for rows without the column; print an empty value then
        cell = row.columns.get(columnName)
        message = cell.value if cell else ""
        print("rowKey = " + rowKey + ", columnValue = " + message)
    rowList = client.scannerGetList(scannerId, numRows)

# Close the scanner and the client connection
client.scannerClose(scannerId)
transport.close()

The sample job above scans the MapR-DB table "/user/mapr/maprdb_sample_table" and prints each row key along with the value of column "cf:mycolumn".
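The batch-by-batch scan can also be written as a small generator, which keeps the "fetch until the list is empty" logic in one place. The sketch below is an illustration only: iter_rows, column_value, FakeClient, Row and Cell are hypothetical names, and FakeClient is an in-memory stand-in so the example runs without a Thrift server. With a real connection you would pass the Hbase.Client and the scanner ID instead. A row that lacks the column gets None back from row.columns.get, so the helper returns a default rather than dereferencing it.

```python
from collections import namedtuple

# Stand-ins for the generated Thrift types, so the sketch runs without a server.
Cell = namedtuple("Cell", ["value"])
Row = namedtuple("Row", ["row", "columns"])


class FakeClient(object):
    """Hypothetical stub that mimics scannerGetList on an in-memory row list."""

    def __init__(self, rows):
        self._rows = list(rows)

    def scannerGetList(self, scanner_id, num_rows):
        # Hand out the next num_rows rows; an empty list means "exhausted".
        batch, self._rows = self._rows[:num_rows], self._rows[num_rows:]
        return batch


def iter_rows(client, scanner_id, batch_size):
    """Yield rows batch by batch until the scanner returns an empty list."""
    while True:
        batch = client.scannerGetList(scanner_id, batch_size)
        if not batch:
            return
        for row in batch:
            yield row


def column_value(row, column_name, default=None):
    """Return the cell value for column_name, or default if the row lacks it."""
    cell = row.columns.get(column_name)
    return cell.value if cell is not None else default


if __name__ == "__main__":
    rows = [Row("r%d" % i, {"cf:mycolumn": Cell("v%d" % i)}) for i in range(7)]
    rows.append(Row("r7", {}))  # a row without the column
    for row in iter_rows(FakeClient(rows), scanner_id=0, batch_size=5):
        print("rowKey = %s, columnValue = %s"
              % (row.row, column_value(row, "cf:mycolumn")))
```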

6. Execute the Thrift job in Python

python ./
