Goal:
How to install Thrift and run a sample HBase Thrift job against the HBase Thrift gateway on a MapR cluster. The example job is written in Python and simply scans a MapR-DB table.
Env:
HBase 1.1.1
MapR 5.2
CentOS 6.5
Solution:
Before following this article, please follow the MapR documentation to install and start the HBase Thrift service.
1. Install Thrift
The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages. Please follow this link to download and install Thrift on CentOS.
After installation, identify the location of the Thrift Python library source code, for example:
/home/mapr/hao/thrift/thrift/lib/py/src
2. Download the HBase source code from GitHub for your HBase version
Here my HBase version is 1.1.1:
git clone git@github.com:apache/hbase.git
cd hbase
git checkout remotes/origin/branch-1.1
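If you script this checkout for other HBase versions, the branch name can be derived from the major and minor version numbers. The helper below is a hypothetical sketch that assumes the `branch-<major>.<minor>` naming used by `branch-1.1` here; confirm the branch exists (for example with `git branch -r`) before relying on it.

```python
import subprocess


def hbase_branch(version):
    """Map an HBase version such as "1.1.1" to a release branch name.

    Hypothetical helper: assumes the branch-<major>.<minor> naming
    used by branch-1.1 in this article.
    """
    parts = version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError("unexpected HBase version: %r" % version)
    return "branch-%s.%s" % (parts[0], parts[1])


def checkout_hbase_source(version, repo="git@github.com:apache/hbase.git"):
    # Same steps as the commands above: clone, then check out the branch
    # inside the new "hbase" directory.
    subprocess.check_call(["git", "clone", repo])
    subprocess.check_call(
        ["git", "checkout", "remotes/origin/" + hbase_branch(version)],
        cwd="hbase",
    )


if __name__ == "__main__":
    print(hbase_branch("1.1.1"))  # branch-1.1
```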
After downloading the HBase source code, locate the Hbase.thrift file:
./hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
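Because the location of Hbase.thrift can move between HBase versions, you can also search the checkout for it. The sketch below walks a source tree and returns every file whose name is exactly `Hbase.thrift`; the function name is hypothetical.

```python
import os


def find_hbase_thrift(source_root, filename="Hbase.thrift"):
    """Return the sorted paths of all files named exactly `filename` under source_root."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        # Exact, case-sensitive comparison against the file name.
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    # Run from the directory that contains the "hbase" checkout.
    for path in find_hbase_thrift("./hbase"):
        print(path)
```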
3. Generate the Python bindings
Copy the Hbase.thrift file identified in step #2 to the current working directory and run the commands below:
thrift -gen py ./Hbase.thrift
mv gen-py/* .
rm -rf gen-py/
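The generation commands can also be wrapped in a small Python script, which helps when you regenerate the bindings after switching HBase branches. This is a sketch that assumes the `thrift` compiler installed in step 1 is on your PATH; it runs `thrift -gen py`, moves the contents of `gen-py` into the working directory and removes `gen-py`. Unlike plain `mv`, it replaces an older copy of a generated file or package instead of failing.

```python
import os
import shutil
import subprocess


def flatten_gen_py(workdir="."):
    """Move everything from <workdir>/gen-py into <workdir>, then delete gen-py.

    Roughly: mv gen-py/* . && rm -rf gen-py/
    """
    gen_dir = os.path.join(workdir, "gen-py")
    for name in os.listdir(gen_dir):
        target = os.path.join(workdir, name)
        if os.path.exists(target):
            # Replace an older copy, e.g. a previously generated hbase/ package.
            if os.path.isdir(target) and not os.path.islink(target):
                shutil.rmtree(target)
            else:
                os.remove(target)
        shutil.move(os.path.join(gen_dir, name), target)
    shutil.rmtree(gen_dir)


def generate_python_bindings(thrift_file="./Hbase.thrift", workdir="."):
    # Assumes the thrift compiler is on PATH; thrift_file is relative to workdir.
    subprocess.check_call(["thrift", "-gen", "py", thrift_file], cwd=workdir)
    flatten_gen_py(workdir)
```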
Copy the Thrift Python library source code identified in step #1 into the same directory:
mkdir thrift
cp -rp /home/mapr/hao/thrift/thrift/lib/py/src/* ./thrift/
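The same copy can be done from Python, with a sanity check that the source directory looks like the Thrift Python library (it should contain an `__init__.py`). A minimal sketch; the default source path is the example location from step 1 and will differ on your machine, and the function name is hypothetical. Like `cp -p`, it keeps permission bits and timestamps, but not ownership.

```python
import os
import shutil


def copy_thrift_lib(src="/home/mapr/hao/thrift/thrift/lib/py/src", dest="./thrift"):
    """Copy the Thrift Python library sources into a local 'thrift' package.

    Mirrors: mkdir thrift && cp -rp <src>/* ./thrift/
    """
    if not os.path.isfile(os.path.join(src, "__init__.py")):
        raise IOError("%s does not look like thrift/lib/py/src" % src)
    if not os.path.isdir(dest):
        os.makedirs(dest)
    for name in os.listdir(src):
        s = os.path.join(src, name)
        d = os.path.join(dest, name)
        if os.path.isdir(s):
            if os.path.exists(d):
                shutil.rmtree(d)  # refresh an earlier copy of this subpackage
            shutil.copytree(s, d)  # uses copy2: keeps mode bits and timestamps
        else:
            shutil.copy2(s, d)
```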
4. Install the required Python library
You may skip this step if the required Python library is already installed.
yum install python-pip
pip install six
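Before running the job you can confirm that every module it needs is importable from the working directory. In this sketch the module list is an assumption drawn from the imports in the sample job plus `six`, which is installed above; the function name is hypothetical. It uses `__import__` rather than `importlib` so it also runs on the Python 2.6 that ships with CentOS 6.

```python
import sys

# Assumed list: imports used by the sample job, plus six.
REQUIRED = [
    "six",
    "thrift.transport.TSocket",
    "thrift.transport.TTransport",
    "thrift.protocol.TBinaryProtocol",
    "hbase.Hbase",
]


def missing_modules(names, path_dir="."):
    """Return the names from `names` that cannot be imported."""
    if path_dir not in sys.path:
        # The copied thrift/ package and the generated hbase/ package live here.
        sys.path.insert(0, path_dir)
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED)` from the working directory should return an empty list once steps 3 and 4 are complete.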
5. Create a sample HBase Thrift job named "test.py" in Python
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
# host is where the HBase Thrift service is running.
host = "localhost"
# port is the HBase Thrift service's default port, 9090.
port = 9090
# tablename is the MapR-DB table that this sample job scans.
tablename = "/user/mapr/maprdb_sample_table"
# numRows is the number of rows that scannerGetList retrieves from the scanner at once.
numRows = 5
# columnName is the column whose value is printed for each row.
columnName = "cf:mycolumn"
# Connect to the HBase Thrift server over a buffered socket transport.
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
# Create the client and open the connection.
client = Hbase.Client(protocol)
transport.open()
try:
    # Scan the MapR-DB table from startRow (inclusive) up to stopRow (exclusive).
    scan = Hbase.TScan(startRow="111111", stopRow="22222")
    scannerId = client.scannerOpenWithScan(tablename, scan, None)
    try:
        # Fetch numRows rows per call; an empty list means the scan is finished.
        rowList = client.scannerGetList(scannerId, numRows)
        while rowList:
            for row in rowList:
                cell = row.columns.get(columnName)
                # Skip rows that have no value in this column.
                if cell is None:
                    continue
                print("rowKey = " + row.row + ", columnValue = " + cell.value)
            rowList = client.scannerGetList(scannerId, numRows)
    finally:
        client.scannerClose(scannerId)
finally:
    # Close the client connection.
    transport.close()
The sample job above scans the MapR-DB table "/user/mapr/maprdb_sample_table" and prints each row key together with the value of column "cf:mycolumn".
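The sample job only returns rows whose keys fall between startRow and stopRow. HBase treats startRow as inclusive and stopRow as exclusive, and compares row keys as raw bytes, so a shorter key such as "2" still falls inside the range "111111" to "22222". The hypothetical helper below illustrates that ordering locally; it does not contact HBase.

```python
def _to_bytes(s):
    # Row keys are compared as bytes; encode text keys as UTF-8.
    return s if isinstance(s, bytes) else s.encode("utf-8")


def in_scan_range(row_key, start_row=b"111111", stop_row=b"22222"):
    """Return True if row_key lies in [start_row, stop_row) in byte order."""
    key = _to_bytes(row_key)
    return _to_bytes(start_row) <= key < _to_bytes(stop_row)


if __name__ == "__main__":
    for key in ["1", "111111", "12", "2", "22222", "3"]:
        print("%-7s %s" % (key, in_scan_range(key)))
```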
6. Run the Thrift job with Python
python ./test.py
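If `python ./test.py` fails with a connection error, first confirm that the HBase Thrift gateway is listening. A minimal sketch using only the standard library `socket` module; the function name is hypothetical, and host and port default to the values used in test.py.

```python
import socket


def thrift_gateway_reachable(host="localhost", port=9090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print(thrift_gateway_reachable())
```

A True result only shows that something accepts TCP connections on that port; it does not prove the service is the HBase Thrift gateway.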