Friday, October 27, 2017

How to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster

Goal:

This article shows how to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster.
The example job is written in Python and simply scans a MapR-DB table.

Env:

HBase 1.1.1
MapR 5.2
CentOS 6.5

Solution:

Before following this article, install and start the HBase Thrift service as described in the MapR documentation.

1. Install Thrift

The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.

Please follow the Apache Thrift installation instructions to download and install Thrift on CentOS.
After installation, note the location of the Thrift Python library source code. For example:
/home/mapr/hao/thrift/thrift/lib/py/src

2. Download the HBase source code from GitHub based on your HBase version

Here my HBase version is 1.1.1, so branch-1.1 is checked out.
git clone git@github.com:apache/hbase.git
cd hbase
git checkout remotes/origin/branch-1.1

After downloading the HBase source code, locate the Hbase.thrift file:
./hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

3. Generate the Python bindings

Copy the Hbase.thrift file identified in step #2 to the current working directory and run the commands below:
thrift -gen py ./Hbase.thrift
mv gen-py/* .
rm -rf gen-py/

Copy the Thrift Python source code identified in step #1 into a "thrift" directory here:
mkdir thrift
cp -rp /home/mapr/hao/thrift/thrift/lib/py/src/* ./thrift/

4. Install the needed Python library

You may skip this step if the needed Python library is already installed.
yum install python-pip
pip install six
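
With the bindings generated and six installed, it can help to confirm that everything the job imports is actually importable from the current working directory before writing the job itself. The snippet below is a minimal sketch and not part of the original procedure: the helper name missing_modules is hypothetical, and the module names in the list are assumptions based on the directory layout produced in step #3 and the imports used by the job in step #5.

```python
def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            # __import__ is used instead of importlib so this also runs on Python 2.6.
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module names, taken from the imports of the sample job.
    required = ["six", "thrift.transport.TSocket", "hbase.Hbase"]
    print("Missing modules: %s" % missing_modules(required))
```

An empty list means the sample job's imports should resolve; any name that is printed points back to the step that provides it.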

5. Create a sample HBase Thrift job named "test.py" in Python

from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase

# host is where Hbase thrift service is running.
host = "localhost"
# port is the HBase Thrift service port (default 9090), given as an integer.
port = 9090
# tablename is the MapR-DB table which this sample job scans.
tablename = "/user/mapr/maprdb_sample_table"
# numRows is the number of rows that "scannerGetList" retrieves from the scanner at once.
numRows = 5
# columnName is the column which will be printed out later.
columnName = "cf:mycolumn"

# Connect to HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)

# Create and open the client connection
client = Hbase.Client(protocol)
transport.open()

# Scan the MapR-DB table
scan = Hbase.TScan(startRow="111111", stopRow="22222")
scannerId = client.scannerOpenWithScan(tablename, scan, None)

# Fetch rows in batches of numRows until the scanner returns an empty list.
# (Calling scannerGet first would consume the first row without printing it.)
rowList = client.scannerGetList(scannerId, numRows)
while rowList:
    for row in rowList:
        rowKey = row.row
        # A row that lacks columnName has no entry in row.columns.
        cell = row.columns.get(columnName)
        message = cell.value if cell else "<missing>"
        print "rowKey = " + rowKey + ", columnValue = " + message
    rowList = client.scannerGetList(scannerId, numRows)

client.scannerClose(scannerId)

# Close the client connection
transport.close()

The sample job above scans the MapR-DB table "/user/mapr/maprdb_sample_table" and prints each row key together with the value of column "cf:mycolumn".

6. Run the Thrift job with Python

python ./test.py
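
If the job fails to connect, a quick first check is whether anything is listening on the Thrift port at all. The following is a minimal sketch using only the Python standard library; port_is_open is a hypothetical helper, and the host and port simply mirror the values used in test.py.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Values assumed to match host and port in test.py.
    print(port_is_open("localhost", 9090))
```

A False result suggests the HBase Thrift service is not running or is listening on a different host or port; a True result only shows that the port accepts connections, not that the service behind it is healthy.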
