Friday, October 27, 2017

How to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster

Goal:

This article shows how to install Thrift and run a sample HBase Thrift job against the HBase Thrift Gateway on a MapR cluster.
The example job is written in Python and simply scans a MapR-DB table.

Env:

HBase 1.1.1
MapR 5.2
CentOS 6.5

Solution:

Before following this article, install and start the HBase Thrift service as described in the MapR documentation.
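
As a quick sanity check that the Thrift service is actually listening, a small TCP probe can help. This is only a sketch: the function name thrift_port_is_open is made up for illustration, and it assumes the gateway runs on localhost with the default port 9090 used by the sample job in this article. A successful connection only shows that something is listening on the port, not that it is the HBase Thrift service.

```python
import socket


def thrift_port_is_open(host="localhost", port=9090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    host and port are assumptions; adjust them to where your
    HBase Thrift service is running.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        # Connection refused, unreachable host, or timeout.
        return False
    conn.close()
    return True
```

Calling thrift_port_is_open() from a Python shell on the gateway node should return True once the service has started.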

1. Install Thrift

The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.

Please follow this link to download and install Thrift on CentOS.
After installation, note the location of the Thrift Python library source code. For example:
/home/mapr/hao/thrift/thrift/lib/py/src

2. Download HBase source code from GitHub based on your HBase version

Here my HBase version is 1.1.1, so I check out branch-1.1.
git clone git@github.com:apache/hbase.git
cd hbase
git checkout remotes/origin/branch-1.1

After downloading the HBase source code, locate the Hbase.thrift file:
./hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

3. Generate the Python bindings

Copy the Hbase.thrift file identified in step #2 to the current working directory and run the commands below:
thrift -gen py ./Hbase.thrift
mv gen-py/* .
rm -rf gen-py/

Copy the Thrift Python library source code identified in step #1 into a "thrift" directory here:
mkdir thrift
cp -rp /home/mapr/hao/thrift/thrift/lib/py/src/* ./thrift/
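
Before moving on, it may be worth confirming that the working directory now contains both the generated bindings and the copied Thrift library. The sketch below checks for a handful of files. The exact file list is an assumption based on the usual output of "thrift -gen py" and the usual layout of the Thrift Python library, and the names EXPECTED_FILES and missing_files are made up for illustration.

```python
import os

# Files expected relative to the working directory after step 3.
# This list is an assumption, not taken from the article; adjust it
# if your Thrift version lays things out differently.
EXPECTED_FILES = [
    os.path.join("hbase", "Hbase.py"),
    os.path.join("hbase", "ttypes.py"),
    os.path.join("thrift", "Thrift.py"),
    os.path.join("thrift", "transport", "TSocket.py"),
    os.path.join("thrift", "protocol", "TBinaryProtocol.py"),
]


def missing_files(base_dir="."):
    """Return the expected files that do not exist under base_dir."""
    return [rel for rel in EXPECTED_FILES
            if not os.path.isfile(os.path.join(base_dir, rel))]
```

An empty list from missing_files() suggests that the imports at the top of the sample job in step 5 should resolve from this directory.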

4. Install the required Python library

You may skip this step if the required Python library (six) is already installed.
yum install python-pip
pip install six

5. Create a sample HBase Thrift job named "test.py" in Python

from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase

# host is where the HBase Thrift service is running.
host = "localhost"
# port is the HBase Thrift service default port, 9090.
port = 9090
# tablename is the MapR-DB table which this sample job scans.
tablename = "/user/mapr/maprdb_sample_table"
# numRows is the number of rows that "scannerGetList" retrieves from the scanner at once.
numRows = 5
# columnName is the column which will be printed out later.
columnName = "cf:mycolumn"

# Connect to HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)

# Create and open the client connection
client = Hbase.Client(protocol)
transport.open()

# Scan the MapR-DB table
scan = Hbase.TScan(startRow="111111", stopRow="22222")
scannerId = client.scannerOpenWithScan(tablename, scan, None)
# Fetch rows in batches of numRows until the scanner returns an empty list.
rowList = client.scannerGetList(scannerId, numRows)

while rowList:
    for row in rowList:
        rowKey = row.row
        cell = row.columns.get(columnName)
        # Rows that lack the requested column are printed with an empty value.
        message = cell.value if cell is not None else ""
        print("rowKey = " + rowKey + ", columnValue = " + message)
    rowList = client.scannerGetList(scannerId, numRows)

client.scannerClose(scannerId)

# Close the client connection
transport.close()

The sample job above scans the MapR-DB table "/user/mapr/maprdb_sample_table" and prints each row key together with the value of column "cf:mycolumn".
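
The batched scan loop can also be wrapped in a small generator so that the scanner is closed even if processing fails midway. This is a sketch rather than part of the original job: the helper names iter_scanner_rows and column_value are made up for illustration. It relies only on the scannerGetList and scannerClose calls already used above, and it assumes, as the sample job does, that scannerGetList returns an empty list once the scanner is exhausted.

```python
def iter_scanner_rows(client, scanner_id, batch_size=5):
    """Yield rows from an open scanner, fetching batch_size rows per call.

    The scanner is closed when iteration finishes or is abandoned.
    """
    try:
        while True:
            batch = client.scannerGetList(scanner_id, batch_size)
            if not batch:
                # An empty batch means the scanner is exhausted.
                break
            for row in batch:
                yield row
    finally:
        client.scannerClose(scanner_id)


def column_value(row, column_name, default=None):
    """Return the value of column_name in row, or default if it is absent."""
    cell = row.columns.get(column_name)
    return cell.value if cell is not None else default
```

With these helpers, the loop in test.py could be written as "for row in iter_scanner_rows(client, scannerId, numRows):" followed by a print of row.row and column_value(row, columnName, "").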

6. Run the Thrift job with Python

python ./test.py
