Friday, May 16, 2014

HBase Raw Scan

An HBase raw scan can see data that a normal scan hides, such as deleted cells and the delete markers themselves.
The "VERSIONS" setting decides how many versions of data can be shown for each *column* of each *column family* of each *row*.

For example (HBase 0.94.8):
Create a table with VERSIONS => 5, then update/insert the same column of the same row 6 times.
create 't1', {NAME => 'f1', VERSIONS => 5}
put 't1','row1','f1:col1','1'
put 't1','row1','f1:col1','2'
put 't1','row1','f1:col1','3'
put 't1','row1','f1:col1','4'
put 't1','row1','f1:col1','5'
put 't1','row1','f1:col1','6'

1. A raw scan shows only the latest 5 versions of the data, newest first, even though the scan asks for 6.

hbase(main):025:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW                                        COLUMN+CELL
 row1                                      column=f1:col1, timestamp=1400264962097, value=6
 row1                                      column=f1:col1, timestamp=1400264933707, value=5
 row1                                      column=f1:col1, timestamp=1400264928122, value=4
 row1                                      column=f1:col1, timestamp=1400264924764, value=3
 row1                                      column=f1:col1, timestamp=1400264596173, value=2
1 row(s) in 0.0280 seconds
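
The retention rule seen above can be sketched as a toy model in plain Ruby. This is not HBase code and does not reflect how HBase implements retention internally; it only mirrors the observed outcome. The `Cell` struct, the `ToyStore` class, and the counter used as a timestamp are all hypothetical names introduced for illustration.

```ruby
# Toy model (NOT HBase code): keep at most `max_versions` cells for each
# (row, family:qualifier) pair, newest first.
Cell = Struct.new(:row, :column, :timestamp, :value)

class ToyStore
  def initialize(max_versions)
    @max_versions = max_versions
    @cells = Hash.new { |hash, key| hash[key] = [] }
    @clock = 0 # assumption: a simple counter stands in for real timestamps
  end

  # Hypothetical stand-in for the shell's `put`.
  def put(row, column, value)
    @clock += 1
    versions = @cells[[row, column]]
    versions.unshift(Cell.new(row, column, @clock, value))
    # Drop the oldest cells once the per-column limit is exceeded.
    versions.pop while versions.size > @max_versions
  end

  # Values of one column of one row, newest first.
  def versions(row, column)
    @cells[[row, column]].map(&:value)
  end
end

store = ToyStore.new(5)
(1..6).each { |i| store.put('row1', 'f1:col1', i.to_s) }
(1..7).each { |i| store.put('row1', 'f1:col2', "col2_#{i}") }

p store.versions('row1', 'f1:col1') # ["6", "5", "4", "3", "2"]
p store.versions('row1', 'f1:col2') # ["col2_7", "col2_6", "col2_5", "col2_4", "col2_3"]
```

Each column keeps its own 5 versions, so a row with two columns can hold 10 cells in total.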

2. The 5 versions are per column, not per row.

Then update/insert column f1:col1 of another row (row2) 7 times.
put 't1','row2','f1:col1','2_1'
put 't1','row2','f1:col1','2_2'
put 't1','row2','f1:col1','2_3'
put 't1','row2','f1:col1','2_4'
put 't1','row2','f1:col1','2_5'
put 't1','row2','f1:col1','2_6'
put 't1','row2','f1:col1','2_7'
hbase(main):034:0> scan 't1', {RAW => true, VERSIONS => 10}
ROW                                        COLUMN+CELL
 row1                                      column=f1:col1, timestamp=1400264962097, value=6
 row1                                      column=f1:col1, timestamp=1400264933707, value=5
 row1                                      column=f1:col1, timestamp=1400264928122, value=4
 row1                                      column=f1:col1, timestamp=1400264924764, value=3
 row1                                      column=f1:col1, timestamp=1400264596173, value=2
 row2                                      column=f1:col1, timestamp=1400265195640, value=2_7
 row2                                      column=f1:col1, timestamp=1400265194944, value=2_6
 row2                                      column=f1:col1, timestamp=1400265194927, value=2_5
 row2                                      column=f1:col1, timestamp=1400265194908, value=2_4
 row2                                      column=f1:col1, timestamp=1400265194883, value=2_3
2 row(s) in 0.0360 seconds
Delete column f1:col1 of row2.
hbase(main):036:0> delete 't1','row2','f1:col1'
0 row(s) in 0.0120 seconds

3. A deleted column is shown as a marker with "type=DeleteColumn".

hbase(main):037:0> scan 't1', {RAW => true, VERSIONS => 10}
ROW                                        COLUMN+CELL
 row1                                      column=f1:col1, timestamp=1400264962097, value=6
 row1                                      column=f1:col1, timestamp=1400264933707, value=5
 row1                                      column=f1:col1, timestamp=1400264928122, value=4
 row1                                      column=f1:col1, timestamp=1400264924764, value=3
 row1                                      column=f1:col1, timestamp=1400264596173, value=2
 row2                                      column=f1:col1, timestamp=1400265585864, type=DeleteColumn
 row2                                      column=f1:col1, timestamp=1400265195640, value=2_7
 row2                                      column=f1:col1, timestamp=1400265194944, value=2_6
 row2                                      column=f1:col1, timestamp=1400265194927, value=2_5
 row2                                      column=f1:col1, timestamp=1400265194908, value=2_4
 row2                                      column=f1:col1, timestamp=1400265194883, value=2_3
2 row(s) in 0.0210 seconds
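
The behaviour above, where a delete adds a marker instead of erasing data, can be sketched as a toy model in plain Ruby. This is not HBase code: `MarkedCell`, `raw_scan`, `normal_scan`, and the small integer timestamps are hypothetical, and version limiting is left out to keep the sketch short.

```ruby
# Toy model (NOT HBase code): a delete appends a marker cell; a raw scan
# returns everything, a normal scan filters out markers and masked puts.
MarkedCell = Struct.new(:row, :column, :timestamp, :type, :value)

# Everything, sorted by row and column, newest first. At equal timestamps
# a marker is listed before a put.
def raw_scan(cells)
  cells.sort_by { |c| [c.row, c.column, -c.timestamp, c.type == :Put ? 1 : 0] }
end

# Only puts that are newer than the latest DeleteColumn marker of their column.
def normal_scan(cells)
  deleted_up_to = Hash.new(0) # [row, column] => newest marker timestamp
  cells.each do |c|
    next unless c.type == :DeleteColumn
    key = [c.row, c.column]
    deleted_up_to[key] = [deleted_up_to[key], c.timestamp].max
  end
  raw_scan(cells).select do |c|
    c.type == :Put && c.timestamp > deleted_up_to[[c.row, c.column]]
  end
end

# Hypothetical data loosely following the session above.
cells = [
  MarkedCell.new('row2', 'f1:col1', 100, :Put, '2_6'),
  MarkedCell.new('row2', 'f1:col1', 101, :Put, '2_7'),
  MarkedCell.new('row2', 'f1:col1', 200, :DeleteColumn, nil),
  MarkedCell.new('row1', 'f1:col1', 50, :Put, '6')
]

raw_types = raw_scan(cells).map { |c| [c.row, c.type, c.value] }
visible = normal_scan(cells).map { |c| [c.row, c.value] }
p raw_types
p visible # [["row1", "6"]]
```

The raw view still lists both row2 puts after the marker, while the normal view returns only row1.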

4. The delete marker for a whole column family always sorts first within its row.

Per "Scanning in HBase": "because family delete marker affects potentially many columns in this row, so in order to allow scanners to scan forward-only, the family delete markers need to be seen by a scanner first." The shell session below illustrates this. Note that f1:col2 of row1 was also written several times beforehand; those puts are not shown.


hbase(main):009:0> scan 't1'
ROW                                        COLUMN+CELL
 row1                                      column=f1:col1, timestamp=1400264962097, value=6
 row1                                      column=f1:col2, timestamp=1400267214363, value=col2_7
1 row(s) in 0.0150 seconds

hbase(main):010:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW                                        COLUMN+CELL
 row1                                      column=f1:col1, timestamp=1400264962097, value=6
 row1                                      column=f1:col1, timestamp=1400264933707, value=5
 row1                                      column=f1:col1, timestamp=1400264928122, value=4
 row1                                      column=f1:col1, timestamp=1400264924764, value=3
 row1                                      column=f1:col1, timestamp=1400264596173, value=2
 row1                                      column=f1:col2, timestamp=1400267214363, value=col2_7
 row1                                      column=f1:col2, timestamp=1400267213932, value=col2_6
 row1                                      column=f1:col2, timestamp=1400267213914, value=col2_5
 row1                                      column=f1:col2, timestamp=1400267213889, value=col2_4
 row1                                      column=f1:col2, timestamp=1400267213862, value=col2_3
 row2                                      column=f1:col1, timestamp=1400265585864, type=DeleteColumn
2 row(s) in 0.0490 seconds

hbase(main):011:0> deleteall 't1','row1'
0 row(s) in 0.0400 seconds

hbase(main):015:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW                                        COLUMN+CELL
 row1                                      column=f1:, timestamp=1400274062009, type=DeleteFamily
 row1                                      column=f1:col1, timestamp=1400264962097, value=6
 row1                                      column=f1:col1, timestamp=1400264933707, value=5
 row1                                      column=f1:col1, timestamp=1400264928122, value=4
 row1                                      column=f1:col1, timestamp=1400264924764, value=3
 row1                                      column=f1:col1, timestamp=1400264596173, value=2
 row1                                      column=f1:col2, timestamp=1400267214363, value=col2_7
 row1                                      column=f1:col2, timestamp=1400267213932, value=col2_6
 row1                                      column=f1:col2, timestamp=1400267213914, value=col2_5
 row1                                      column=f1:col2, timestamp=1400267213889, value=col2_4
 row1                                      column=f1:col2, timestamp=1400267213862, value=col2_3
 row2                                      column=f1:col1, timestamp=1400265585864, type=DeleteColumn
2 row(s) in 0.0390 seconds

hbase(main):016:0> scan 't1'
ROW                                        COLUMN+CELL
0 row(s) in 0.0130 seconds

hbase(main):017:0> put 't1','row1','f1:col1','supernewrow'
0 row(s) in 0.0220 seconds

hbase(main):018:0> scan 't1'
ROW                                        COLUMN+CELL
 row1                                      column=f1:col1, timestamp=1400274112052, value=supernewrow
1 row(s) in 0.0140 seconds

hbase(main):019:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW                                        COLUMN+CELL
 row1                                      column=f1:, timestamp=1400274062009, type=DeleteFamily
 row1                                      column=f1:col1, timestamp=1400274112052, value=supernewrow
 row1                                      column=f1:col1, timestamp=1400264962097, value=6
 row1                                      column=f1:col1, timestamp=1400264933707, value=5
 row1                                      column=f1:col1, timestamp=1400264928122, value=4
 row1                                      column=f1:col1, timestamp=1400264924764, value=3
 row1                                      column=f1:col2, timestamp=1400267214363, value=col2_7
 row1                                      column=f1:col2, timestamp=1400267213932, value=col2_6
 row1                                      column=f1:col2, timestamp=1400267213914, value=col2_5
 row1                                      column=f1:col2, timestamp=1400267213889, value=col2_4
 row1                                      column=f1:col2, timestamp=1400267213862, value=col2_3
 row2                                      column=f1:col1, timestamp=1400265585864, type=DeleteColumn
2 row(s) in 0.0260 seconds
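
The forward-only idea from the quote in point 4 can be sketched as a toy model in plain Ruby. This is a simplified illustration under stated assumptions, not HBase's comparator or scanner: `KV`, `sort_raw`, `forward_scan`, and the small integer timestamps are hypothetical, and only family delete markers are modelled.

```ruby
# Toy model (NOT HBase code): the family delete marker sorts first in its
# row, so a single forward pass can mask older cells of every column.
KV = Struct.new(:row, :qualifier, :timestamp, :type, :value)

# Simplified sort key: row, then qualifier (the empty qualifier of a
# DeleteFamily marker sorts before any named column), then newest first.
def sort_raw(kvs)
  kvs.sort_by { |kv| [kv.row, kv.qualifier, -kv.timestamp] }
end

# One forward pass: remember the marker, then skip every put at or before it.
def forward_scan(sorted)
  family_deleted_at = Hash.new(0) # row => newest DeleteFamily timestamp
  sorted.each_with_object([]) do |kv, visible|
    if kv.type == :DeleteFamily
      family_deleted_at[kv.row] = [family_deleted_at[kv.row], kv.timestamp].max
    elsif kv.timestamp > family_deleted_at[kv.row]
      visible << kv
    end
  end
end

# Hypothetical data loosely following the session above.
kvs = [
  KV.new('row1', 'col1', 10, :Put, '6'),
  KV.new('row1', 'col2', 20, :Put, 'col2_7'),
  KV.new('row1', '', 30, :DeleteFamily, nil),
  KV.new('row1', 'col1', 40, :Put, 'supernewrow')
]

sorted = sort_raw(kvs)
p sorted.map { |kv| [kv.qualifier, kv.timestamp, kv.type] }
p forward_scan(sorted).map(&:value) # ["supernewrow"]
```

Because the marker is read before any column of the row, the scan never has to go back: the put written after the delete stays visible and everything older is skipped.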
