"Version" (the VERSIONS attribute of a column family) decides how many versions of data are kept and can be shown for each *column* of each *column family* in each *row*.
For example (HBase 0.94.8):
Create a table with VERSIONS => 5 and update/insert the same column of the same row 6 times.
create 't1', {NAME => 'f1', VERSIONS => 5}
put 't1','row1','f1:col1','1'
put 't1','row1','f1:col1','2'
put 't1','row1','f1:col1','3'
put 't1','row1','f1:col1','4'
put 't1','row1','f1:col1','5'
put 't1','row1','f1:col1','6'
1. A raw scan shows only the latest 5 versions of the data, newest first, even though 6 versions were requested.
hbase(main):025:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW COLUMN+CELL
row1 column=f1:col1, timestamp=1400264962097, value=6
row1 column=f1:col1, timestamp=1400264933707, value=5
row1 column=f1:col1, timestamp=1400264928122, value=4
row1 column=f1:col1, timestamp=1400264924764, value=3
row1 column=f1:col1, timestamp=1400264596173, value=2
1 row(s) in 0.0280 seconds
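The retention behaviour seen above can be sketched with a small in-memory model. This is only a toy illustration in Python, not how HBase stores data: the class name VersionedStore, its methods, and the timestamps 1 to 6 are all hypothetical and chosen for the example.

```python
from collections import defaultdict


class VersionedStore:
    """Toy model (hypothetical): keeps at most max_versions cells per (row, column)."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)
        # Drop everything beyond the newest max_versions entries.
        del versions[self.max_versions:]

    def scan(self, versions=1):
        """Return up to `versions` cells per (row, column), newest first."""
        out = []
        for row, column in sorted(self.cells):
            for ts, value in self.cells[(row, column)][:versions]:
                out.append((row, column, ts, value))
        return out


store = VersionedStore(max_versions=5)
# Six writes to the same cell, with made-up timestamps 1..6.
for ts, value in enumerate(["1", "2", "3", "4", "5", "6"], start=1):
    store.put("row1", "f1:col1", ts, value)

values = [value for _, _, _, value in store.scan(versions=6)]
print(values)  # ['6', '5', '4', '3', '2']
```

Asking for 6 versions still yields 5, because the oldest value was dropped when the sixth put arrived.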
2. The 5 versions are per column, not per row.
Then update/insert the same column in another row (row2) 7 times.
put 't1','row2','f1:col1','2_1'
put 't1','row2','f1:col1','2_2'
put 't1','row2','f1:col1','2_3'
put 't1','row2','f1:col1','2_4'
put 't1','row2','f1:col1','2_5'
put 't1','row2','f1:col1','2_6'
put 't1','row2','f1:col1','2_7'
hbase(main):034:0> scan 't1', {RAW => true, VERSIONS => 10}
ROW COLUMN+CELL
row1 column=f1:col1, timestamp=1400264962097, value=6
row1 column=f1:col1, timestamp=1400264933707, value=5
row1 column=f1:col1, timestamp=1400264928122, value=4
row1 column=f1:col1, timestamp=1400264924764, value=3
row1 column=f1:col1, timestamp=1400264596173, value=2
row2 column=f1:col1, timestamp=1400265195640, value=2_7
row2 column=f1:col1, timestamp=1400265194944, value=2_6
row2 column=f1:col1, timestamp=1400265194927, value=2_5
row2 column=f1:col1, timestamp=1400265194908, value=2_4
row2 column=f1:col1, timestamp=1400265194883, value=2_3
2 row(s) in 0.0360 seconds
Delete column f1:col1 of row2.
hbase(main):036:0> delete 't1','row2','f1:col1'
0 row(s) in 0.0120 seconds
3. A deleted column is shown in a raw scan as "type=DeleteColumn".
hbase(main):037:0> scan 't1', {RAW => true, VERSIONS => 10}
ROW COLUMN+CELL
row1 column=f1:col1, timestamp=1400264962097, value=6
row1 column=f1:col1, timestamp=1400264933707, value=5
row1 column=f1:col1, timestamp=1400264928122, value=4
row1 column=f1:col1, timestamp=1400264924764, value=3
row1 column=f1:col1, timestamp=1400264596173, value=2
row2 column=f1:col1, timestamp=1400265585864, type=DeleteColumn
row2 column=f1:col1, timestamp=1400265195640, value=2_7
row2 column=f1:col1, timestamp=1400265194944, value=2_6
row2 column=f1:col1, timestamp=1400265194927, value=2_5
row2 column=f1:col1, timestamp=1400265194908, value=2_4
row2 column=f1:col1, timestamp=1400265194883, value=2_3
2 row(s) in 0.0210 seconds
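The raw scan above still lists the old values of row2 next to the delete marker. One way to picture why a normal scan would hide them is the sketch below. It is a simplified Python model under the assumption that a DeleteColumn marker masks every version of the same row and column with a timestamp less than or equal to its own; the function name visible_cells and the tuple layout are hypothetical, and the cells are a subset of the ones shown in the transcript.

```python
def visible_cells(raw_cells):
    """Return the Put cells that are not masked by a DeleteColumn marker.

    raw_cells: list of (row, column, timestamp, kind, value),
    where kind is "Put" or "DeleteColumn" (hypothetical layout).
    """
    # Newest DeleteColumn timestamp seen for each (row, column).
    delete_ts = {}
    for row, column, ts, kind, _ in raw_cells:
        if kind == "DeleteColumn":
            key = (row, column)
            delete_ts[key] = max(ts, delete_ts.get(key, ts))

    return [
        cell
        for cell in raw_cells
        if cell[3] == "Put" and cell[2] > delete_ts.get((cell[0], cell[1]), -1)
    ]


raw = [
    ("row1", "f1:col1", 1400264962097, "Put", "6"),
    ("row2", "f1:col1", 1400265585864, "DeleteColumn", None),
    ("row2", "f1:col1", 1400265195640, "Put", "2_7"),
    ("row2", "f1:col1", 1400265194944, "Put", "2_6"),
]

visible = visible_cells(raw)
print(visible)  # only the row1 cell remains
```

In this model the marker and the masked values all stay in the raw list, which matches what the raw scan shows; only the filtered view drops them.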
4. The delete marker for a whole column family always comes first in the row.
On scanning in HBase: "because family delete marker affects potentially many columns in this row, so in order to allow scanners to scan forward-only, the family delete markers need to be seen by a scanner first." The transcript below walks through an example; by this point row1 has also received several puts to f1:col2.
hbase(main):009:0> scan 't1'
ROW COLUMN+CELL
row1 column=f1:col1, timestamp=1400264962097, value=6
row1 column=f1:col2, timestamp=1400267214363, value=col2_7
1 row(s) in 0.0150 seconds
hbase(main):010:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW COLUMN+CELL
row1 column=f1:col1, timestamp=1400264962097, value=6
row1 column=f1:col1, timestamp=1400264933707, value=5
row1 column=f1:col1, timestamp=1400264928122, value=4
row1 column=f1:col1, timestamp=1400264924764, value=3
row1 column=f1:col1, timestamp=1400264596173, value=2
row1 column=f1:col2, timestamp=1400267214363, value=col2_7
row1 column=f1:col2, timestamp=1400267213932, value=col2_6
row1 column=f1:col2, timestamp=1400267213914, value=col2_5
row1 column=f1:col2, timestamp=1400267213889, value=col2_4
row1 column=f1:col2, timestamp=1400267213862, value=col2_3
row2 column=f1:col1, timestamp=1400265585864, type=DeleteColumn
2 row(s) in 0.0490 seconds
hbase(main):011:0> deleteall 't1','row1'
0 row(s) in 0.0400 seconds
hbase(main):015:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW COLUMN+CELL
row1 column=f1:, timestamp=1400274062009, type=DeleteFamily
row1 column=f1:col1, timestamp=1400264962097, value=6
row1 column=f1:col1, timestamp=1400264933707, value=5
row1 column=f1:col1, timestamp=1400264928122, value=4
row1 column=f1:col1, timestamp=1400264924764, value=3
row1 column=f1:col1, timestamp=1400264596173, value=2
row1 column=f1:col2, timestamp=1400267214363, value=col2_7
row1 column=f1:col2, timestamp=1400267213932, value=col2_6
row1 column=f1:col2, timestamp=1400267213914, value=col2_5
row1 column=f1:col2, timestamp=1400267213889, value=col2_4
row1 column=f1:col2, timestamp=1400267213862, value=col2_3
row2 column=f1:col1, timestamp=1400265585864, type=DeleteColumn
2 row(s) in 0.0390 seconds
hbase(main):016:0> scan 't1'
ROW COLUMN+CELL
0 row(s) in 0.0130 seconds
hbase(main):017:0> put 't1','row1','f1:col1','supernewrow'
0 row(s) in 0.0220 seconds
hbase(main):018:0> scan 't1'
ROW COLUMN+CELL
row1 column=f1:col1, timestamp=1400274112052, value=supernewrow
1 row(s) in 0.0140 seconds
hbase(main):019:0> scan 't1', {RAW => true, VERSIONS => 6}
ROW COLUMN+CELL
row1 column=f1:, timestamp=1400274062009, type=DeleteFamily
row1 column=f1:col1, timestamp=1400274112052, value=supernewrow
row1 column=f1:col1, timestamp=1400264962097, value=6
row1 column=f1:col1, timestamp=1400264933707, value=5
row1 column=f1:col1, timestamp=1400264928122, value=4
row1 column=f1:col1, timestamp=1400264924764, value=3
row1 column=f1:col2, timestamp=1400267214363, value=col2_7
row1 column=f1:col2, timestamp=1400267213932, value=col2_6
row1 column=f1:col2, timestamp=1400267213914, value=col2_5
row1 column=f1:col2, timestamp=1400267213889, value=col2_4
row1 column=f1:col2, timestamp=1400267213862, value=col2_3
row2 column=f1:col1, timestamp=1400265585864, type=DeleteColumn
2 row(s) in 0.0260 seconds
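The ordering in the raw scans above can be illustrated with a short sketch. It is a simplified Python model, not HBase's actual comparator: it assumes cells sort by row, family, qualifier, then timestamp descending, and that the DeleteFamily marker carries an empty qualifier (as the "column=f1:" output suggests). The function names sort_key and normal_scan are hypothetical; the cells are a subset of those in the transcript.

```python
def sort_key(cell):
    """Assumed order: row, family, qualifier, then newest timestamp first."""
    row, family, qualifier, ts, kind, value = cell
    return (row, family, qualifier, -ts)


def normal_scan(cells):
    """Single forward pass over the sorted cells, keeping the newest live value per column."""
    family_delete = {}  # (row, family) -> newest DeleteFamily timestamp
    latest = {}         # (row, family, qualifier) -> (timestamp, value)
    for row, family, qualifier, ts, kind, value in sorted(cells, key=sort_key):
        if kind == "DeleteFamily":
            key = (row, family)
            family_delete[key] = max(ts, family_delete.get(key, ts))
            continue
        # The marker sorted first, so it is already known when the puts arrive.
        if ts <= family_delete.get((row, family), -1):
            continue
        latest.setdefault((row, family, qualifier), (ts, value))
    return latest


cells = [
    ("row1", "f1", "col1", 1400264962097, "Put", "6"),
    ("row1", "f1", "col2", 1400267214363, "Put", "col2_7"),
    ("row1", "f1", "", 1400274062009, "DeleteFamily", None),
    ("row1", "f1", "col1", 1400274112052, "Put", "supernewrow"),
]

ordered = sorted(cells, key=sort_key)
result = normal_scan(cells)
print(ordered[0][4])  # DeleteFamily
print(result)         # {('row1', 'f1', 'col1'): (1400274112052, 'supernewrow')}
```

Because the empty qualifier sorts before col1 and col2, the marker is read before any value it might mask, so one forward pass is enough. The put made after the deleteall has a newer timestamp than the marker, which is why it is the only value a normal scan returns.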
