Posted to dev@hbase.apache.org by WangYQ <wa...@163.com> on 2015/11/13 09:51:33 UTC

improvement on compaction

In HBase 0.98.10, DefaultCompactPolicy sorts HFiles using seq_id as the main factor. The new file created by a compaction gets its seq_id from the HRegion. Suppose we have some HFiles with the following seq_ids:
f1        4
f2        6
f3        8
f4        9
f5        12

If we compact files f2, f3 and f4 into f6_new, the new file gets a seq_id larger than that of f5, say 14 for example:
f1        4
f5        12
f6_new    14


When we compact, we delete whole HFiles whose max timestamp has expired.
But in the example above, HFiles with small timestamps get compacted together with files with large timestamps, just because they have similar seq_ids.
This decreases the chance of deleting whole old HFiles.
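
To make the concern concrete, here is a minimal Java sketch of the whole-file expiry check. It is a simplified model, not HBase code: FileInfo, isWholeFileExpired and compact are hypothetical names, and the timestamps and TTL are made-up values. The only point is that a compacted file carries the largest timestamp of its inputs, so mixing one recent file into the compaction keeps the old data from being dropped as a whole file.

```java
import java.util.Arrays;
import java.util.List;

public class ExpirySketch {
    /** Hypothetical stand-in for an HFile's metadata; not an HBase class. */
    static final class FileInfo {
        final String name;
        final long maxTimestamp;

        FileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** A file can be dropped whole only if even its newest cell is past the TTL. */
    static boolean isWholeFileExpired(FileInfo f, long now, long ttl) {
        return f.maxTimestamp < now - ttl;
    }

    /** The compacted file's max timestamp is the max over all of its inputs. */
    static FileInfo compact(String name, List<FileInfo> inputs) {
        long maxTs = Long.MIN_VALUE;
        for (FileInfo f : inputs) {
            maxTs = Math.max(maxTs, f.maxTimestamp);
        }
        return new FileInfo(name, maxTs);
    }

    public static void main(String[] args) {
        long now = 1000;   // made-up clock value
        long ttl = 500;    // made-up TTL
        FileInfo old = new FileInfo("old", 100);
        FileInfo recent = new FileInfo("recent", 900);
        FileInfo mixed = compact("mixed", Arrays.asList(old, recent));

        System.out.println(isWholeFileExpired(old, now, ttl));   // true: 100 < 500
        System.out.println(isWholeFileExpired(mixed, now, ttl)); // false: 900 >= 500
    }
}
```

On its own, the old file could be deleted without rewriting anything. Once it is merged with the recent file, its expired cells can only be removed by another compaction.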


So I think we can change how the new HFile created by a compaction gets its seq_id: just take the max seq_id of the files that were compacted.
In the above example, the seq_id of file f6_new would be max(6, 8, 9) = 9.
This way, files with similar timestamps will also have similar seq_ids, which increases the chance of deleting whole HFiles and reduces the pressure on compaction.
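
The proposed rule can be sketched in Java as follows. This only illustrates the idea and is not a patch against HBase: SeqIdSketch, maxSeqId and orderBySeqId are hypothetical names, and the seq_ids are the ones from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeqIdSketch {
    /** Proposed rule: the compacted file inherits the largest seq_id among its inputs. */
    static long maxSeqId(long... inputSeqIds) {
        if (inputSeqIds.length == 0) {
            throw new IllegalArgumentException("no input files");
        }
        long max = inputSeqIds[0];
        for (long id : inputSeqIds) {
            max = Math.max(max, id);
        }
        return max;
    }

    /** Returns the file names ordered by seq_id, ascending. */
    static List<String> orderBySeqId(Map<String, Long> files) {
        List<String> names = new ArrayList<>(files.keySet());
        names.sort((a, b) -> Long.compare(files.get(a), files.get(b)));
        return names;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("f1", 4L);
        files.put("f5", 12L);
        files.put("f6_new", maxSeqId(6, 8, 9)); // 9 instead of a fresh id such as 14

        System.out.println(orderBySeqId(files)); // prints [f1, f6_new, f5]
    }
}
```

With seq_id 9, f6_new stays between f1 and f5 in the sort order, next to files of similar age, rather than moving to the end next to the newest files.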


So, do you think this will work?
And are there any problems if I set the seq_id like this?