Posted to dev@hbase.apache.org by WangYQ <wa...@163.com> on 2015/11/13 09:51:33 UTC
improvement on compaction
In HBase 0.98.10, DefaultCompactPolicy sorts HFiles using seq_id as the main factor. The new file created by a compaction gets its seq_id from the HRegion. Suppose we have some HFiles whose seq_ids are as follows:
f1 4
f2 6
f3 8
f4 9
f5 12
If we compact files f2, f3 and f4 into f6_new, the new file gets a seq_id larger than that of f5, say 14 for example:
f1 4
f5 12
f6_new 14
When we compact, we delete HFiles whose maxTimeStamp has expired.
But in the example above, HFiles with small timestamps end up being compacted together with files with large timestamps, just because they have similar seq_ids.
This decreases the chance of deleting whole old HFiles.
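To make the concern concrete, here is a minimal sketch in Python (not HBase code). The timestamps, the TTL, the file name f7 and the seq_id 15 are all made-up values for illustration, and the model assumes a merged file simply keeps the largest timestamp of its inputs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HFile:
    name: str
    seq_id: int
    max_ts: int  # hypothetical: newest cell timestamp stored in the file

def is_expired(f, now, ttl):
    # A whole file can be dropped only if even its newest cell is past the TTL.
    return f.max_ts < now - ttl

def merge(name, seq_id, inputs):
    # Assumption: the merged file's max timestamp is the max over its inputs.
    return HFile(name, seq_id, max(f.max_ts for f in inputs))

now, ttl = 200, 100                      # hypothetical clock and TTL, cutoff is 100
f5 = HFile("f5", 12, max_ts=120)         # newer data
f6_new = HFile("f6_new", 14, max_ts=90)  # older data, but a large seq_id

# Kept apart, the old file can be deleted as a whole.
old_alone_expired = is_expired(f6_new, now, ttl)

# Merged with a newer neighbour, the old cells are stuck in a live file.
f7 = merge("f7", 15, [f5, f6_new])
mixed_expired = is_expired(f7, now, ttl)

print(old_alone_expired, mixed_expired)
```

With these numbers the old file on its own is past the cutoff, while the merged file is not, so the old cells can only be removed by rewriting the file.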
So, I think we can change the way a new HFile created by a compaction gets its seq_id: just take the max seq_id of the files that were compacted.
In the above example, the seq_id of file f6_new would be max(6,8,9) = 9.
In this way, files with similar timestamps will also have similar seq_ids, which will increase the chance of deleting whole HFiles and reduce the pressure of compaction.
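The two ways of assigning the seq_id can be compared with a small Python sketch. This is only a model of the example in this mail, not HBase code: the HFile class, the compact function and its region_next_seq_id parameter are names invented for the sketch, and 14 is the example value used above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HFile:
    name: str
    seq_id: int

def compact(files, selected_names, new_name, region_next_seq_id=None):
    """Return the store's file list, sorted by seq_id, after one compaction.

    If region_next_seq_id is given, the new file takes that value (the
    behaviour described for 0.98.10). Otherwise it takes the max seq_id
    of the compacted files (the proposal in this mail).
    """
    selected = [f for f in files if f.name in selected_names]
    remaining = [f for f in files if f.name not in selected_names]
    if region_next_seq_id is not None:
        new_seq = region_next_seq_id
    else:
        new_seq = max(f.seq_id for f in selected)
    return sorted(remaining + [HFile(new_name, new_seq)], key=lambda f: f.seq_id)

files = [HFile("f1", 4), HFile("f2", 6), HFile("f3", 8),
         HFile("f4", 9), HFile("f5", 12)]

current = compact(files, {"f2", "f3", "f4"}, "f6_new", region_next_seq_id=14)
proposed = compact(files, {"f2", "f3", "f4"}, "f6_new")

print([(f.name, f.seq_id) for f in current])   # f1, f5, f6_new
print([(f.name, f.seq_id) for f in proposed])  # f1, f6_new, f5
```

Under the proposal f6_new stays between f1 and f5 in the sort order, which is where its data sits in time.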
So, do you think this will work?
And are there any problems if I set the seq_id like this?