Posted to user@hbase.apache.org by Adrien Mogenet <ad...@gmail.com> on 2012/12/08 20:19:58 UTC

Minor compactions and impact of number of HFiles within a Store

Hi there,

I am about to tune major/minor compaction behavior, and I'm wondering what
exactly the negative aspects are of having many HFiles (let's say between 3
and 20) within a single region, considering there are only a few regions
(~10) per RS.

My 2 cents:
- OS/HBase have to handle more file descriptors
- A random GET may have to search several files (though I have set up
bloom filters)
- The total IndexSize / BloomSize overhead is a bit larger than with a
single file
- We might increase data locality when writing a new HFile
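To put rough numbers on the GET point, here is a minimal sketch that assumes
the requested row lives in exactly one HFile and that every other file's
bloom filter independently gives a false positive with probability p. The
class and method names are hypothetical (not HBase APIs), and the 1% rate in
main() is an assumed example value. In this model the expected number of
files read per GET is 1 + (N - 1) * p.

```java
// Hypothetical helper, not part of HBase: a back-of-the-envelope model of
// how many HFiles a random GET reads when bloom filters are enabled.
public final class GetCostEstimate {

    private GetCostEstimate() {}

    /**
     * Expected number of HFiles actually read for a GET on a row that exists
     * in exactly one of {@code storeFiles} files, assuming each other file's
     * bloom filter independently returns a false positive with probability
     * {@code falsePositiveRate}.
     */
    public static double expectedFilesRead(int storeFiles, double falsePositiveRate) {
        if (storeFiles < 1) {
            throw new IllegalArgumentException("need at least one store file");
        }
        if (falsePositiveRate < 0.0 || falsePositiveRate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        // The file holding the row is always read; each other file is read
        // only when its bloom filter wrongly answers "maybe present".
        return 1.0 + (storeFiles - 1) * falsePositiveRate;
    }

    public static void main(String[] args) {
        // Assumed example false-positive rate of 1%.
        for (int files : new int[] {1, 3, 20}) {
            System.out.printf("%2d files -> %.2f files read per GET%n",
                    files, expectedFilesRead(files, 0.01));
        }
    }
}
```

With p = 0.01 this gives 1.00, 1.02 and 1.19 files read for 1, 3 and 20
files. If a row's cells were written by several flushes, every file holding
some of them has to be read regardless of bloom filters, so the real cost
can be higher than this model suggests.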

And my questions:
- How critical could it be?
- Do minor compactions help reduce major compaction time? (e.g. for the
same data volume, is it faster to merge 3 files rather than 20?)
- Considering I have 100% data locality, compaction will generate a lot of
disk I/O reading the HFiles, but is the network layer "blocking" anything
when writing the new HFile and spreading its HDFS blocks among the
DataNodes?
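For the 3-versus-20-files question, it may help to view a compaction as a
k-way merge of already sorted files. The sketch below is a simplified,
hypothetical model (in-memory lists of row keys, no deletes, versions or
TTLs), not HBase's implementation: every key is read and written exactly
once whatever the number of inputs N, while the min-heap adds roughly log N
comparisons per key, so merging K keys costs O(K log N) comparisons. Real
compactions are often dominated by disk and network I/O, which this model
ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of a compaction merge: each "HFile" is an
// in-memory list of keys already sorted in ascending order.
public final class CompactionMergeSketch {

    private CompactionMergeSketch() {}

    // Read position inside one input file.
    private static final class Cursor {
        final int file;
        int pos;

        Cursor(int file) {
            this.file = file;
        }
    }

    /** Merges N sorted inputs into one sorted output using a min-heap of size N. */
    public static List<String> merge(List<List<String>> files) {
        // Order cursors by the key they currently point at.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (a, b) -> files.get(a.file).get(a.pos)
                        .compareTo(files.get(b.file).get(b.pos)));
        for (int i = 0; i < files.size(); i++) {
            if (!files.get(i).isEmpty()) {
                heap.add(new Cursor(i));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();          // O(log N) per key
            List<String> src = files.get(c.file);
            out.add(src.get(c.pos));         // each key is read and written once
            c.pos++;
            if (c.pos < src.size()) {
                heap.add(c);                 // re-insert with its next key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("row1", "row4", "row7"),
                Arrays.asList("row2", "row5"),
                Arrays.asList("row3", "row6", "row8"));
        System.out.println(merge(files));
    }
}
```

Running main() prints [row1, row2, row3, row4, row5, row6, row7, row8]. In
this toy model the total data volume fixes the read/write work, and the
number of input files only changes the per-key heap overhead.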

-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me