Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2016/01/05 08:01:39 UTC

[jira] [Updated] (HBASE-14468) Compaction improvements: FIFO compaction policy

     [ https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-14468:
----------------------------------
    Attachment: 14468-0.98-v2.txt

V2 for 0.98.
* uses compactionPolicy correctly in checkCompactionPolicy
* gets rid of a bunch of trailing whitespace (I hate to be the whitespace nazi :) )

I did some basic testing on this, but nothing intensive.

> Compaction improvements: FIFO compaction policy
> -----------------------------------------------
>
>                 Key: HBASE-14468
>                 URL: https://issues.apache.org/jira/browse/HBASE-14468
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction, Performance
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14468-0.98-v2.txt, 14468-0.98.txt, HBASE-14468-v1.patch, HBASE-14468-v10.patch, HBASE-14468-v2.patch, HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, HBASE-14468-v6.patch, HBASE-14468-v7.patch, HBASE-14468-v8.patch, HBASE-14468-v9.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only those files in which all cells have expired. The column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store files. Here are some applications which could benefit the most:
> # Use it for very high volume raw data which has a low TTL and which is the source of other data (after additional processing). Example: raw time-series vs. time-based rollup aggregates and compacted time-series. We collect raw time-series and store them in a CF with the FIFO compaction policy; periodically we run a task which creates the rollup aggregates and compacts the time-series, after which the original raw data can be discarded.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD). Say we have a local SSD (1TB) which we can use as a block cache. No need for compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and network), and we do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>   FIFOCompactionPolicy.class.getName());
> {code}
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>   FIFOCompactionPolicy.class.getName());
> {code}
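>
> Since the policy only works with a non-default TTL (see the introduction above), a CF setup typically sets one as well. A minimal sketch, where the 3-day TTL is just an illustrative value:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> // A non-default TTL is mandatory for FIFO compaction; 3 days is an example value
> desc.setTimeToLive(3 * 24 * 3600); // seconds
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>   FIFOCompactionPolicy.class.getName());
> {code}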
> Although region splitting is supported, for optimal performance it should be disabled, either by setting DisabledRegionSplitPolicy explicitly or by setting ConstantSizeRegionSplitPolicy with a very large max region size. You will also have to increase the store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very large value.
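> A minimal sketch of that configuration at the table level (the blocking-file value of 1000 is an arbitrary "very large" example, not a recommendation from this issue):
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> // Disable region splits entirely for this table
> desc.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());
> // Raise the per-store blocking file count well above the default
> desc.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");
> {code}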
>  
> h3. Limitations
> Do not use FIFO compaction if (see the sketch after this list):
> * Table/CF has MIN_VERSIONS > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
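>
> A minimal sketch of guarding against these two cases before enabling the policy (an illustrative check using the standard HColumnDescriptor getters, not the patch's actual checkCompactionPolicy logic):
> {code}
> // Reject configurations the FIFO policy cannot handle
> if (desc.getMinVersions() > 0 || desc.getTimeToLive() == HColumnDescriptor.DEFAULT_TTL) {
>   throw new IllegalArgumentException(
>     "FIFO compaction requires MIN_VERSIONS = 0 and a non-default TTL");
> }
> {code}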



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)