Posted to issues@hbase.apache.org by "Anastasia Braginsky (JIRA)" <ji...@apache.org> on 2018/01/03 10:11:01 UTC

[jira] [Commented] (HBASE-19506) Support variable sized chunks from ChunkCreator

    [ https://issues.apache.org/jira/browse/HBASE-19506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309415#comment-16309415 ] 

Anastasia Braginsky commented on HBASE-19506:
---------------------------------------------

Now we come back to this idea. Looking deeper into the details: the size of a cell-representation is 20 bytes and the chunk size is 2MB (2097152 bytes), so one chunk can hold the representations of about 104857 cells (2097152/20 = 104857.6).

How many cells are inserted before an in-memory flush depends heavily on the workload. However, to get a rough average, let's say the cell size is 1KB and we flush in-memory every 12.8MB (10% of 128MB); then 12.8MB/1KB = 12.8K ~= 12800 cells are written between flushes (in this case).

After that, every 5 immutable segments in the pipeline are compacted, so 5 under-utilized index chunks are released and one index chunk with about 64000 cell-representations (5 x 12800) is allocated, which is a bit more than half of its capacity. So it looks like there is indeed some under-utilization of index chunks; however, there are at most 5 index chunks per memstore, so the impact may not be very significant.
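
As a rough check of the arithmetic above, here is a minimal Java sketch. It takes the figures in this comment at face value (20-byte cell-representations, 2MB chunks, about 12800 cells per in-memory flush, 5 segments per compaction). The class and method names are illustrative only and are not HBase APIs.

```java
// Back-of-the-envelope check of index-chunk utilization.
// All names are hypothetical; the constants are the assumptions stated in the comment.
final class IndexChunkMath {
    static final long CELL_REP_BYTES = 20;             // size of one cell-representation
    static final long CHUNK_BYTES = 2L * 1024 * 1024;  // default chunk size, 2MB

    // Number of whole cell-representations that fit in a chunk of the given size.
    static long cellRepsPerChunk(long chunkBytes) {
        return chunkBytes / CELL_REP_BYTES;
    }

    // Fraction of a chunk occupied by the given number of cell-representations.
    static double utilization(long cells, long chunkBytes) {
        return (double) (cells * CELL_REP_BYTES) / chunkBytes;
    }

    public static void main(String[] args) {
        long cellsPerFlush = 12_800;                    // assumed: ~12.8MB flush, ~1KB cells
        long cellsAfterCompaction = 5 * cellsPerFlush;  // 5 segments merged into one

        System.out.println("capacity of a 2MB chunk: " + cellRepsPerChunk(CHUNK_BYTES));
        System.out.printf("per-segment utilization: %.0f%%%n",
                100 * utilization(cellsPerFlush, CHUNK_BYTES));
        System.out.printf("post-compaction utilization: %.0f%%%n",
                100 * utilization(cellsAfterCompaction, CHUNK_BYTES));
        System.out.println("capacity of a 256KB chunk: " + cellRepsPerChunk(256L * 1024));
    }
}
```

Under these assumptions a single segment's index fills roughly 12% of a 2MB chunk, and 12800 cell-representations (256000 bytes) would just fit into a 256KB chunk, which can hold 13107 of them.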

As for a solution, we suggest creating another pool for "small" chunks in ChunkCreator, say chunks of 256KB. This means we will also need to define a new type of chunk. It is very important to avoid on-demand allocation, though: this "small-chunks" pool can be pre-allocated and its chunks reused.
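
To make the idea concrete, here is a minimal sketch of a pre-allocated pool of fixed-size off-heap chunks. It is not the ChunkCreator API and not a proposed patch: the class name SmallChunkPool, its methods, and the choice of a bounded queue are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a pre-allocated pool of "small" chunks.
// This is NOT HBase code; every name here is made up for illustration.
final class SmallChunkPool {
    private final int chunkSize;
    private final BlockingQueue<ByteBuffer> free;

    SmallChunkPool(int chunkSize, int poolSize) {
        this.chunkSize = chunkSize;
        this.free = new ArrayBlockingQueue<>(poolSize);
        // Pay the off-heap allocation cost once, up front, instead of on demand.
        for (int i = 0; i < poolSize; i++) {
            free.add(ByteBuffer.allocateDirect(chunkSize));
        }
    }

    // Returns a pooled chunk, or null when the pool is exhausted.
    // The caller chooses the fallback (for example, a regular 2MB chunk).
    ByteBuffer acquire() {
        ByteBuffer chunk = free.poll();
        if (chunk != null) {
            chunk.clear(); // reset position and limit; old bytes are not zeroed
        }
        return chunk;
    }

    // Returns a chunk to the pool for reuse.
    // Rejects null, buffers of the wrong size, and returns when the pool is full.
    boolean release(ByteBuffer chunk) {
        if (chunk == null || chunk.capacity() != chunkSize) {
            return false;
        }
        return free.offer(chunk);
    }

    // Number of chunks currently available for acquisition.
    int available() {
        return free.size();
    }
}
```

For example, new SmallChunkPool(256 * 1024, 64) would reserve 16MB of direct memory at startup. Returning null on exhaustion, rather than allocating a fresh buffer, keeps on-demand off-heap allocation out of the write path, which is the constraint stressed above.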

  

> Support variable sized chunks from ChunkCreator
> -----------------------------------------------
>
>                 Key: HBASE-19506
>                 URL: https://issues.apache.org/jira/browse/HBASE-19506
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>
> When a CellChunkMap is created, it allocates a special index chunk (or chunks) where the array of cell-representations is stored. When the number of cell-representations is small, it is preferable to allocate a chunk smaller than the default size of 2MB.
> On the other hand, those "non-standard size" chunks cannot be used in the pool, and on-demand off-heap allocations are costly. So this JIRA is about investigating the trade-off between memory usage and final performance.


