Posted to issues@hbase.apache.org by "Anoop Sam John (JIRA)" <ji...@apache.org> on 2019/05/29 05:57:00 UTC

[jira] [Commented] (HBASE-22483) Maybe it's better to use 65KB as the default buffer size in ByteBuffAllocator

    [ https://issues.apache.org/jira/browse/HBASE-22483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850519#comment-16850519 ] 

Anoop Sam John commented on HBASE-22483:
----------------------------------------

Good one.. Yes, now that we try to read the block data into these pooled ByteBuffers, it would be better to account for the extra bytes. In the BucketCache also, the bucket sizes are chosen with +1KB extra (4+1, 8+1, ... 64+1, ... 512+1 KB). Adding 1 KB extra is very much fine.
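
To make the sizing concrete, here is a minimal arithmetic sketch (illustrative only, not the actual ByteBuffAllocator or BucketCache code; the helper below is hypothetical): with a +1KB slack, a typical 64KB-plus-delta data block still fits in one pooled buffer.

    // Illustrative sketch only; not the real ByteBuffAllocator/BucketCache logic.
    public class BufferSlackSketch {
      private static final int KB = 1024;

      // Hypothetical helper: how many fixed-size pooled buffers a block would span.
      static int buffersNeeded(int blockBytes, int bufferBytes) {
        return (blockBytes + bufferBytes - 1) / bufferBytes; // ceiling division
      }

      public static void main(String[] args) {
        int blockBytes = 64 * KB + 40; // typical data block: 64KB target + small delta
        System.out.println(buffersNeeded(blockBytes, 64 * KB)); // 2 -> would become a MultiByteBuff
        System.out.println(buffersNeeded(blockBytes, 65 * KB)); // 1 -> stays a SingleByteBuff
      }
    }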

> Maybe it's better to use 65KB as the default buffer size in ByteBuffAllocator
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-22483
>                 URL: https://issues.apache.org/jira/browse/HBASE-22483
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: 121240.stack, BucketCacheWriter-is-busy.png, checksum-stacktrace.png
>
>
> There are some reasons why it's better to choose 65KB as the default buffer size:
> 1. Almost all data blocks have a size of 64KB + delta, where the delta is very small and depends on the size of the last KeyValue. If we use the default hbase.ipc.server.allocator.buffer.size=64KB, then each block will be allocated as a MultiByteBuff: one 64KB DirectByteBuffer plus a delta-sized HeapByteBuffer, and the HeapByteBuffer will increase GC pressure. Ideally, we should let the data block be allocated as a SingleByteBuff: it has a simpler data structure, faster access, and less heap usage.
> 2. In my benchmark, I found some checksum stack traces (see [checksum-stacktrace.png|https://issues.apache.org/jira/secure/attachment/12969905/checksum-stacktrace.png]).
>  Since the block is a MultiByteBuff, we have to calculate the checksum via a temporary heap copy (see HBASE-21917), whereas with a SingleByteBuff we can speed up the checksum by calling Hadoop's native checksum library, which is much faster (a checksum sketch follows this description).
> 3. The BucketCacheWriters seem to be always busy because of the higher cost of copying from a MultiByteBuff to a DirectByteBuffer. For a SingleByteBuff we can just use unsafe array copying, while for a MultiByteBuff we have to copy byte by byte (a copy sketch follows this description).
> Anyway, I will give a benchmark for this. 
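
To illustrate the checksum point, here is a minimal, hypothetical sketch (not HBase's actual checksum path): a checksum such as java.util.zip.CRC32 can consume one contiguous ByteBuffer directly, whereas a block split across buffers is first copied into a temporary heap array.

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32;

    // Illustrative only: contrasts checksumming a contiguous buffer with a
    // fragmented block that must first be copied into a temporary heap array.
    public class ChecksumSketch {
      public static void main(String[] args) {
        byte[] payload = new byte[64 * 1024 + 40];

        // SingleByteBuff-like case: one contiguous (possibly direct) buffer.
        ByteBuffer single = ByteBuffer.allocateDirect(payload.length);
        single.put(payload).flip();
        CRC32 crc1 = new CRC32();
        crc1.update(single); // no extra heap copy needed

        // MultiByteBuff-like case: the block is split across two buffers, so a
        // temporary on-heap copy is made before checksumming (extra GC work).
        ByteBuffer part1 = ByteBuffer.wrap(payload, 0, 64 * 1024);
        ByteBuffer part2 = ByteBuffer.wrap(payload, 64 * 1024, 40);
        byte[] tmp = new byte[payload.length];
        part1.get(tmp, 0, part1.remaining());
        part2.get(tmp, part1.position(), part2.remaining());
        CRC32 crc2 = new CRC32();
        crc2.update(tmp);

        System.out.println(crc1.getValue() == crc2.getValue()); // true
      }
    }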
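
And for the BucketCache writer point, a similarly hypothetical copy sketch (not the actual BucketCache writer code): a contiguous source can be written into a DirectByteBuffer with one bulk put, while a fragmented source falls back to copying byte by byte.

    import java.nio.ByteBuffer;

    // Illustrative only: bulk copy vs. per-byte copy into a direct buffer,
    // roughly the difference between a contiguous and a fragmented source.
    public class CopySketch {
      public static void main(String[] args) {
        byte[] block = new byte[64 * 1024 + 40];
        ByteBuffer dest = ByteBuffer.allocateDirect(block.length);

        // Contiguous source (SingleByteBuff-like): one bulk put,
        // typically a single memory copy under the hood.
        dest.put(block);
        dest.clear();

        // Fragmented source (MultiByteBuff-like): the simple fallback is to
        // walk the fragments and copy byte by byte.
        ByteBuffer[] fragments = {
            ByteBuffer.wrap(block, 0, 64 * 1024),
            ByteBuffer.wrap(block, 64 * 1024, 40)
        };
        for (ByteBuffer frag : fragments) {
          while (frag.hasRemaining()) {
            dest.put(frag.get());
          }
        }
        System.out.println(dest.position() == block.length); // true
      }
    }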



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)