Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/05/24 04:15:12 UTC
[jira] [Updated] (KUDU-1465) Large allocations for scanner result buffers harm allocator thread caching
[ https://issues.apache.org/jira/browse/KUDU-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated KUDU-1465:
------------------------------
Description:
I was looking at the performance of a random-read stress test on a 70 node cluster and found that threads were often spending time in allocator contention, particularly when deallocating RpcSidecar objects. After a bit of analysis, I determined this is because we always preallocate buffers of 1MB (the default batch size) even if the response is only going to be a single row. Such large allocations go directly to the central freelist instead of using thread-local caches.
As a simple test, I used the set_flag command to drop the default batch size to 4KB, and the read throughput (reads/second) increased substantially.
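A minimal sketch of the mitigation described above, using hypothetical names (this is not Kudu's actual scanner code): rather than preallocating the full 1MB batch size for every response, start with a small buffer that a thread-caching allocator such as tcmalloc can serve from its per-thread cache, and grow geometrically only if the response actually needs more space, capped at the batch size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed values for illustration: the 1MB default batch size from the
// report, and a small initial capacity in the range tcmalloc serves from
// thread-local caches.
constexpr size_t kMaxBatchSize = 1024 * 1024;   // default scan batch size
constexpr size_t kInitialCapacity = 4 * 1024;   // small enough for thread cache

// Hypothetical result buffer: small rows never trigger a large allocation
// that would have to go through the allocator's central freelist.
class ResultBuffer {
 public:
  ResultBuffer() { buf_.reserve(kInitialCapacity); }

  void Append(const char* data, size_t len) {
    if (buf_.size() + len > buf_.capacity()) {
      // Double the capacity (or grow to exactly what is needed, if larger),
      // but never reserve more than the batch size cap.
      size_t new_cap = std::min(
          kMaxBatchSize,
          std::max(buf_.capacity() * 2, buf_.size() + len));
      buf_.reserve(new_cap);
    }
    buf_.insert(buf_.end(), data, data + len);
  }

  size_t size() const { return buf_.size(); }
  size_t capacity() const { return buf_.capacity(); }

 private:
  std::vector<char> buf_;
};
```

With this scheme a single-row response only ever touches the initial 4KB allocation, which matches the effect of dropping the batch-size flag to 4KB in the experiment, without giving up the 1MB ceiling for large scans.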
was:
I was looking at the performance of a random-read stress test on a 70 node cluster and found that threads were often spending time in allocator contention, particularly when deallocating RpcSidecar objects. After a bit of analysis, I determined this is because we always preallocate buffers of 1MB (the default batch size) even if the response is only going to be a single row.
As a simple test, I used the set_flag command to drop the default batch size to 4KB, and the read throughput (reads/second) increased substantially.
> Large allocations for scanner result buffers harm allocator thread caching
> --------------------------------------------------------------------------
>
> Key: KUDU-1465
> URL: https://issues.apache.org/jira/browse/KUDU-1465
> Project: Kudu
> Issue Type: Bug
> Components: perf
> Affects Versions: 0.8.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> I was looking at the performance of a random-read stress test on a 70 node cluster and found that threads were often spending time in allocator contention, particularly when deallocating RpcSidecar objects. After a bit of analysis, I determined this is because we always preallocate buffers of 1MB (the default batch size) even if the response is only going to be a single row. Such large allocations go directly to the central freelist instead of using thread-local caches.
> As a simple test, I used the set_flag command to drop the default batch size to 4KB, and the read throughput (reads/second) increased substantially.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)