Posted to commits@cassandra.apache.org by "Jim Plush (JIRA)" <ji...@apache.org> on 2015/09/11 18:41:46 UTC
[jira] [Comment Edited] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size

    [ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741102#comment-14741102 ] 

Jim Plush edited comment on CASSANDRA-8894 at 9/11/15 4:40 PM:
---------------------------------------------------------------

I am uploading some screenshots from testing I did over the last couple of days while trying to establish some benchmarks. With compression off, I was aiming for 1 million writes (RF3) with 50K reads on a 60-node cluster. With the default 64K buffer size, I/O was saturated and read latency was 100+ ms. With the buffer at 4K, I/O was quite stable at that rate. This was a straight row-key lookup test, i.e. no wide rows. With the 64K buffer it was reading far more data than the queries needed. Would it be possible to make the buffer size a per-table setting?

(screenshots attached)






> Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.0 alpha 1
>
>         Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml, screenshot-1.png, screenshot-2.png
>
>
> A large contributor to buffered reads being slower than mmapped reads is likely that we read a full 64KB at once, when average record sizes may be as low as 140 bytes in our stress tests. The TLB has only 128 entries on a modern core, and each read will touch 32 of these, meaning we will almost never hit the TLB and will incur at least 30 unnecessary misses each time (as well as the other costs of larger-than-necessary accesses). When working with an SSD there is little to no benefit in reading more than 4KB at once, and in either case reading more data than we need is wasteful. So I propose selecting a buffer size that is the next power of 2 larger than our average record size (with a minimum of 4KB), so that we expect to read a record in one operation. I also propose that we create a pool of these buffers up front and ensure they are all exactly aligned to a virtual page, so that the source and target operations each touch exactly one virtual page per 4KB of expected record size.
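
The sizing rule proposed in the description can be sketched roughly as below. This is only an illustration, not the code that was committed for this ticket: the class name BufferSizing, both method names, the 64KB upper cap, and the choice to leave an exact power of two unchanged are all assumptions made for the example.

```java
// Hypothetical sketch of the proposed sizing rule; not Cassandra's actual code.
final class BufferSizing {
    static final int PAGE_SIZE = 4096;        // 4KB virtual page, also the proposed minimum
    static final int MAX_BUFFER_SIZE = 65536; // assumed cap: the old 64KB default

    // Smallest power of two >= averageRecordSize, clamped to [4KB, 64KB].
    static int bufferSizeFor(long averageRecordSize) {
        if (averageRecordSize <= PAGE_SIZE) return PAGE_SIZE;
        if (averageRecordSize >= MAX_BUFFER_SIZE) return MAX_BUFFER_SIZE;
        int size = Integer.highestOneBit((int) averageRecordSize);
        // highestOneBit rounds down, so double it unless the input was already a power of two.
        return size < averageRecordSize ? size << 1 : size;
    }

    // Pages touched by one read: one set for the source, one for the target buffer.
    static int pagesTouched(int bufferSize) {
        int pagesPerSide = (bufferSize + PAGE_SIZE - 1) / PAGE_SIZE;
        return 2 * pagesPerSide;
    }
}
```

Under these assumptions, a 140-byte average record maps to a 4KB buffer and a 5000-byte record to an 8KB buffer. pagesTouched(65536) gives the 32 TLB entries mentioned in the description, while pagesTouched(4096) gives 2, which is presumably where the figure of 30 unnecessary misses comes from.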


