Posted to commits@cassandra.apache.org by "Michael Kjellman (JIRA)" <ji...@apache.org> on 2016/09/01 07:21:21 UTC
[jira] [Comment Edited] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

    [ https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454525#comment-15454525 ] 

Michael Kjellman edited comment on CASSANDRA-9754 at 9/1/16 7:21 AM:
---------------------------------------------------------------------

I've discovered a performance regression caused by the original logic in PageAlignedReader. I always knew the original design wasn't ideal, but I felt that the additional code complexity wasn't worth the performance improvements. Now that the code has stabilized and I've moved on to performance validation (and not just bugs and implementation), I found that it was horribly inefficient.

https://github.com/mkjellman/cassandra/commit/b4b3152ec7d92d85c032cfbcbfae705e9dc36989

I've updated the documentation in PageAlignedWriter to cover the new PageAligned file format. The new implementation allows lazy deserialization of segment metadata as required, and enables binary search across segments via the fixed-length starting offsets. This means deserialization of the segments is no longer required ahead of time -- segment metadata is deserialized only when it is needed to return a result.
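
As a rough illustration of the idea only (this is not the PageAlignedReader code or the actual PageAligned file format; the class name, the table layout, and the helper methods are all hypothetical), binary search over fixed-length starting offsets could look something like the sketch below. Because every entry has the same width, entry i can be read directly at a computed position, so only the entries the search actually probes are ever decoded.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: NOT the real PageAligned format.
// Assumed layout: [int segmentCount][long startOffset] * segmentCount
final class LazySegmentLookup {
    private static final int HEADER_BYTES = Integer.BYTES;
    private static final int ENTRY_BYTES = Long.BYTES;

    private final ByteBuffer table;
    private final int segmentCount;
    int entriesRead = 0; // instrumentation: how many entries were decoded

    LazySegmentLookup(ByteBuffer table) {
        this.table = table;
        this.segmentCount = table.getInt(0);
    }

    // Fixed-length entries make this a direct positional read; nothing
    // before or after entry i has to be deserialized.
    private long startOffset(int i) {
        entriesRead++;
        return table.getLong(HEADER_BYTES + i * ENTRY_BYTES);
    }

    // Returns the index of the last segment whose starting offset is
    // <= position, or -1 if position precedes the first segment.
    int findSegment(long position) {
        int lo = 0;
        int hi = segmentCount - 1;
        int result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (startOffset(mid) <= position) {
                result = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    // Helper for building an in-memory table with the assumed layout.
    static ByteBuffer buildTable(long[] sortedOffsets) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + sortedOffsets.length * ENTRY_BYTES);
        buf.putInt(sortedOffsets.length);
        for (long offset : sortedOffsets) {
            buf.putLong(offset);
        }
        buf.flip();
        return buf;
    }
}
```

With a table built from starting offsets 0, 4096, 8192 and 12288, findSegment(5000) lands on the segment at index 1 after decoding only two of the four entries. In a real reader, the per-segment metadata would presumably be deserialized only for the segment the search settles on.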

Initial benchmarking and profiling make me a pretty happy guy. I think the new design is a massive improvement over the old one and looks pretty good so far.


was (Author: mkjellman):
I've discovered a performance regression caused by the original logic in PageAlignedReader. I always knew the original design wasn't ideal, but I felt that the additional code complexity wasn't worth the performance improvements. Now that the code has stabilized and I've moved on to performance validation (and not just bugs and implementation), I found that it was horribly inefficient.

https://github.com/mkjellman/cassandra/commit/33d35272ae50803bac626ab60d5ecd3a36f5b283

I've updated the documentation in PageAlignedWriter to cover the new PageAligned file format. The new implementation allows lazy deserialization of segment metadata as required, and enables binary search across segments via the fixed-length starting offsets. This means deserialization of the segments is no longer required ahead of time -- segment metadata is deserialized only when it is needed to return a result.

Initial benchmarking and profiling make me a pretty happy guy. I think the new design is a massive improvement over the old one and looks pretty good so far.

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>
>
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo and its ByteBuffers. This is especially bad on endpoints with large CQL partitions. If a CQL partition is, say, 6.4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for GC. Can this be improved by not creating so many objects?


