Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2015/07/30 00:28:04 UTC

[jira] [Comment Edited] (CASSANDRA-9894) Serialize the header only once per message

    [ https://issues.apache.org/jira/browse/CASSANDRA-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646840#comment-14646840 ] 

Benedict edited comment on CASSANDRA-9894 at 7/29/15 10:27 PM:
---------------------------------------------------------------

I've pushed an initial version [here|https://github.com/belliottsmith/cassandra/tree/9894]. This is based on my patch for CASSANDRA-9471.

I tried starting from Sylvain's patch, and then starting from scratch, and ultimately I didn't like where either led. So this attacks the problem a little differently: it uses the column filter sent to the replica to help encode the response, knowing that the response columns must be a subset of the requested ones. With a normal number of columns this translates to a presence bitmap (otherwise it is a sequence of ints either adding to or removing from the set, but these codepaths should rarely be taken). If the columns are identical, a single 0 byte is sent for all the columns.
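
As a rough illustration of the subset encoding described above (a sketch only, not the code in the patch): the class and method names here are hypothetical, columns are modelled as plain strings, and only the common case of at most 64 columns is handled. It also assumes the inverted form of the bitmap, where a set bit marks a column that is absent from the response, so that identical column sets naturally encode as 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: encode a response's columns relative to the requested columns.
final class SubsetBitmapSketch {
    // Returns a bitmap with bit i set when requested column i is ABSENT from the response.
    // Identical column sets therefore encode as 0.
    static long encode(List<String> requested, List<String> response) {
        if (requested.size() > 64)
            throw new IllegalArgumentException("more than 64 columns: fallback encoding not sketched here");
        long missing = 0L;
        int found = 0;
        for (int i = 0; i < requested.size(); i++) {
            if (response.contains(requested.get(i)))
                found++;
            else
                missing |= 1L << i;
        }
        // Assumes distinct column names; any unmatched response column is an error.
        if (found != response.size())
            throw new IllegalArgumentException("response columns are not a subset of the requested columns");
        return missing;
    }

    // Rebuilds the response columns from the requested columns and the bitmap.
    static List<String> decode(List<String> requested, long missing) {
        List<String> present = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++)
            if ((missing & (1L << i)) == 0)
                present.add(requested.get(i));
        return present;
    }
}
```

The receiver already knows the requested columns, so only the bitmap needs to travel with the response. Small values such as 0 are also the cheapest to write with a variable-length integer.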

This permits us to save work when serializing even single partitions, and also permits us to encode per-partition encoding stats, so that our timestamps can most likely be more efficiently encoded. It also touches far less code.
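
To illustrate why per-partition stats can help with timestamps, here is a minimal sketch under stated assumptions: it uses a generic 7-bits-per-byte varint size calculation (not necessarily Cassandra's vint format), and the class and method names are hypothetical. The idea is that timestamps written as deltas from a per-partition minimum are usually small numbers, which need fewer bytes than the absolute values.

```java
// Hypothetical sketch: compare encoded sizes of absolute timestamps vs. deltas from a base.
final class TimestampDeltaSketch {
    // Bytes needed by a generic 7-bits-per-byte unsigned varint
    // (assumption: this is not Cassandra's exact vint format).
    static int varintSize(long value) {
        int bytes = 1;
        while ((value >>>= 7) != 0)
            bytes++;
        return bytes;
    }

    // Total bytes needed to write every timestamp as (timestamp - base).
    static int encodedSize(long[] timestamps, long base) {
        int total = 0;
        for (long ts : timestamps)
            total += varintSize(ts - base);
        return total;
    }

    // Smallest timestamp, used as the per-partition base.
    static long min(long[] timestamps) {
        long result = Long.MAX_VALUE;
        for (long ts : timestamps)
            result = Math.min(result, ts);
        return result;
    }
}
```

Calling `encodedSize(timestamps, 0)` gives the cost of absolute values, while `encodedSize(timestamps, min(timestamps))` gives the cost relative to the partition's own minimum. The tighter the base is to the data, the smaller the deltas.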

I am not 100% certain I haven't broken things, as dtests are a little tricky to read right now, but nothing jumps out at me. I still need to introduce some unit tests, and also want to invert the bitmap to make it more efficiently vint encoded. But the patch is generally ready for a first round of review, as it will change only minimally.


was (Author: benedict):
I've pushed an initial version [here|https://github.com/belliottsmith/cassandra/tree/9894]. This is based on my patch for CASSANDRA-9471.

I tried starting from Sylvain's patch, and then starting from scratch, and ultimately I didn't like where either led. So this attacks the problem a little differently: it uses the column filter sent to the coordinator for a query to encode the response, knowing that the columns must be a subset. With a normal number of columns this translates to a bitmap of presence in the response for each column in the request (otherwise it is a sequence of vint encoded ints, but these codepaths should rarely be taken), and if the columns are identical (what we should expect), a single 0 byte is sent for all the columns.

This permits us to save work when serializing even single partitions, and also permits us to encode per-partition encoding stats, so that our timestamps can most likely be more efficiently encoded. It also touches far less code.

I am not 100% certain I haven't broken things, as dtests are a little tricky to read right now, but nothing jumps out at me. I still need to introduce some unit tests, and also want to invert the bitmap to make it more efficiently vint encoded. But the patch is generally ready for a first round of review, as it will change only minimally.

> Serialize the header only once per message
> ------------------------------------------
>
>                 Key: CASSANDRA-9894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9894
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Benedict
>             Fix For: 3.0 beta 1
>
>
> One last improvement I'd like to make on the serialization side is that we currently serialize the {{SerializationHeader}} for each partition. That header contains the serialized columns in particular, and for range queries, serializing that for every partition is wasteful (note that it's only a problem for the messaging protocol, as for sstables we only write the header once per sstable).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)