Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2015/07/30 00:28:04 UTC
[jira] [Comment Edited] (CASSANDRA-9894) Serialize the header only once per message
[ https://issues.apache.org/jira/browse/CASSANDRA-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646840#comment-14646840 ]
Benedict edited comment on CASSANDRA-9894 at 7/29/15 10:27 PM:
---------------------------------------------------------------
I've pushed an initial version [here|https://github.com/belliottsmith/cassandra/tree/9894]. This is based on my patch for CASSANDRA-9471.
I tried starting from Sylvain's patch, and then starting from scratch, and ultimately I didn't like where either led. So this attacks the problem a little differently: it uses the column filter sent to the replica to help encode the response, knowing that the response columns must be a subset of the requested ones. With a normal number of columns this translates to a presence bitmap (otherwise it is a sequence of ints, each adding to or removing from the set, but these codepaths should rarely be taken). If the columns are identical, a single 0 byte is sent for all the columns.
This permits us to save work when serializing even single partitions, and also permits us to encode per-partition encoding stats, so that our timestamps can most likely be more efficiently encoded. It also touches far less code.
I am not 100% certain I haven't broken things, as dtests are a little tricky to read right now, but nothing jumps out at me. I still need to introduce some unit tests, and also want to invert the bitmap to make it more efficiently vint encoded. But the patch is generally ready for a first round of review, as it will change only minimally.
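For illustration, the subset encoding described above might be sketched roughly as follows. This is not code from the patch: the class and method names are hypothetical, columns are modelled as plain strings, and only the bitmap case (up to 64 requested columns) is covered. One further assumption: the sketch sets a bit for each requested column that is absent from the response, so that an identical column set naturally encodes as 0. The comment does not say which bit polarity the patch uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not taken from the patch.
final class ColumnSubsetSketch
{
    // Returns a bitmap with bit i set when requested.get(i) is missing from
    // the response. An identical column set therefore encodes as 0.
    static long encode(List<String> requested, Set<String> response)
    {
        if (requested.size() > 64)
            throw new IllegalArgumentException("sketch only covers up to 64 columns");
        if (!requested.containsAll(response))
            throw new IllegalArgumentException("response columns must be a subset of the request");

        long missing = 0L;
        for (int i = 0; i < requested.size(); i++)
            if (!response.contains(requested.get(i)))
                missing |= 1L << i;
        return missing;
    }

    // Rebuilds the response columns, in request order, from the request and the bitmap.
    static List<String> decode(List<String> requested, long missing)
    {
        List<String> present = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++)
            if ((missing & (1L << i)) == 0)
                present.add(requested.get(i));
        return present;
    }
}
```

Because both sides already know the requested columns, only the bitmap needs to travel with the response. The fallback for larger column counts (the sequence of ints) is not sketched here.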
was (Author: benedict):
I've pushed an initial version [here|https://github.com/belliottsmith/cassandra/tree/9894]. This is based on my patch for CASSANDRA-9471.
I tried starting from Sylvain's patch, and then starting from scratch, and ultimately I didn't like where either led. So this attacks the problem a little differently: it uses the column filter sent to the coordinator for a query to encode the response, knowing that the columns must be a subset. With a normal number of columns this translates to a bitmap of presence in the response for each column in the request (otherwise it is a sequence of vint encoded ints, but these codepaths should rarely be taken), and if the columns are identical (what we should expect), a single 0 byte is sent for all the columns.
This permits us to save work when serializing even single partitions, and also permits us to encode per-partition encoding stats, so that our timestamps can most likely be more efficiently encoded. It also touches far less code.
I am not 100% certain I haven't broken things, as dtests are a little tricky to read right now, but nothing jumps out at me. I still need to introduce some unit tests, and also want to invert the bitmap to make it more efficiently vint encoded. But the patch is generally ready for a first round of review, as it will change only minimally.
> Serialize the header only once per message
> ------------------------------------------
>
> Key: CASSANDRA-9894
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9894
> Project: Cassandra
> Issue Type: Sub-task
> Components: Core
> Reporter: Sylvain Lebresne
> Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> One last improvement I'd like to make on the serialization side: we currently serialize the {{SerializationHeader}} for each partition. That header contains the serialized columns in particular, and for range queries, serializing it for every partition is wasteful (note that this is only a problem for the messaging protocol, since for sstables we write the header only once per sstable).