Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2014/12/10 21:20:12 UTC

[jira] [Comment Edited] (CASSANDRA-4139) Add varint encoding to Messaging service

    [ https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241644#comment-14241644 ] 

Ariel Weisberg edited comment on CASSANDRA-4139 at 12/10/14 8:19 PM:
---------------------------------------------------------------------

Is bandwidth a constraint for WAN replication? In practice, is compression on by default for messaging? What are people doing in the wild?

I could imagine varint encoding being a win for Cells where the names and values are integers and queries are bulk loading or selecting ranges. At the storage level, it seems like the kind of thing that could beat general-purpose compression if you know what data type you are dealing with and have a lot of zero-padded values.
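
To make the zero-padding point concrete, here is a minimal Java sketch of a generic LEB128-style unsigned varint (the base-128 scheme Protocol Buffers uses). This is an assumption for illustration, not necessarily the encoding in the attached patches, and the class name `UnsignedVarint` is hypothetical. Small values that would occupy a full 8-byte `long` shrink to one or two bytes:

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: LEB128-style unsigned varint, for illustration only. */
public final class UnsignedVarint {
    private UnsignedVarint() {}

    /** Writes v in 7-bit groups, low group first; a set high bit means "more bytes follow". */
    public static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Treat v as unsigned: keep emitting while more than 7 significant bits remain.
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    /** Reassembles the 7-bit groups until a byte without the continuation bit is seen. */
    public static long decode(byte[] buf) {
        long result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        long[] samples = {0L, 42L, 300L, 1_000_000L, -1L};
        for (long v : samples) {
            System.out.printf("%d -> %d byte(s) as varint vs %d as a fixed-width long%n",
                    v, encode(v).length, Long.BYTES);
        }
    }
}
```

The trade-off shows up at the other end: a value with all 64 bits significant, such as -1 read as unsigned, takes 10 bytes instead of 8, which is why signed fields are usually ZigZag-mapped before varint encoding.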

I have heard talk of using a column-store and run-length-encoding approach for storage, which suggests varint encoding wouldn't be the tool of choice for storage either.
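
For comparison, a minimal sketch of run-length encoding over a single integer column, assuming a layout where each column's values are stored contiguously. The `RunLengthColumn` class and its `Run` pair are hypothetical, not an existing Cassandra format. A sorted or low-cardinality column collapses to a handful of (value, run length) pairs, whereas varint encoding still spends at least one byte on every value:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical column-store helper: run-length encodes a column of ints. */
public final class RunLengthColumn {
    private RunLengthColumn() {}

    /** One run: {@code value} repeated {@code length} times. */
    public static final class Run {
        public final int value;
        public final int length;

        public Run(int value, int length) {
            this.value = value;
            this.length = length;
        }
    }

    /** Collapses each stretch of equal adjacent values into a single run. */
    public static List<Run> encode(int[] column) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            int j = i + 1;
            // Extend the run while the next value repeats the current one.
            while (j < column.length && column[j] == column[i]) {
                j++;
            }
            runs.add(new Run(column[i], j - i));
            i = j;
        }
        return runs;
    }

    /** Expands the runs back into the original column. */
    public static int[] decode(List<Run> runs) {
        int total = 0;
        for (Run r : runs) {
            total += r.length;
        }
        int[] column = new int[total];
        int pos = 0;
        for (Run r : runs) {
            Arrays.fill(column, pos, pos + r.length, r.value);
            pos += r.length;
        }
        return column;
    }
}
```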

The code changes don't look bad. They mostly swap the stream types and change the serialized-size calculations to account for variable-length encoded integers. It could save bandwidth, but it could also be slower, since you spend more cycles calculating serialized sizes and encoding/decoding integers. If you end up using compression in bandwidth-sensitive scenarios, you may not win much.
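
The extra work in the size calculation can be sketched with a hypothetical three-field header (`HeaderSizing` and its fields are illustrative, not classes from the patch). With fixed-width fields the serialized size is a constant; with varints every field has to be inspected, and large values such as a microsecond timestamp barely shrink:

```java
/**
 * Hypothetical message header, used only to contrast fixed-width and
 * varint-aware serialized-size calculations.
 */
public final class HeaderSizing {
    private final int id;
    private final long timestampMicros;
    private final int payloadLength;

    public HeaderSizing(int id, long timestampMicros, int payloadLength) {
        this.id = id;
        this.timestampMicros = timestampMicros;
        this.payloadLength = payloadLength;
    }

    /** Fixed-width encoding: the size never depends on the field values. */
    public int fixedSerializedSize() {
        return Integer.BYTES + Long.BYTES + Integer.BYTES;
    }

    /** Varint encoding: each field's size depends on its value, so each one is examined. */
    public int varintSerializedSize() {
        return unsignedVarintSize(Integer.toUnsignedLong(id))
                + unsignedVarintSize(timestampMicros)
                + unsignedVarintSize(Integer.toUnsignedLong(payloadLength));
    }

    /** Bytes for an LEB128-style unsigned varint: one per 7 significant bits, minimum 1. */
    public static int unsignedVarintSize(long v) {
        // OR with 1 so that v == 0 still counts as one significant bit.
        int significantBits = 64 - Long.numberOfLeadingZeros(v | 1L);
        return (significantBits + 6) / 7;
    }
}
```

For example, with id 42, a timestamp of 1,418,000,000,000,000 microseconds (late 2014), and a payload length of 100, the varint size is 1 + 8 + 1 = 10 bytes against a fixed 16; the timestamp alone keeps all 8 of its bytes.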

Since the data going in and out of the database isn't varint-encoded, you only save a meaningful proportion of space when the operations going in and out are small. The flip side is that you can't do that many small ops anyway, so you aren't bandwidth-constrained.
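
The proportionality point is simple arithmetic. A minimal sketch, assuming only the message's framing integers are varint-encoded and the payload is sent unchanged; the 16-to-10-byte header and the payload sizes are hypothetical numbers, not measurements:

```java
/** Hypothetical back-of-the-envelope calculation; the numbers are illustrative, not measured. */
public final class ProportionalSavings {
    private ProportionalSavings() {}

    /** Fraction of total bytes saved when only the header shrinks and the payload is unchanged. */
    public static double fractionSaved(int fixedHeaderBytes, int varintHeaderBytes, int payloadBytes) {
        int before = fixedHeaderBytes + payloadBytes;
        int after = varintHeaderBytes + payloadBytes;
        return (double) (before - after) / before;
    }

    public static void main(String[] args) {
        // Small op: 6 of 66 bytes saved, roughly 9%.
        System.out.printf("small op: %.2f%%%n", 100 * fractionSaved(16, 10, 50));
        // Large op: 6 of 10,016 bytes saved, well under 0.1%.
        System.out.printf("large op: %.3f%%%n", 100 * fractionSaved(16, 10, 10_000));
    }
}
```

The absolute saving is the same 6 bytes in both cases; only its share of the message changes.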


was (Author: aweisberg):
Is bandwidth a constraint for WAN replication? In practice is the default for messaging to have compression on? What are people doing in the wild?

I could imagine varint encoding being a win for Cells where the names and values are integers and queries are bulk loading or selecting ranges. At the storage level it seems like the kind of thing that could beat general purpose compression if you know what data type you are dealing with and have a lot of 0 padded values.

I have heard talk about using a column store and run length encoding approach for storage which makes it seem like varint encoding would be the tool of choice for storage either.

The code changes don't look bad. It's mostly swapping types for streams and changes to calculating serialized size so that it is aware of the impact of variable length encoded integers. It could save bandwidth, but it could also be slower since you spend more cycles calculating serialized size and encoding/decoding integers. If you end up using compression in bandwidth sensitive scenarios you may not win much.

Not varint encoding the data going in/out of the database means you only save real space proportionally when you have small operations going in/out. The flip side is that you can't do that many small ops anyways so you aren't bandwidth constrained.

> Add varint encoding to Messaging service
> ----------------------------------------
>
>                 Key: CASSANDRA-4139
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Vijay
>            Assignee: Ariel Weisberg
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-4139-v1.patch, 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch, 0002-add-bytes-written-metric.patch, 4139-Test.rtf, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)