Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/09/02 18:49:10 UTC

[jira] [Created] (CASSANDRA-3127) Message (inter-node) compression

Message (inter-node) compression
--------------------------------

                 Key: CASSANDRA-3127
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Sylvain Lebresne
            Priority: Minor


CASSANDRA-3015 adds compression of streams, but it could be useful to also compress some messages.
Compressing messages is easy; what may be a little trickier is deciding when and which messages to compress to get the best performance.

The simple solution would be to have it either always on or always off. But for very small messages (gossip?) that may be counter-productive. At the other end of the spectrum, compression is likely always a good choice for, say, the exchange of merkle trees across data centers. We could maybe define a message size above which we start to compress. Maybe an option to compress only cross-data-center messages would be useful too (but I may also just be getting carried away).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283474#comment-13283474 ] 

Marcus Eriksson commented on CASSANDRA-3127:
--------------------------------------------

Could we perhaps always compress the message and check if the resulting message is smaller than the original one?

MySQL does that when using client->server compression for example.

I'll assign this to myself and start poking around a bit.
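
The compress-then-compare idea can be sketched as follows. This is only an illustration, not the attached patch: it uses Python's `zlib` as a stand-in codec, and the one-byte `RAW`/`COMPRESSED` marker is a hypothetical framing invented for the example.

```python
import zlib

# Hypothetical one-byte marker telling the receiver how to read the body.
RAW, COMPRESSED = 0, 1

def encode(message: bytes) -> bytes:
    """Compress the message, then keep whichever form is smaller."""
    packed = zlib.compress(message)
    if len(packed) < len(message):
        return bytes([COMPRESSED]) + packed
    return bytes([RAW]) + message

def decode(wire: bytes) -> bytes:
    """Undo encode() by inspecting the marker byte."""
    marker, body = wire[0], wire[1:]
    return zlib.decompress(body) if marker == COMPRESSED else body

tiny = b"hi"                      # too small to benefit from compression
large = b"column-value;" * 200    # repetitive, compresses well
print(encode(tiny)[0], encode(large)[0])
```

Note that the sender has to hold both the serialized and the compressed form in memory before it can compare them.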
                

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286277#comment-13286277 ] 

Jonathan Ellis commented on CASSANDRA-3127:
-------------------------------------------

Did you experiment w/ Snappy vs LZF?

Creating a new DataOutputStream + LZFOutputStream per message seems pretty wasteful. Couldn't we compress the entire stream instead, since we can't enable compression for old-version nodes anyway?  (That case needs to be handled, btw.)

That would also give us better compression by compressing multiple messages together -- we could rely on the flush in writeConnected that only flushes if there are no more messages on the queue, so under load compression would *improve*, which is a cool property.
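
A rough way to see the effect described above, using Python's `zlib` in place of LZF or Snappy (an assumption made so the sketch runs with the standard library only). The message contents and the `flush_every` parameter are invented; `flush_every` merely imitates "flush only when the queue is empty" by flushing less often when the link is busy.

```python
import zlib

def compress_each(messages):
    """Compress every message on its own (fresh compressor per message)."""
    return [zlib.compress(m) for m in messages]

def compress_stream(messages, flush_every):
    """Feed all messages through one compressor for the whole connection.

    A sync flush after every `flush_every` messages stands in for
    "flush only when the outbound queue is empty".
    """
    comp = zlib.compressobj()
    out = bytearray()
    for count, message in enumerate(messages, start=1):
        out += comp.compress(message)
        if count % flush_every == 0:
            out += comp.flush(zlib.Z_SYNC_FLUSH)
    out += comp.flush()  # finish the stream
    return bytes(out)

# Made-up messages that share most of their content, like similar mutations.
messages = [
    f"mutation key=user{i:04d} cf=Standard1 col=C0 ts=1338500000".encode()
    for i in range(200)
]
per_message = sum(len(c) for c in compress_each(messages))
idle_link = len(compress_stream(messages, flush_every=1))   # flush after each message
busy_link = len(compress_stream(messages, flush_every=50))  # queue rarely drains
print(per_message, idle_link, busy_link)
```

With a shared compressor, later messages can refer back to earlier ones, and fewer flushes mean fewer flush markers on the wire, which is why the busy case comes out smallest.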
                

        

[jira] [Updated] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3127:
--------------------------------------

         Reviewer: jbellis
    Fix Version/s: 1.2
    

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126944#comment-13126944 ] 

Jonathan Ellis commented on CASSANDRA-3127:
-------------------------------------------

It would be pretty easy to configure "off", "on", and "cross-dc".  In the cross-dc case, OutboundTcpConnection could just ask the snitch (DatabaseDescriptor.getEndpointSnitch) if the target node is in another DC, and make the compression decision based on that.
                

        

[jira] [Updated] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-3127:
---------------------------------------

    Attachment: CHECK_SIZES-CASSANDRA-3127.patch
                CASSANDRA-3127.patch
    

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289246#comment-13289246 ] 

Marcus Eriksson commented on CASSANDRA-3127:
--------------------------------------------

I captured 75M of real traffic in one of our clusters and ran a few benchmarks.
Both Snappy and LZF compressed it to ~58M (LZF gave 0.2% better compression).
Snappy did the roundtrip (compress -> uncompress) in ~790ms for the 75M file.
LZF did it in ~1170ms.

The attached patch switches to Snappy; I did not see any of the issues xedin mentioned in CASSANDRA-3015. It also removes VERSION_13.
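
For reference, the figures above work out as follows. This is plain arithmetic on the reported numbers, assuming "M" means megabytes and treating the approximate values as exact.

```python
# Figures reported in the comment above ("M" is assumed to mean MB).
ORIGINAL_MB = 75.0
COMPRESSED_MB = 58.0
SNAPPY_ROUNDTRIP_S = 0.790
LZF_ROUNDTRIP_S = 1.170

savings = 1.0 - COMPRESSED_MB / ORIGINAL_MB          # fraction of bytes saved
snappy_mb_per_s = ORIGINAL_MB / SNAPPY_ROUNDTRIP_S   # roundtrip throughput
lzf_mb_per_s = ORIGINAL_MB / LZF_ROUNDTRIP_S
speedup = LZF_ROUNDTRIP_S / SNAPPY_ROUNDTRIP_S       # how much faster Snappy was

print(f"bytes saved: {savings:.1%}")
print(f"Snappy: {snappy_mb_per_s:.1f} MB/s, LZF: {lzf_mb_per_s:.1f} MB/s")
print(f"Snappy roundtrip speedup: {speedup:.2f}x")
```

So both codecs save a little under a quarter of the bytes on this capture, and Snappy completes the roundtrip roughly one and a half times as fast.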
                

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286278#comment-13286278 ] 

Jonathan Ellis commented on CASSANDRA-3127:
-------------------------------------------

(That would also address the "don't bother compressing very small messages" point in the original ticket description.)
                

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285721#comment-13285721 ] 

Marcus Eriksson commented on CASSANDRA-3127:
--------------------------------------------

Built the version which always sends the smallest message and saw great results in compression ratios: a standard stress test gave a 20% compression ratio, and basically all messages were compressed. The gain from checking which message was smallest was minimal.

A drawback was that memory usage increased quite a lot, since we need to serialize the message, compress it and compare sizes instead of just serializing the message to the DataOutputStream.

So instead I just compressed all messages, with good results.

I attach both patches; they add a configuration option like:
+# internode_compression controls whether traffic between nodes is
+# compressed.
+# can be:  all  - all traffic is compressed
+#          dc   - traffic between different datacenters is compressed
+#          none - nothing is compressed.
+internode_compression: all
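
How the three values might translate into a per-connection decision can be sketched like this. The function name and the datacenter arguments are hypothetical; the patches themselves are not reproduced here, and in Cassandra the datacenter information would come from the snitch.

```python
def should_compress(mode: str, local_dc: str, remote_dc: str) -> bool:
    """Decide whether traffic to a peer is compressed for a given setting.

    mode is the internode_compression value: "all", "dc" or "none".
    local_dc / remote_dc are datacenter names (hypothetical inputs here).
    """
    if mode == "all":
        return True
    if mode == "dc":
        return local_dc != remote_dc
    if mode == "none":
        return False
    raise ValueError(f"unknown internode_compression value: {mode!r}")

print(should_compress("dc", "DC1", "DC2"))
```

Rejecting unknown values outright is a design choice of this sketch, so that a typo in the setting is not silently treated as "none".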

                

        

[jira] [Updated] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3127:
--------------------------------------

    Comment: was deleted

(was: It would be pretty easy to configure "off", "on", and "cross-dc".  In the cross-dc case, OutboundTcpConnection could just ask the snitch (DatabaseDescriptor.getEndpointSnitch) if the target node is in another DC, and make the compression decision based on that.)
    

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247296#comment-13247296 ] 

Sylvain Lebresne commented on CASSANDRA-3127:
---------------------------------------------

bq. Shouldn't merkle trees (in theory) already be the most information dense representation possible and thus be uncompressable?

It's a tree holding hashes, and it's true that the hashes probably won't compress much, but it's not compressed either, so it may compress a bit. Anyway, I didn't mean to imply that merkle trees would or would not compress well. I just meant that for cross-DC messages, any non-trivially small message is likely worth compressing. Merkle trees may have been a poorly chosen example.
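
The point about hashes can be checked with a small experiment. It uses `zlib` as a stand-in codec and SHA-256 digests as stand-in hash values; the per-node framing string is entirely made up and is not Cassandra's merkle tree serialization.

```python
import hashlib
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

digests = [hashlib.sha256(str(i).encode()).digest() for i in range(256)]

# Hashes only: high-entropy bytes with nothing for the compressor to exploit.
hashes_only = b"".join(digests)

# The same hashes wrapped in repetitive (made-up) per-node framing.
framed = b"".join(b"node;depth=08;left=;right=;hash=" + d for d in digests)

print(f"hashes only: {ratio(hashes_only):.2f}, with framing: {ratio(framed):.2f}")
```

The hash bytes themselves stay essentially the same size, while any repetitive structure around them still shrinks, which matches the "may compress a bit" expectation.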
                

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126957#comment-13126957 ] 

Jonathan Ellis commented on CASSANDRA-3127:
-------------------------------------------

bq. define a size of messages after which we start to compress

Let's go with this: intranode_message_compression_threshold > 0 means compress messages larger than it.  <= 0 means off.  Let's leave off the cross-dc complexity for now.

The code should be fairly self-contained in OutboundTcpConnection and IncomingTcpConnection. FileStreamTask provides an example of using compression over a socket.
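
The threshold rule proposed above is small enough to state as code. The function name is hypothetical, and sizes are assumed to be in bytes.

```python
def should_compress(message_size: int, threshold: int) -> bool:
    """Apply the proposed rule: threshold > 0 compresses messages larger
    than the threshold; threshold <= 0 turns compression off entirely."""
    return threshold > 0 and message_size > threshold

print(should_compress(2048, 1024), should_compress(2048, 0))
```

A message exactly at the threshold is left uncompressed under this reading of "larger than".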
                

        

[jira] [Resolved] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3127.
---------------------------------------

    Resolution: Fixed

committed, thanks!

will follow up w/ some related changes in CASSANDRA-4311.
                
> Message (inter-node) compression
> --------------------------------
>
>                 Key: CASSANDRA-3127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3127-compress-messages-between-nodes.patch, CASSANDRA-3127-snappy.patch, CASSANDRA-3127.patch, CHECK_SIZES-CASSANDRA-3127.patch
>
>

        

[jira] [Comment Edited] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283474#comment-13283474 ] 

Marcus Eriksson edited comment on CASSANDRA-3127 at 5/25/12 2:32 PM:
---------------------------------------------------------------------

Could we perhaps always compress the message and check if the resulting message is smaller than the original one? And then, of course, send the smaller one over the wire.

MySQL does that when using client->server compression for example.

I'll assign this to myself and start poking around a bit.
                
      was (Author: krummas):
    Could we perhaps always compress the message and check if the resulting message is smaller than the original one?

MySQL does that when using client->server compression for example.

I'll assign this to me and start poking around a bit
                  

        

[jira] [Assigned] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson reassigned CASSANDRA-3127:
------------------------------------------

    Assignee: Marcus Eriksson
    

        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286493#comment-13286493 ] 

Marcus Eriksson commented on CASSANDRA-3127:
--------------------------------------------

I have not (yet) tried Snappy for this; should I?

The new patch does what you suggest. When node A starts communicating with node B, the approach is:
# Node A sends the first message uncompressed, but with the compression bit set in the header. This is only done if compression is enabled in the conf and the version of node B is >= current. Since MessagingService returns the current version when it does not know about the remote node, I am not sure how effective this check is; I guess we get an exception and a reconnect, and then we might know the remote version.
# Node A "upgrades" its DataOutputStream to be compressed.
# Node B gets the first, uncompressed message, sees the compression flag in the header, and upgrades its DataInputStream to be compressed.
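
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class name, the COMPRESSION_BIT value, and the use of a plain int as the header are all assumptions, java.util.zip's Deflater stands in for whichever compressor is finally chosen, and an in-memory byte array stands in for the socket between node A and node B.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamUpgradeSketch {
    // Hypothetical flag; the real header layout is defined by the patch, not here.
    static final int COMPRESSION_BIT = 0x1;

    /** Node A: first message uncompressed with the flag set, later messages compressed. */
    static byte[] send(String first, String... rest) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            out.writeInt(COMPRESSION_BIT); // header announcing the upgrade
            out.writeUTF(first);           // step 1: first message is still uncompressed
            out.flush();
            // Step 2: "upgrade" the output stream so everything after this is compressed.
            out = new DataOutputStream(new DeflaterOutputStream(wire));
            for (String message : rest) {
                out.writeUTF(message);
            }
            out.close(); // finishes the deflate stream
            return wire.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Node B: read the first message, check the flag, then upgrade the input stream. */
    static List<String> receive(byte[] wire, int messagesAfterFirst) {
        try {
            InputStream raw = new ByteArrayInputStream(wire);
            DataInputStream in = new DataInputStream(raw);
            int header = in.readInt();
            List<String> messages = new ArrayList<>();
            messages.add(in.readUTF());
            // Step 3: the flag tells node B that the rest of the stream is compressed.
            if ((header & COMPRESSION_BIT) != 0) {
                in = new DataInputStream(new InflaterInputStream(raw));
            }
            for (int i = 0; i < messagesAfterFirst; i++) {
                messages.add(in.readUTF());
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch relies on DataInputStream not reading ahead, so wrapping the same underlying stream after the first message does not lose any bytes. A buffered stream on the receiving side would need more care at the switch-over point.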
                
> Message (inter-node) compression
> --------------------------------
>
>                 Key: CASSANDRA-3127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>         Attachments: 0001-CASSANDRA-3127-compress-messages-between-nodes.patch, CASSANDRA-3127.patch, CHECK_SIZES-CASSANDRA-3127.patch
>
>


        

[jira] [Updated] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-3127:
---------------------------------------

    Attachment: CASSANDRA-3127-snappy.patch
    
> Message (inter-node) compression
> --------------------------------
>
>                 Key: CASSANDRA-3127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3127-compress-messages-between-nodes.patch, CASSANDRA-3127-snappy.patch, CASSANDRA-3127.patch, CHECK_SIZES-CASSANDRA-3127.patch
>
>


        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288712#comment-13288712 ] 

Jonathan Ellis commented on CASSANDRA-3127:
-------------------------------------------

Yes, I'd like to see LZF vs Snappy if possible.  Otherwise this looks reasonable.  One change to make is that we're introducing just one MessagingService version change per major release, so sticking with VERSION_12 here is fine.
                
> Message (inter-node) compression
> --------------------------------
>
>                 Key: CASSANDRA-3127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3127-compress-messages-between-nodes.patch, CASSANDRA-3127.patch, CHECK_SIZES-CASSANDRA-3127.patch
>
>


        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Chris Burroughs (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247287#comment-13247287 ] 

Chris Burroughs commented on CASSANDRA-3127:
--------------------------------------------

bq. On the other side of the spectrum, this is likely always a good choice to compress for say the exchange of merkle trees across data-centers. 

Shouldn't Merkle trees (in theory) already be the most information-dense representation possible, and thus be incompressible?
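
One way to check the question above empirically is to compress a buffer of raw hash output and compare sizes. The sketch below is only an approximation under stated assumptions: it uses SHA-256 digests of sequential integers and java.util.zip's Deflater, and the class and method names are made up for illustration. It says nothing about how Cassandra actually serializes a Merkle tree, which may carry structural overhead beyond the hash bytes themselves.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Deflater;

public class HashCompressibility {
    /** Concatenates SHA-256 digests of the integers 0..count-1 (32 bytes each). */
    static byte[] hashes(int count) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            ByteBuffer buf = ByteBuffer.allocate(count * 32);
            for (int i = 0; i < count; i++) {
                buf.put(md.digest(ByteBuffer.allocate(4).putInt(i).array()));
            }
            return buf.array();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the number of bytes Deflater produces for the given input. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] raw = hashes(1000);
        System.out.println("raw bytes:      " + raw.length);
        System.out.println("deflated bytes: " + deflatedSize(raw));
    }
}
```

If the deflated size comes out close to the raw size, that supports the intuition for pure hash data. Whether the same holds for the serialized tree as sent on the wire would need a measurement on real messages.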
                
> Message (inter-node) compression
> --------------------------------
>
>                 Key: CASSANDRA-3127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>


        

[jira] [Updated] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Marcus Eriksson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-3127:
---------------------------------------

    Attachment: 0001-CASSANDRA-3127-compress-messages-between-nodes.patch
    
> Message (inter-node) compression
> --------------------------------
>
>                 Key: CASSANDRA-3127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>         Attachments: 0001-CASSANDRA-3127-compress-messages-between-nodes.patch, CASSANDRA-3127.patch, CHECK_SIZES-CASSANDRA-3127.patch
>
>


        

[jira] [Commented] (CASSANDRA-3127) Message (inter-node) compression

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127341#comment-13127341 ] 

Sylvain Lebresne commented on CASSANDRA-3127:
---------------------------------------------

bq. Let's go with this: intranode_message_compression_threshold > 0 means compress messages larger than it. <= 0 means off

Agreed with the idea. Though to nitpick, I would find it more natural for == 0 to mean 'compress all messages', and maybe for < 0 to mean off.
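
To make the proposed semantics concrete, here is a minimal sketch of the check. The class and method names are hypothetical, and the sizes are assumed to be in bytes, which the discussion does not state.

```java
public class CompressionThreshold {
    /**
     * Threshold semantics as proposed in this comment:
     *   threshold <  0  : compression is off
     *   threshold == 0  : every non-empty message is compressed
     *   threshold >  0  : only messages larger than the threshold are compressed
     */
    static boolean shouldCompress(int thresholdBytes, int messageSizeBytes) {
        if (thresholdBytes < 0) {
            return false; // negative threshold disables compression
        }
        return messageSizeBytes > thresholdBytes;
    }
}
```

With this reading, 0 needs no special case: "larger than 0 bytes" already covers every message with a body, which is arguably why it feels more natural than treating 0 as off.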

bq. Let's leave off the cross-dc complexity for now.

Totally agree.
                
> Message (inter-node) compression
> --------------------------------
>
>                 Key: CASSANDRA-3127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3127
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
