Posted to user@cassandra.apache.org by Sachin Nikam <sk...@gmail.com> on 2015/08/02 01:17:28 UTC

Cassandra Data Stax java driver & Snappy Compression library

I am currently running a Cassandra 1.2 cluster. This cluster has two
tables, TableA and TableB.

TableA is read and written by Services S1 and S2, which use the Astyanax
client library.

TableB is read and written by Service S3, which uses the DataStax Java
driver 2.1. S3 also reads data from TableA.

Both TableA and TableB are defined on the Cassandra nodes to use
SnappyCompressor.

On start-up, Service S3 logs the following WARN message. The service is
able to continue its normal operation thereafter.

**************
[main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot find
Snappy class, you should make sure the Snappy library is in the classpath if
you intend to use it. Snappy compression will not be available for the
protocol.
***********


My questions are as follows:
#1. Does the compression happen on the Cassandra client side or on the
Cassandra server side?
#2. Does Service S3 need to pull in additional dependencies for Snappy
compression, as mentioned here:
http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
#3. What happens if this additional library is not present on the classpath
of Service S3? Will the data that S3 writes to TableB not be compressed?
Regards
Sachin

Re: Cassandra Data Stax java driver & Snappy Compression library

Posted by Janne Jalkanen <ja...@ecyrd.com>.
I’ve never used Astyanax, so it’s difficult to say, but if you can find the snappy-java in the classpath, it’s quite possible that compression is enabled for S1 and S2 automatically. You could try removing the snappy jar from S1 and see if that changes the latencies compared to S2. ;-)

It probably has some impact on end-to-end latency, but there are multiple other things which also impact latency, such as whether you’re using prepared queries with the DataStax driver, how large your queries are, etc. In general the consensus seems to be that using CQL over the DataStax driver 1) is very fast and, since Cassandra 2.1, arguably faster than the Thrift interface that the older clients still use, 2) gives developers a productivity boost through the clarity of the CQL interface, and 3) is future-proof, since all new features will be implemented using it.

/Janne

> On 5 Aug 2015, at 06:34, Sachin Nikam <sk...@gmail.com> wrote:
> 
> Janne,
> A small clarification: I found snappy-java-1.0.4.1.jar on the classpath. But my other questions still remain.
> 
> On Tue, Aug 4, 2015 at 8:24 PM, Sachin Nikam <sknikam@gmail.com> wrote:
> Janne,
> Thanks for continuing to take the time to answer my queries. We noticed that the write latency (tp99) from Services S1 and S2 is 50% of the write latency (tp99) for Service S3. I also noticed that S1 and S2, which use the Astyanax client library, have compress-lzf.jar on their classpath, although the table is defined to use Snappy compression. Is this compression library, or some other transitive dependency pulled in by Astyanax, enabling compression of the payload sent over the wire, and does that account for the difference in tp99?
> Regards
> Sachin
> 
> On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen <janne.jalkanen@ecyrd.com> wrote:
> 
> Correct. Note that you may lose some performance this way, though; in a typical case, trading extra CPU usage for saved bandwidth is a good deal. However, it always depends on your use case and whether you’re running your cluster to the max. If you choose not to enable compression now, it’s a good, low-hanging optimization to keep in mind for production environments.
> 
> /Janne
> 
>> On 3 Aug 2015, at 08:40, Sachin Nikam <sknikam@gmail.com> wrote:
>> 
>> Thanks Janne...
>> To clarify: Service S3 should not run into any issues, and I may choose not to fix this?
>> Regards
>> Sachin
>> 
>> On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen <Janne.Jalkanen@ecyrd.com> wrote:
>> No, this just tells you that your client (S3, using the DataStax driver) cannot communicate with the Cassandra cluster using a compressed protocol, since the necessary libraries are missing on the client side. Servers will still compress the data they receive when they write it to disk.
>> 
>> In other words
>> 
>> Client  <- [uncompressed data] -> Server <- [compressed data] -> Disk. 
>> 
>> To fix, make sure that the Snappy libraries are in the classpath of your S3 service application.  As always, there’s no guarantee that this improves your performance, since if your app is already CPU-heavy, the extra CPU overhead of compression *may* be a problem.  So measure :-)
>> 
>> /Janne

Re: Cassandra Data Stax java driver & Snappy Compression library

Posted by Sachin Nikam <sk...@gmail.com>.
Janne,
A small clarification: I found snappy-java-1.0.4.1.jar on the classpath. But
my other questions still remain.


Re: Cassandra Data Stax java driver & Snappy Compression library

Posted by Sachin Nikam <sk...@gmail.com>.
Janne,
Thanks for continuing to take the time to answer my queries. We noticed
that the write latency (tp99) from Services S1 and S2 is 50% of the write
latency (tp99) for Service S3. I also noticed that S1 and S2, which use the
Astyanax client library, have compress-lzf.jar on their classpath, although
the table is defined to use Snappy compression. Is this compression
library, or some other transitive dependency pulled in by Astyanax,
enabling compression of the payload sent over the wire, and does that
account for the difference in tp99?
Regards
Sachin


Re: Cassandra Data Stax java driver & Snappy Compression library

Posted by Janne Jalkanen <ja...@ecyrd.com>.
Correct. Note that you may lose some performance this way, though; in a typical case, trading extra CPU usage for saved bandwidth is a good deal. However, it always depends on your use case and whether you’re running your cluster to the max. If you choose not to enable compression now, it’s a good, low-hanging optimization to keep in mind for production environments.

/Janne
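
The bandwidth-versus-CPU trade-off described above can be estimated on a representative payload before enabling wire compression. The following is only a rough sketch: it uses DEFLATE from the JDK (java.util.zip.Deflater) as a stand-in, not Snappy, so the absolute numbers will differ from what the driver would see. The class name, the helper methods, and the sample row format are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical measurement helper; DEFLATE is a stand-in for Snappy here.
final class CompressionTradeoff {

    /** Compresses the input at the fastest DEFLATE level and returns the compressed size in bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            // Drain the compressor; only the byte count matters for this estimate.
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Builds a repetitive, row-like payload (an assumed shape, not real table data). */
    static byte[] samplePayload(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("user_id=").append(i).append(",status=ACTIVE,region=us-east;");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = samplePayload(10_000);
        long start = System.nanoTime();
        int compressed = compressedSize(payload);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.printf("raw=%d bytes, compressed=%d bytes, time ~%d us%n",
                payload.length, compressed, micros);
    }
}
```

A single timed run like this is noisy (JIT warm-up, GC), so it only indicates the order of magnitude; the real decision should rest on tp99 measurements of the service itself with compression on and off.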


Re: Cassandra Data Stax java driver & Snappy Compression library

Posted by Sachin Nikam <sk...@gmail.com>.
Thanks Janne...
To clarify: Service S3 should not run into any issues, and I may choose
not to fix this?
Regards
Sachin


Re: Cassandra Data Stax java driver & Snappy Compression library

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
No, this just tells you that your client (S3, using the DataStax driver) cannot communicate with the Cassandra cluster using a compressed protocol, since the necessary libraries are missing on the client side. Servers will still compress the data they receive when they write it to disk.

In other words

Client  <- [uncompressed data] -> Server <- [compressed data] -> Disk. 

To fix, make sure that the Snappy libraries are in the classpath of your S3 service application.  As always, there’s no guarantee that this improves your performance, since if your app is already CPU-heavy, the extra CPU overhead of compression *may* be a problem.  So measure :-)

/Janne
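
Whether the Snappy library is visible to a service can be checked from inside that service's JVM. The sketch below is an illustration, not part of the DataStax driver: the class and method names are hypothetical, and it assumes the library in question is snappy-java, whose main class is org.xerial.snappy.Snappy.

```java
// Hypothetical helper, not part of the DataStax driver or of snappy-java.
final class SnappyClasspathCheck {

    // Main class of the snappy-java library (assumed to be the library the warning refers to).
    static final String SNAPPY_CLASS = "org.xerial.snappy.Snappy";

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static
            // initializers (which for a native-backed library may load native code).
            Class.forName(className, false, SnappyClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOnClasspath(SNAPPY_CLASS)
                ? "snappy-java found on the classpath"
                : "snappy-java NOT found on the classpath");
    }
}
```

Running this with the same classpath as Service S3 shows whether the jar is actually visible to the application. Having the jar present is only the precondition: in the 2.x DataStax Java driver, wire compression is, as far as I know, requested explicitly on the builder with withCompression(ProtocolOptions.Compression.SNAPPY), so check the driver documentation for the version in use.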
