Posted to user@cassandra.apache.org by Alexandr Porunov <al...@gmail.com> on 2016/09/07 20:54:18 UTC

Is a blob storage cost of cassandra is the same as bigint storage cost for long variables?

Hello,

I need to store a Java "Long" variable.
My question is whether the storage cost is the same for storing the hex
representation of the "Long" in a blob as for storing the "Long" in a
bigint.
Are there any performance pros or cons?
Is it OK to use a blob as a primary key?

Sincerely,
Alexandr

Re: Is a blob storage cost of cassandra is the same as bigint storage cost for long variables?

Posted by Romain Hardouin <ro...@yahoo.fr>.
Note that LZ4 compression is used by default. If you want to disable compression, you can do this:
CREATE/ALTER TABLE ... WITH compression = { 'sstable_compression' : '' };
Best,
Romain
 


Re: Is a blob storage cost of cassandra is the same as bigint storage cost for long variables?

Posted by Jim Witschey <ji...@datastax.com>.
Alexandr,

>> Are there any performance pros or cons?

I would do some benchmarking under realistic data load before going
forward with this data model. I don't have numbers to back this up,
but if the column in question is in the key, I'd be concerned about
the cost of comparing and sorting blobs over integers. This could make
read and flush operations, among others, unnecessarily expensive.

Performance concerns aside:

> So here is almost no any difference between them

Like Romain, I'd be surprised if the (very small) disk footprint
benefit were worth the extra complexity and loss of clarity in client
code.

Jim Witschey
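
One thing worth checking alongside such a benchmark is whether the two types even sort the same way. The sketch below is not Cassandra code; it assumes that a bigint is compared as a signed 64-bit number and that a blob is compared byte by byte as unsigned values, and it uses a hypothetical helper class named KeyOrdering. Under those assumptions, negative values sort differently in the two encodings. This would matter mostly for clustering columns, since partition keys are normally ordered by token.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of any Cassandra API.
final class KeyOrdering {

    // Big-endian 8-byte encoding of a long (ByteBuffer's default byte order).
    public static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Signed numeric comparison, assumed here to model a bigint column.
    public static int compareAsLong(long a, long b) {
        return Long.compare(a, b);
    }

    // Unsigned lexicographic byte comparison, assumed here to model a blob column.
    // Arrays.compareUnsigned requires Java 9 or later.
    public static int compareAsBlob(long a, long b) {
        return Arrays.compareUnsigned(toBytes(a), toBytes(b));
    }

    public static void main(String[] args) {
        System.out.println("as long: -1 vs 1 -> " + Integer.signum(compareAsLong(-1L, 1L)));
        System.out.println("as blob: -1 vs 1 -> " + Integer.signum(compareAsBlob(-1L, 1L)));
    }
}
```

For non-negative values the two orderings agree, so keys such as 1 to 13421772 would sort identically either way.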

Re: Is a blob storage cost of cassandra is the same as bigint storage cost for long variables?

Posted by Alexandr Porunov <al...@gmail.com>.
Hello Romain,

Thank you very much for the explanation!

I have just run a simple test to compare the two cases.
I set up two equivalent VMs.
Machine 1:
CREATE KEYSPACE "test" WITH REPLICATION = { 'class' : 'SimpleStrategy',
'replication_factor' : 1 };

CREATE TABLE test.simple (
  id bigint PRIMARY KEY
);

Machine 2:
CREATE KEYSPACE "test" WITH REPLICATION = { 'class' : 'SimpleStrategy',
'replication_factor' : 1 };

CREATE TABLE test.simple (
  id blob PRIMARY KEY
);

I then inserted 13421772 primary keys, from 1 to 13421772, on both machines.

Results:
Machine 1: size of the data folder: 495864 bytes
Machine 2: size of the data folder: 495004 bytes

So there is almost no difference between them (the blob version even came
out about 1 MB smaller).

I am happy about that because I need to store specially encoded primary keys
of 80 bits each. So I can use a blob as a primary key without hesitation.

Best regards,
Alexandr
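
For readers wondering what an 80-bit blob key might look like on the client side, here is a minimal sketch. The layout (a 64-bit part followed by a 16-bit part) and the class name Key80 are assumptions made for illustration; the actual encoding of the keys is not described in this thread. The second method renders the bytes as a CQL blob literal (0x followed by hex digits).

```java
import java.nio.ByteBuffer;

// Hypothetical 80-bit key layout for illustration: 8 bytes + 2 bytes, big-endian.
final class Key80 {

    public static final int KEY_BYTES = 10; // 80 bits

    // Pack the two parts into a 10-byte buffer, ready to bind as a blob value.
    public static ByteBuffer encode(long high64, short low16) {
        ByteBuffer buf = ByteBuffer.allocate(KEY_BYTES);
        buf.putLong(high64).putShort(low16);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Render the key as a CQL blob literal such as 0x00000000000000010002.
    public static String toCqlBlobLiteral(ByteBuffer key) {
        StringBuilder sb = new StringBuilder("0x");
        ByteBuffer view = key.duplicate(); // leave the caller's position untouched
        while (view.hasRemaining()) {
            sb.append(String.format("%02x", view.get() & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ByteBuffer key = encode(1L, (short) 2);
        System.out.println(key.remaining() + " bytes: " + toCqlBlobLiteral(key));
    }
}
```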


Re: Is a blob storage cost of cassandra is the same as bigint storage cost for long variables?

Posted by Romain Hardouin <ro...@yahoo.fr>.
Hi,
Disk-wise it's the same, because a bigint is serialized as an 8-byte ByteBuffer, and if you store a Long as bytes in a blob it will take 8 bytes too, right?
The difference is the validation. The blob ByteBuffer will be stored as is, whereas the bigint will be validated. So technically the Long is slower, but I guess that's not noticeable.
Yes, you can use a blob as a partition key. I would use the bigint, both for validation and clarity.
Best,
Romain 
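
The 8-byte point above can be checked with a short sketch. It assumes the Long is written to the blob as raw big-endian bytes (ByteBuffer's default order); the class name LongAsBlob is hypothetical. It also shows why the encoding matters: if the hex representation were stored as text rather than as raw bytes, it would take 16 bytes instead of 8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any Cassandra API.
final class LongAsBlob {

    // Encode a long as 8 raw bytes, suitable for binding to a blob column.
    public static ByteBuffer toBlob(long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(value);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    // Decode without moving the buffer's position (absolute read).
    public static long fromBlob(ByteBuffer blob) {
        return blob.getLong(blob.position());
    }

    // Size of the same value if its hex digits were stored as ASCII text.
    public static int hexTextSize(long value) {
        return String.format("%016x", value).getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        long id = 13421772L;
        ByteBuffer blob = toBlob(id);
        System.out.println("raw blob bytes: " + blob.remaining());
        System.out.println("hex text bytes: " + hexTextSize(id));
        System.out.println("round trip ok:  " + (fromBlob(blob) == id));
    }
}
```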
