Posted to user@cassandra.apache.org by Ertio Lew <er...@gmail.com> on 2012/01/19 08:49:32 UTC

Re: Using 5-6 bytes for cassandra timestamps vs 8…

I believe the timestamps *on a per-column basis* are only required
until compaction time; after that, it could also work if the timestamp
range were specified globally on a per-SSTable basis. The per-column
timestamps before compaction therefore only need to measure the time
from the initialization of the new memtable to the point the column is
written to that memtable. That time can easily fit in 4 bytes, which I
believe would save at least 4 bytes of overhead for each column.
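
A minimal sketch of the idea (the names are hypothetical and it assumes
millisecond precision; the real storage engine would differ): keep one
full 8-byte base timestamp per memtable, and store each column's write
time as a 4-byte offset from that base. 32 bits of milliseconds cover
about 49 days, far longer than a memtable lives before being flushed.

    public final class MemtableDeltaTimestamps {
        private final long baseMillis; // full timestamp, stored once per memtable

        MemtableDeltaTimestamps(long creationMillis) {
            this.baseMillis = creationMillis;
        }

        // A 4-byte offset is stored per column instead of an 8-byte timestamp.
        int encode(long writeMillis) {
            long delta = writeMillis - baseMillis;
            if (delta < 0 || delta > 0xFFFFFFFFL) {
                throw new IllegalArgumentException("delta out of 32-bit range: " + delta);
            }
            return (int) delta; // narrow to 32 bits
        }

        long decode(int stored) {
            return baseMillis + (stored & 0xFFFFFFFFL); // widen back as unsigned
        }
    }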

Is anything related to these overheads under consideration or planned
in the roadmap?



On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev <ol...@gmail.com> wrote:
>
>> I have a patch for trunk which I just have to get time to test a bit
>> before I submit.
>> It is for super columns and will use the super column's timestamp as
>> the base and only store variant-encoded offsets in the underlying
>> columns.
>
> Could you please measure how much real benefit it brings (in real RAM
> consumption by the JVM)? It is hard to tell whether it will give
> noticeable results or not. AFAIK the memory structures used for the
> memtable consume much more memory, and a 64-bit JVM allocates memory
> aligned to 64-bit word boundaries. So a 37% reduction in memory
> consumption looks doubtful.
>
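
For reference, a minimal sketch of the "variant encoded offsets" idea
quoted above (hypothetical names, not the actual patch): store the full
8-byte timestamp once per super column, and write each subcolumn's
delta from it as a protobuf-style unsigned varint, so small deltas cost
only 1-2 bytes.

    import java.io.ByteArrayOutputStream;

    public final class VarintDelta {
        // Encode a non-negative delta as an unsigned varint: 7 bits per
        // byte, with the high bit set on every byte except the last.
        static void writeVarint(ByteArrayOutputStream out, long v) {
            while ((v & ~0x7FL) != 0) {
                out.write((int) ((v & 0x7F) | 0x80));
                v >>>= 7;
            }
            out.write((int) v);
        }

        public static void main(String[] args) {
            // Deltas from a per-super-column base timestamp, in ms.
            for (long delta : new long[] {3, 5000, 86400000L}) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                writeVarint(buf, delta);
                System.out.printf("delta=%d -> %d byte(s) instead of 8%n",
                        delta, buf.size());
            }
        }
    }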

Re: Using 5-6 bytes for cassandra timestamps vs 8…

Posted by Ertio Lew <er...@gmail.com>.
It obviously won't matter if your columns are fat, but there are several
cases (at least I can think of several) where you need to store, for
example, just an integer column name and an empty column value. In that
case the column takes 12 bytes, of which 8 bytes is pure overhead for
the timestamp, which doesn't look very nice. And skinny columns are a
very common use case, I believe.
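
A back-of-the-envelope check of that counting (this assumes the column
is nothing but a 4-byte integer name, an empty value, and the 8-byte
timestamp, and ignores any other per-column serialization overhead):

    public final class SkinnyColumnMath {
        public static void main(String[] args) {
            int name = 4;      // integer column name
            int value = 0;     // empty column value
            int timestamp = 8; // full 8-byte timestamp

            int total = name + value + timestamp; // 12 bytes
            System.out.printf("column=%d bytes, timestamp share=%.0f%%%n",
                    total, 100.0 * timestamp / total); // ~67%
        }
    }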

On Thu, Jan 19, 2012 at 1:26 PM, Maxim Potekhin <po...@bnl.gov> wrote:

> I must have accidentally deleted all messages in this thread save this one.
>
> At face value, we are talking about saving 2 bytes per column. I know
> it can add up with many columns, but relative to the size of the column --
> is it THAT significant?
>
> I made an effort to minimize my CF footprint by replacing the "natural"
> column keys with integers (and translating back and forth when writing and
> reading). It's easy to see that in my case I achieve almost 50% storage
> savings at best, and at least 30%. But if the column in question contains
> more than 20 bytes -- what's up with trying to save 2?
>
> Cheers
>
> Maxim
>
>
>
> On 1/18/2012 11:49 PM, Ertio Lew wrote:
>
>> I believe the timestamps *on a per-column basis* are only required
>> until compaction time; after that, it could also work if the timestamp
>> range were specified globally on a per-SSTable basis. The per-column
>> timestamps before compaction therefore only need to measure the time
>> from the initialization of the new memtable to the point the column is
>> written to that memtable. That time can easily fit in 4 bytes, which I
>> believe would save at least 4 bytes of overhead for each column.
>>
>> Is anything related to these overheads under consideration or planned
>> in the roadmap?
>>
>>
>>
>> On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev <ol...@gmail.com>
>> wrote:
>>
>>>> I have a patch for trunk which I just have to get time to test a bit
>>>> before I submit.
>>>> It is for super columns and will use the super column's timestamp as
>>>> the base and only store variant-encoded offsets in the underlying
>>>> columns.
>>>
>>> Could you please measure how much real benefit it brings (in real RAM
>>> consumption by the JVM)? It is hard to tell whether it will give
>>> noticeable results or not. AFAIK the memory structures used for the
>>> memtable consume much more memory, and a 64-bit JVM allocates memory
>>> aligned to 64-bit word boundaries. So a 37% reduction in memory
>>> consumption looks doubtful.
>>>
>
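
A rough illustration of the alignment point quoted above (the header
size and alignment below are assumptions about a typical 64-bit HotSpot
JVM with compressed oops, not measurements): heap object sizes are
rounded up to a multiple of 8 bytes, so shrinking a timestamp field
from 8 bytes to 5-6 can vanish entirely into padding.

    public final class AlignmentDemo {
        private static final int HEADER = 12;   // assumed object header size
        private static final int ALIGNMENT = 8; // HotSpot's default alignment

        static long padTo(long size, long align) {
            return ((size + align - 1) / align) * align; // round up
        }

        public static void main(String[] args) {
            // A column-like object: two 4-byte references (name, value)
            // plus a timestamp field of varying width.
            for (int tsBytes : new int[] {8, 6, 5, 4}) {
                long raw = HEADER + 4 + 4 + tsBytes;
                System.out.printf("timestamp=%d bytes -> object=%d bytes%n",
                        tsBytes, padTo(raw, ALIGNMENT));
            }
        }
    }

Under these assumptions only the drop to 4 bytes crosses an alignment
boundary; 5-6 byte timestamps save nothing in heap terms, which is
Oleg's point.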

Re: Using 5-6 bytes for cassandra timestamps vs 8…

Posted by Maxim Potekhin <po...@bnl.gov>.
I must have accidentally deleted all messages in this thread save this one.

At face value, we are talking about saving 2 bytes per column. I
know it can add up with many columns, but relative to the size of the
column -- is it THAT significant?

I made an effort to minimize my CF footprint by replacing the "natural"
column keys with integers (and translating back and forth when writing
and reading). It's easy to see that in my case I achieve almost 50%
storage savings at best, and at least 30%. But if the column in question
contains more than 20 bytes -- what's up with trying to save 2?
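
A minimal sketch of that translation (the names are hypothetical): map
each "natural" string key to a small integer id on the write path, and
translate back on the read path.

    import java.util.HashMap;
    import java.util.Map;

    public final class ColumnKeyDictionary {
        private final Map<String, Integer> toId = new HashMap<String, Integer>();
        private final Map<Integer, String> toKey = new HashMap<Integer, String>();
        private int nextId = 0;

        // Write path: e.g. "user_email" -> 7.
        synchronized int idFor(String naturalKey) {
            Integer id = toId.get(naturalKey);
            if (id == null) {
                id = nextId++;
                toId.put(naturalKey, id);
                toKey.put(id, naturalKey);
            }
            return id;
        }

        // Read path: 7 -> "user_email".
        synchronized String keyFor(int id) {
            return toKey.get(id);
        }
    }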

Cheers

Maxim


On 1/18/2012 11:49 PM, Ertio Lew wrote:
> I believe the timestamps *on a per-column basis* are only required
> until compaction time; after that, it could also work if the timestamp
> range were specified globally on a per-SSTable basis. The per-column
> timestamps before compaction therefore only need to measure the time
> from the initialization of the new memtable to the point the column is
> written to that memtable. That time can easily fit in 4 bytes, which I
> believe would save at least 4 bytes of overhead for each column.
>
> Is anything related to these overheads under consideration or planned
> in the roadmap?
>
>
>
> On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev <ol...@gmail.com> wrote:
>>> I have a patch for trunk which I just have to get time to test a bit
>>> before I submit.
>>> It is for super columns and will use the super column's timestamp as
>>> the base and only store variant-encoded offsets in the underlying
>>> columns.
>>
>> Could you please measure how much real benefit it brings (in real RAM
>> consumption by the JVM)? It is hard to tell whether it will give
>> noticeable results or not. AFAIK the memory structures used for the
>> memtable consume much more memory, and a 64-bit JVM allocates memory
>> aligned to 64-bit word boundaries. So a 37% reduction in memory
>> consumption looks doubtful.
>>
>>