Posted to dev@cassandra.apache.org by Evan Weaver <ew...@gmail.com> on 2009/07/26 09:28:29 UTC

Symbolizing column names for storage and cache efficiency

This article http://bit.ly/FJgTE about MongoDB is interesting. They
prioritized low barriers to entry in their selection process, and
ignored performance/scaling of any kind.

Aside from that, they mention that for row-oriented storage,
serializing the same string column names to disk for every row is a
big waste of disk and cache space. As far as I know, this affects
Cassandra too.

Would it be possible to add symbolized column names in a
forward-compatible way? Maybe scoped per sstable, with the registries
always kept in memory. Each node could individually make a decision
about whether a column name is duplicated enough to be worth
symbolizing, and apply the transformation in the compaction phase.

Of course there are pitfalls, but it seems like it could be a big boon
to effective cache size in row-oriented applications.
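For concreteness, a per-sstable registry of this sort might look something like the following sketch (all names here are invented for illustration; this is not Cassandra code):

```python
# Hypothetical per-sstable symbol registry: maps repeated column names
# to small integer ids, with the whole mapping kept in memory so reads
# can translate ids back to names without extra seeks.
class SymbolRegistry:
    def __init__(self):
        self._name_to_id = {}
        self._id_to_name = []

    def encode(self, name):
        """Return the symbol id for `name`, registering it if new."""
        if name not in self._name_to_id:
            self._name_to_id[name] = len(self._id_to_name)
            self._id_to_name.append(name)
        return self._name_to_id[name]

    def decode(self, symbol_id):
        """Translate a symbol id back to the original column name."""
        return self._id_to_name[symbol_id]

registry = SymbolRegistry()
rows = [{"user_id": 1, "email": "a"}, {"user_id": 2, "email": "b"}]
encoded = [{registry.encode(k): v for k, v in row.items()} for row in rows]
# Each row now stores two small ints instead of repeating the strings.
```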

Evan

-- 
Evan Weaver

Re: Symbolizing column names for storage and cache efficiency

Posted by Evan Weaver <ew...@gmail.com>.
As a related example, a columnar database I know of uses RLE compression
on value streams because it's transparent. In many use cases this makes
calculating statistical features of a compressed column faster than if
it were uncompressed.
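As a toy illustration of that transparency (a sketch only, not the database in question):

```python
# Run-length encoding lets an aggregate touch one entry per run
# rather than one entry per row, with no decompression step.
def rle_encode(values):
    """Collapse a value stream into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_sum(runs):
    """Sum the column directly from the compressed runs."""
    return sum(value * length for value, length in runs)

column = [5, 5, 5, 5, 3, 3, 9]
runs = rle_encode(column)        # [(5, 4), (3, 2), (9, 1)]
assert rle_sum(runs) == sum(column) == 35
```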

Since we have unpredictable column offsets, column-specific lookups
would benefit from similar transparency considerations.

Evan

On Sun, Jul 26, 2009 at 3:03 PM, Stu Hood<st...@rackspace.com> wrote:
>> blobs through LZW or similar. Aggregation operations benefit in
>> particular because you can often never even bother to decompress the
>> rows.
> This is an interesting consideration. Hopefully a suitably flexible implementation of pluggable codecs would be able to allow for 'keys only' compression transparently. Your aggregation example could be very optimized in this case.
>
> PS: When I envision SSTable compression, I'm definitely thinking of block level compression, so each codec would need to implement seek() at a block level. The SSTable index would then point at the first compressed block containing a key.



-- 
Evan Weaver

Re: Symbolizing column names for storage and cache efficiency

Posted by Stu Hood <st...@rackspace.com>.
> blobs through LZW or similar. Aggregation operations benefit in
> particular because you can often never even bother to decompress the
> rows.
This is an interesting consideration. Hopefully a suitably flexible implementation of pluggable codecs would be able to allow for 'keys only' compression transparently. Your aggregation example could be very optimized in this case.

PS: When I envision SSTable compression, I'm definitely thinking of block level compression, so each codec would need to implement seek() at a block level. The SSTable index would then point at the first compressed block containing a key.
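A toy version of that block scheme (format and names invented purely for illustration) might look like:

```python
import bisect
import json
import zlib

# Sorted (key, value) pairs are grouped into blocks, each block is
# compressed independently, and a sparse index records the first key
# of every block so a lookup decompresses only one block.
def build(pairs, block_size=2):
    blocks, index = [], []
    for i in range(0, len(pairs), block_size):
        block = pairs[i:i + block_size]
        index.append(block[0][0])   # first key in the block
        blocks.append(zlib.compress(json.dumps(block).encode()))
    return blocks, index

def lookup(blocks, index, key):
    # Find the last block whose first key is <= the search key...
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None
    # ...and decompress just that one block.
    block = json.loads(zlib.decompress(blocks[i]))
    return dict(block).get(key)

pairs = [("apple", 1), ("cherry", 2), ("mango", 3), ("peach", 4)]
blocks, index = build(pairs)
assert lookup(blocks, index, "mango") == 3
```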


Re: Symbolizing column names for storage and cache efficiency

Posted by Evan Weaver <ew...@gmail.com>.
Re. Jonathan, I haven't run across a row-oriented use case where
symbolizing merely the first 1000 column names seen would not work.

Re. Stu, if generalized compression can cover this case, that should be
fine; burn some CPU for a more straightforward implementation.

However, it's often very useful in databases to have transparent
compression (that is, operations can be performed on the data even in
its compressed state). So I would advocate not merely passing the row
blobs through LZW or similar. Aggregation operations benefit in
particular because you can often never even bother to decompress the
rows.

This isn't relevant with current Cassandra, but could be a boon to
in-database stored procedures and the like.

Evan
On Sun, Jul 26, 2009 at 2:11 PM, Stu Hood<st...@rackspace.com> wrote:
> Also, long term, I think it is safe to assume that we will be adding compression for ColumnFamilies, which should have similar positive effects on cache-ability without too much application specific optimization.



-- 
Evan Weaver

Re: Symbolizing column names for storage and cache efficiency

Posted by Stu Hood <st...@rackspace.com>.
Also, long term, I think it is safe to assume that we will be adding compression for ColumnFamilies, which should have similar positive effects on cache-ability without too much application specific optimization.


Re: Symbolizing column names for storage and cache efficiency

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sun, Jul 26, 2009 at 2:28 AM, Evan Weaver<ew...@gmail.com> wrote:
> Would it be possible to add symbolized column names in a
> forward-compatible way? Maybe scoped per sstable, with the registries
> always kept in memory.

Maybe.  But it's not obvious to me how to do this in general.

The problem is the sparse nature of the column set.  We can't encode
_all_ the columns this way, or in the degenerate case we OOM just
trying to keep the mapping in memory.  Similarly, we can't encode just
the top N column names, since figuring out the top N requires keeping
each name in memory during the counting process.  (Besides slowing
down compaction -- instead of just deserializing columns where there
are keys in common in the merged fragments, we have to deserialize
all.)

ISTM that all we can do is encode the _first_ N column names we see,
which may be useful if the column name set is small for a given CF.

-Jonathan
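
Jonathan's first-N approach could be sketched as follows (a minimal illustration with invented names, not actual Cassandra code):

```python
# Only the first `capacity` distinct column names get symbol ids;
# later names are stored verbatim, so memory use stays bounded even
# for a very sparse column set.
class BoundedSymbolTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self._ids = {}

    def encode(self, name):
        """Return a small int id, or the name itself if the table is full."""
        if name in self._ids:
            return self._ids[name]
        if len(self._ids) < self.capacity:
            self._ids[name] = len(self._ids)
            return self._ids[name]
        return name  # table full: fall back to the literal string

table = BoundedSymbolTable(capacity=2)
assert table.encode("name") == 0
assert table.encode("email") == 1
assert table.encode("email") == 1                   # repeats reuse their id
assert table.encode("last_login") == "last_login"   # past the cap: verbatim
```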