Posted to user@cassandra.apache.org by Jonathan Shook <js...@gmail.com> on 2010/04/27 23:24:58 UTC

Storage Layout Questions

I'm trying to model a one-to-many set of data in which both sides of the
relation may grow arbitrarily large.
There are arbitrarily many FOOs. For each FOO, there are arbitrarily many
BARs.
Each type is modeled as an object containing multiple fields (columns) in
the application.
Given a key-addressable FOO element, I'd like to be able to do range access
operations on the associated BARs according to their temporal names.

I wish to avoid:
1) using a super column to nest the temporal ids (or column names) within a
row of the primary key, due to the memory-based limitations of super column
deserialization (and the implicit compute costs that go with it).
2) keeping a separate map between the FOO type and the BAR type.
3) serializing all BAR types into the value field of each FOO-keyed,
BAR-named column.
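
To make the three options concrete, here is a rough in-memory sketch in
Python. It is not Cassandra client code: plain dicts stand in for column
families, and the sample data, the "foo1" key, the "foo1:<name>" row-key
scheme for option 2, and the use of JSON for option 3 are all assumptions
made for illustration. In particular, option 2 is read here as "an index row
of BAR names per FOO plus one row per BAR", which is only one interpretation.

```python
from bisect import bisect_left, bisect_right
import json

# Hypothetical sample data: one FOO ("foo1") with three BARs, each named by
# a sortable timestamp string and carrying a couple of fields.
bars = {
    "2010-04-27T10:00": {"kind": "a", "size": "1"},
    "2010-04-27T11:00": {"kind": "b", "size": "2"},
    "2010-04-27T12:00": {"kind": "c", "size": "3"},
}


def slice_names(sorted_names, start, finish):
    """Return the names in [start, finish], like a column range slice."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]


# Option 1: super columns. Row key = FOO, super column name = BAR's temporal
# name, subcolumns = the BAR's fields.
super_cf = {"foo1": dict(bars)}

# Option 2: separate map. An index row per FOO lists the BAR names; each BAR
# lives in its own row (row-key scheme assumed for this sketch).
foo_to_bars = {"foo1": {name: "" for name in bars}}
bar_rows = {"foo1:" + name: fields for name, fields in bars.items()}

# Option 3: serialized values. Row key = FOO, column name = BAR's temporal
# name, column value = the whole BAR serialized (JSON assumed here).
serialized_cf = {"foo1": {n: json.dumps(f) for n, f in bars.items()}}


def range_option1(foo, start, finish):
    row = super_cf[foo]
    return [row[n] for n in slice_names(sorted(row), start, finish)]


def range_option2(foo, start, finish):
    names = slice_names(sorted(foo_to_bars[foo]), start, finish)
    # One extra row lookup per BAR after the index slice.
    return [bar_rows[foo + ":" + n] for n in names]


def range_option3(foo, start, finish):
    row = serialized_cf[foo]
    # Every BAR in the range must be deserialized to read any field.
    return [json.loads(row[n]) for n in slice_names(sorted(row), start, finish)]
```

All three return the same BARs for a given time range; the difference is in
where the cost lands: whole super column loads (1), an extra lookup per BAR
(2), or opaque values that must be rewritten whole to change one field (3).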

Were the super column addressing more scalable, I'd see it as a natural fit.
Does anybody have an elegant solution to this which I am overlooking? In the
absence of ideas, I'd like some feedback on the trade-offs of the above
"avoids".

Jonathan

Re: Storage Layout Questions

Posted by Jonathan Shook <js...@gmail.com>.

Ah, now I understand. Supercolumns it is.

On Wed, Apr 28, 2010 at 9:40 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> I don't think you are missing anything.  You'll have to pick your poison.
>
> FWIW, if each BAR has relatively few fields then supercolumns aren't
> bad.  It's when a BAR has dynamically growing numbers of fields
> (subcolumns) that you get in trouble with that model.

Re: Storage Layout Questions

Posted by Jonathan Ellis <jb...@gmail.com>.

I don't think you are missing anything.  You'll have to pick your poison.

FWIW, if each BAR has relatively few fields then supercolumns aren't
bad.  It's when a BAR has dynamically growing numbers of fields
(subcolumns) that you get in trouble with that model.
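
A toy cost model of that trade-off, for illustration only. It assumes, as
the original question suggests, that reading anything from a super column
means deserializing all of its subcolumns. The function name, the row
shapes, and the field counts below are made up for the sketch.

```python
def subcolumns_deserialized(row, wanted_bars):
    """Count subcolumns loaded when each wanted super column is read whole."""
    return sum(len(row[bar]) for bar in wanted_bars)


# Each BAR has a small, fixed set of fields: the cost per BAR stays small.
few_fields = {f"t{i:03d}": {"a": 1, "b": 2, "c": 3} for i in range(100)}

# Each BAR keeps accumulating fields: the cost per BAR grows with it.
growing = {f"t{i:03d}": {f"f{j}": j for j in range(2000)} for i in range(5)}

cost_few = subcolumns_deserialized(few_fields, ["t000", "t001"])
cost_growing = subcolumns_deserialized(growing, ["t000", "t001"])
```

Under this model, slicing two BARs costs 6 subcolumn reads in the first
layout and 4000 in the second, even if only one field of each BAR is needed.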


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com