
Posted to user@cassandra.apache.org by T Akhayo <t....@gmail.com> on 2011/03/29 16:05:26 UTC

Two column families or One super column family?

Good afternoon,

I'm designing my data model for Cassandra from scratch, which means I can
tune and fine-tune it for performance.

At the moment I'm having trouble choosing between two column families and
one super column family. I will illustrate with an example.

Sector: defines a place and has one or two properties.
Entry: bound to a sector; it is simply some text and a few properties.

I can model this with a super column family:

sectors { // super column family
    sector1 {
        uid1 {
            text: a text
            user: joop
        }
        uid2 {
            text: more text
            user: piet
        }
    }
    sector2 {
        uid10 {
            text: even more text
            user: marie
        }
    }
}

But I can also model this with two column families:

sectors { // column family
    sector1 {
        textid1: null
        textid2: null
    }
    sector2 {
        textid4: null
    }
}

texts { // column family
    textid1 {
        text: a text
        user: joop
    }
    textid2 {
        text: more text
        user: piet
    }
}

With the super column family I can retrieve the list of texts for a specific
sector with only one request to Cassandra.

With the two column families I need to send two requests to Cassandra:
1. Give me all text IDs for a given sector (returns x, y, z).
2. Give me all texts that have ID x, y, or z.
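
The difference in round trips can be sketched as follows. This is only an illustration, assuming plain Python dicts as stand-ins for the column families; `RequestCounter`, `read_super`, and `read_two_cfs` are hypothetical names, not part of any Cassandra client.

```python
# In-memory stand-ins for the two data models (plain dicts, no real Cassandra).

# Model A: one super column family (row -> super column -> columns).
sectors_super = {
    "sector1": {
        "uid1": {"text": "a text", "user": "joop"},
        "uid2": {"text": "more text", "user": "piet"},
    },
    "sector2": {
        "uid10": {"text": "even more text", "user": "marie"},
    },
}

# Model B: two standard column families.
sectors_cf = {
    "sector1": {"textid1": None, "textid2": None},
    "sector2": {"textid4": None},
}
texts_cf = {
    "textid1": {"text": "a text", "user": "joop"},
    "textid2": {"text": "more text", "user": "piet"},
}


class RequestCounter:
    """Hypothetical helper: every method call counts as one simulated request."""

    def __init__(self):
        self.count = 0

    def get_row(self, cf, key):
        self.count += 1
        return cf.get(key, {})

    def multiget(self, cf, keys):
        self.count += 1
        return {k: cf[k] for k in keys if k in cf}


def read_super(client, sector):
    # One request: the row already contains every entry of the sector.
    return client.get_row(sectors_super, sector)


def read_two_cfs(client, sector):
    # Request 1: fetch the text IDs stored as column names in the sector row.
    ids = list(client.get_row(sectors_cf, sector))
    # Request 2: fetch the texts for those IDs.
    return client.multiget(texts_cf, ids)


a = RequestCounter()
entries_a = read_super(a, "sector1")
b = RequestCounter()
entries_b = read_two_cfs(b, "sector1")
print(a.count, b.count)  # 1 2
```

The sketch only counts requests; it says nothing about how expensive each request is on a real cluster.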

In my final application there will likely be somewhat more writes than
reads.

I was wondering which approach is best when it comes to performance. I
suspect that super column families are slower than standard column
families, but are they still slower than two column families that need two
requests to Cassandra instead of one?

Kind regards,
T. Akhayo

Re: Two column families or One super column family?

Posted by Edward Capriolo <ed...@gmail.com>.

On Thu, Mar 31, 2011 at 3:52 AM, T Akhayo <t....@gmail.com> wrote:
> Hi Aaron,
>
> Thank you for your reply; I appreciate the suggestions you made.
>
> Yesterday I managed to get everything (our main read) into one CF by
> storing a structure in a value, as you suggested.
>
> Designing a new data model is different from what I'm used to, but if you
> keep in mind that you are designing for performance instead of flexibility,
> everything gets a bit easier.
>
> Kind regards,
> T. Akhayo

I decided to write up a general guide on whether or not to denormalize
things into multiple CFs:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/whytf_would_i_need_with

Re: Two column families or One super column family?

Posted by T Akhayo <t....@gmail.com>.

Hi Aaron,

Thank you for your reply; I appreciate the suggestions you made.

Yesterday I managed to get everything (our main read) into one CF by
storing a structure in a value, as you suggested.

Designing a new data model is different from what I'm used to, but if you
keep in mind that you are designing for performance instead of flexibility,
everything gets a bit easier.

Kind regards,
T. Akhayo

2011/3/30 aaron morton <aa...@thelastpickle.com>

> I would go with the solution that means you only have to make one request
> to serve your reads, so consider the super CF approach.
>
> There are some downsides to super columns (see
> http://wiki.apache.org/cassandra/CassandraLimitations), and they tend to
> have a love-them-hate-them reputation.
>
> One thing to consider is that you do not need to model every attribute of
> your entity as a column in Cassandra, especially if you are always going to
> pull back all the attributes. So you could get the effect of the super CF
> approach with a standard CF: just pack the attributes into some sort of
> structure such as JSON and store them as a blob.
>
> Or you can use a naming scheme in the column names with a standard CF, e.g.
> uuid1.text and uuid2.text
>
> Hope that helps.
> Aaron

Re: Two column families or One super column family?

Posted by aaron morton <aa...@thelastpickle.com>.

I would go with the solution that means you only have to make one request to serve your reads, so consider the super CF approach. 

There are some downsides to super columns (see http://wiki.apache.org/cassandra/CassandraLimitations), and they tend to have a love-them-hate-them reputation.

One thing to consider is that you do not need to model every attribute of your entity as a column in Cassandra, especially if you are always going to pull back all the attributes. So you could get the effect of the super CF approach with a standard CF: just pack the attributes into some sort of structure such as JSON and store them as a blob.
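
A minimal sketch of the blob idea, assuming a plain dict as a stand-in for a standard CF row store; `add_entry` and `read_sector` are hypothetical helper names, and a real client would write the bytes as the column value instead.

```python
import json

# Stand-in for a standard CF: row key -> {column name: column value (bytes)}.
sectors = {}


def add_entry(sector, entry_id, text, user):
    # Pack all attributes of the entry into a single JSON blob.
    blob = json.dumps({"text": text, "user": user}, sort_keys=True).encode("utf-8")
    # One column per entry: the column name is the entry ID, the value is the blob.
    sectors.setdefault(sector, {})[entry_id] = blob


def read_sector(sector):
    # A single row read returns every entry; unpack each blob on the client side.
    row = sectors.get(sector, {})
    return {entry_id: json.loads(blob.decode("utf-8")) for entry_id, blob in row.items()}


add_entry("sector1", "uid1", "a text", "joop")
add_entry("sector1", "uid2", "more text", "piet")
print(read_sector("sector1"))
```

The trade-off in this sketch is that an individual attribute cannot be read or updated without rewriting the whole blob.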

Or you can use a naming scheme in the column names with a standard CF, e.g. uuid1.text and uuid2.text.
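
The naming scheme could look roughly like this. It is a sketch that assumes entry IDs never contain a "." (the separator); `to_columns` and `from_columns` are hypothetical helpers, and the dict `row` stands in for one row of a standard CF.

```python
from collections import defaultdict


def to_columns(entry_id, attrs):
    # Flatten one entry into columns named "<entry_id>.<attribute>".
    return {f"{entry_id}.{name}": value for name, value in attrs.items()}


def from_columns(columns):
    # Regroup flat columns into entries by splitting each name on the first ".".
    entries = defaultdict(dict)
    for column_name, value in columns.items():
        entry_id, _, attribute = column_name.partition(".")
        entries[entry_id][attribute] = value
    return dict(entries)


# One row per sector; all entries of the sector live side by side in that row.
row = {}
row.update(to_columns("uuid1", {"text": "a text", "user": "joop"}))
row.update(to_columns("uuid2", {"text": "more text", "user": "piet"}))
print(sorted(row))
print(from_columns(row))
```

Because the columns of one entry share a prefix, they sort next to each other under a string comparator, so a column slice over that prefix would return a single entry.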

Hope that helps. 
Aaron
