Posted to user@cassandra.apache.org by Philip Shon <ph...@gmail.com> on 2012/04/12 21:03:08 UTC
Trying to avoid super columns
I am currently working on a data model where the purpose is to look up
multiple products for given days of the year. Right now, that model
involves the usage of a super column family. e.g.
"2012-04-12": {
"product_id_1": {
price: 12.44,
tax: 1.00,
fees: 3.00,
},
"product_id_2": {
price: 50.00,
tax: 4.00,
fees: 10.00
}
}
I should note that for a given day/key, we are expecting in the range of 2
million to 4 million products (subcolumns).
With this model, I am able to retrieve any of the products for a given day
using Hector's MultigetSuperSliceQuery.
I am looking into changing this model to use Composite column names. How
would I go about modeling this? My initial thought is to migrate the above
model into something more like the following.
"2012-04-12": {
"product_id_1:price": 12.44,
"product_id_1:tax": 1.00,
"product_id_1:fees": 3.00,
"product_id_2:price": 50.00,
"product_id_2:tax": 4.00,
"product_id_2:fees": 10.00,
}
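For illustration, here is a small Python sketch (not Hector or Cassandra code; the helper names are hypothetical) of how composite-style column names such as "product_id_1:price" can be built from the nested model and grouped back into per-product records on read:

```python
from collections import defaultdict

def to_columns(day_row):
    """Flatten {product_id: {field: value}} into composite-style column names."""
    columns = {}
    for product_id, fields in day_row.items():
        for field, value in fields.items():
            columns["%s:%s" % (product_id, field)] = value
    return columns

def to_products(columns):
    """Group composite-style column names back into per-product dicts."""
    products = defaultdict(dict)
    for name, value in columns.items():
        product_id, field = name.rsplit(":", 1)
        products[product_id][field] = value
    return dict(products)

row = {"product_id_1": {"price": 12.44, "tax": 1.00, "fees": 3.00}}
cols = to_columns(row)
```

In a real Cassandra composite column the two components would be typed parts of the column name rather than a joined string, but the grouping logic on the client side is the same idea.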
The one thing that stands out to me with this approach is the number of
additional columns that will be created for a single key. Will the increase
in columns create new issues I will need to deal with?
Are there any other thoughts on whether I should actually move forward (or
not) with migrating this super column family to the model with the
composite column names?
Thanks,
Phil
Re: Trying to avoid super columns
Posted by aaron morton <aa...@thelastpickle.com>.
If this is write-once, read-many data, you may get some benefit from packing all the info for a product into one column, using something like JSON for the column value.
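A rough sketch of that packing, using plain Python and the standard json module (the column name and layout here are illustrative, not Hector code):

```python
import json

product = {"price": 12.44, "tax": 1.00, "fees": 3.00}

# One column per product: the name is the product id, the value is the
# whole record packed as a JSON string.
column_name = "product_id_1"
column_value = json.dumps(product, sort_keys=True)

# On read, a single column fetch recovers the whole record.
restored = json.loads(column_value)
```

The trade-off is that any update rewrites the entire value, which is why this suits write-once, read-many data.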
>> The one thing that stands out to me with this approach is the number of additional columns that will be created for a single key. Will the increase in columns create new issues I will need to deal with?
Millions of columns in a row may be OK, depending on the types of queries you want to run (some background: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/).
The more important issue is the byte size of the row. Wide rows take longer to compact and repair, and I try to avoid rows above a few tens of MB. By default, rows larger than 64MB require the slower two-pass compaction.
Compression in 1.X will help where you have lots of repeating column names.
Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 13/04/2012, at 7:32 AM, Dave Brosius wrote:
> If you want to reduce the number of columns, you could pack all the data for a product into one column, as in
>
>
> composite column name-> product_id_1:12.44:1.00:3.00
Re: Trying to avoid super columns
Posted by Dave Brosius <db...@mebigfatguy.com>.
If you want to reduce the number of columns, you could pack all the data
for a product into one column, as in
composite column name-> product_id_1:12.44:1.00:3.00
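A minimal sketch of that encoding in Python (the field order price:tax:fees is assumed from the example name above; the helper names are hypothetical):

```python
def pack(product_id, price, tax, fees):
    # Field order price:tax:fees follows the example column name above.
    return "%s:%.2f:%.2f:%.2f" % (product_id, price, tax, fees)

def unpack(name):
    # rsplit from the right so a product id containing ":" would still parse.
    product_id, price, tax, fees = name.rsplit(":", 3)
    return product_id, float(price), float(tax), float(fees)

name = pack("product_id_1", 12.44, 1.00, 3.00)
```

Note that with this scheme the values live in the column name, so changing a price means deleting the old column and inserting a new one.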