Posted to user@cassandra.apache.org by Roshni Rajagopal <Ro...@wal-mart.com> on 2012/08/23 12:26:50 UTC

Data Modelling Suggestions

Hi,

I need some help with a data modelling question. We're using Hector and DataStax Enterprise 2.1.


I want to associate a list of items with a user. The list should be sorted by the time each item was added. Items can be updated (the quantity of an item can change) and deleted.
I can model it like this, so that it's denormalized and I get all my information in one go from one row, sorted by time added, using composite columns.

Row key: User Id
Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price: Item Qty
Column Value : Null

Now, how do I handle manipulations?

 1.  Add new item: Easy, just a new column.
 2.  Add existing item or modify qty: I want to get to the correct column to update. Can I search by the second component of the composite column (equals condition) and update the column name itself to reflect the new TimeUUID and qty? Or would it be better to just add a new column, always use the latest column for an item in the application code, and delete duplicates in the background?
 3.  Delete item: Can I search by the second component of the composite column to find the correct column to delete?

I was trying to find Hector examples that search on the second component of a composite column, but I couldn't find a good one. I'm not sure if it's possible. If you have an example, please share.

Regards,
Roshni


This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential ***

Re: Data Modelling Suggestions

Posted by aaron morton <aa...@thelastpickle.com>.
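> I'm finding that only the first component is used. Is this understanding correct?
The result is correct, but both components are used in the comparison.

> to (end)component1=timestamp3,component2=123 
is less than 
> Timestamp3: 777
so that column sorts after the end of the slice and is not returned.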

Example:

CREATE COLUMN FAMILY 
    Foo
WITH 
    key_validation_class = UTF8Type
AND 
    comparator = 'CompositeType(IntegerType, IntegerType)'
AND 
    default_validation_class = UTF8Type
;


set Foo['bar']['1:1'] = 'baz1';
set Foo['bar']['2:2'] = 'baz2';
set Foo['bar']['3:3'] = 'baz3';
set Foo['bar']['4:4'] = 'baz4';


aarons-MBP-2011:pycassa aaron$ ./pycassaShell -k dev
In [2]: FOO.get("bar")
Out[2]: OrderedDict([((1, 1), u'baz1'), ((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])

In [6]: FOO.get("bar", column_start=(2,2))
Out[6]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])

In [8]: FOO.get("bar", column_start=(2,2), column_finish=(3,3))
Out[8]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])

In [9]: FOO.get("bar", column_start=(2,2), column_finish=(3,1))
Out[9]: OrderedDict([((2, 2), u'baz2')])

In [10]: FOO.get("bar", column_start=(2,), column_finish=(3,))
Out[10]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])
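
The same ordering rule can be reproduced without a cluster. What follows is a minimal sketch in plain Python, not pycassa or Cassandra code. It assumes fully specified composites behave like tuples compared component by component, and `slice_columns` is a hypothetical helper. It uses the column names from the question, with the integers 1 to 4 standing in for Timestamp1 to Timestamp4.

```python
from bisect import bisect_left, bisect_right

def slice_columns(columns, start, finish):
    """Return the contiguous run of sorted column names with start <= name <= finish."""
    names = sorted(columns)
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return names[lo:hi]

# (timestamp, value) pairs from the question.
columns = [(1, 123), (2, 456), (3, 777), (4, 654)]

# The end bound (3, 123) sorts before the column (3, 777), so that column is excluded.
result = slice_columns(columns, start=(1, 123), finish=(3, 123))
print(result)
```

Both components take part in the comparison, but only to position the two bounds. The slice does not test the second component of each column that lies between them.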

> We see a lot of examples about Timeseries modelling ...

Sorry I do not understand this question. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 11:17 PM, Roshni Rajagopal <Ro...@wal-mart.com> wrote:

> Thank you Aaron & Guillermo,
> 
> I find composite columns very confusing :(
> To reconfirm:
> 
> 1.  We can only search for a range of columns using the first component of the composite column.
> 2.  After specifying a range for the first component, we cannot further filter on the second component. I found this link, http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/ , which seems to suggest that filtering on the second component in addition to the first is possible. I tried the same example but couldn't get it to work. Does anyone have an example? Suppose I have data like this in my column names:
> 
> Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654
> 
> Get the range of columns from (start)component1=timestamp1,component2=123 to (end)component1=timestamp3,component2=123 --> should give me only one column.
> I'm finding that only the first component is used. Is this understanding correct?
> 
> 
> We see a lot of examples about Timeseries modelling with TimeUUID as column names. But how are columns updated or deleted in those models, and how are the columns to delete or modify found? Does one always need a separate column family to handle updates and deletions for time series, or is this usually handled by setting a TTL for data outside the archival period, or does time series modelling usually not involve any manipulation of past records?
> 
> Regards,
> Roshni
> 
> 
> 


Re: Data Modelling Suggestions

Posted by Roshni Rajagopal <Ro...@wal-mart.com>.
Thank you Aaron & Guillermo,

I find composite columns very confusing :(
To reconfirm:

 1.  We can only search for a range of columns using the first component of the composite column.
 2.  After specifying a range for the first component, we cannot further filter on the second component. I found this link, http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/ , which seems to suggest that filtering on the second component in addition to the first is possible. I tried the same example but couldn't get it to work. Does anyone have an example? Suppose I have data like this in my column names:

Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654

Get the range of columns from (start)component1=timestamp1,component2=123 to (end)component1=timestamp3,component2=123 --> should give me only one column.
I'm finding that only the first component is used. Is this understanding correct?


We see a lot of examples about Timeseries modelling with TimeUUID as column names. But how are columns updated or deleted in those models, and how are the columns to delete or modify found? Does one always need a separate column family to handle updates and deletions for time series, or is this usually handled by setting a TTL for data outside the archival period, or does time series modelling usually not involve any manipulation of past records?

Regards,
Roshni




Re: Data Modelling Suggestions

Posted by aaron morton <aa...@thelastpickle.com>.
> I was trying to find Hector examples that search on the second component of a composite column, but I couldn't find a good one. I'm not sure if it's possible. If you have an example, please share.
It's not. When slicing columns you can only return one contiguous range. 
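
To see why, here is a minimal sketch in plain Python (not Hector code). It assumes column names sort like `(time, item_id)` tuples; the item ids and the helper `positions_of_item` are made up for illustration. Columns for one item can sit anywhere in the row, so no single start/finish pair selects exactly them. Finding them this way means reading the row and filtering in the application.

```python
# Column names in sorted order, modelled as (time, item_id) tuples.
row = sorted([(1, "apple"), (2, "pear"), (3, "apple"), (4, "plum")])

def positions_of_item(sorted_names, item_id):
    """Indexes of the columns whose second component equals item_id."""
    return [i for i, (_, item) in enumerate(sorted_names) if item == item_id]

# The matches are not adjacent, so they do not form one contiguous slice.
apple_positions = positions_of_item(row, "apple")
print(apple_positions)

# Client-side filter: read the whole row, then keep the matching columns.
apple_columns = [name for name in row if name[1] == "apple"]
print(apple_columns)
```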

> Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add later other query strategies without messing with how you store the item 
+1
Have the orders somewhere, and build a time ordered custom index to show them in order. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 6:28 AM, Guillermo Winkler <gw...@inconcertcc.com> wrote:

> I think you need another CF as index.
> 
> user_itemid -> timestamped column_name
> 
> Otherwise you can't guess what's the timestamp to use in the column name.
> 
> Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add later other query strategies without messing with how you store the item information.
> 
> Maybe you can solve it with a secondary index by timestamp too.
> 
> Guille
> 
> 


Re: Data Modelling Suggestions

Posted by Guillermo Winkler <gw...@inconcertcc.com>.
I think you need another CF as index.

user_itemid -> timestamped column_name

Otherwise you can't know which timestamp to use in the column name.
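
A rough sketch of that lookup index, with in-memory dicts in place of column families (plain Python, not Hector; the names `items_by_time`, `time_by_item`, `upsert_item` and `delete_item` are hypothetical). It assumes an update should move the item to the newest position, which is one of the options raised in the question, and it keeps the quantity in the column value to stay short. A counter stands in for a TimeUUID.

```python
import itertools

_clock = itertools.count(1)  # stand-in for generating a TimeUUID

items_by_time = {}  # user_id -> {(time, item_id): qty}; column names sort by time
time_by_item = {}   # (user_id, item_id) -> time currently used in the column name

def upsert_item(user_id, item_id, qty):
    """Add an item, or replace its column under a new time if it already exists."""
    row = items_by_time.setdefault(user_id, {})
    old_time = time_by_item.get((user_id, item_id))
    if old_time is not None:
        del row[(old_time, item_id)]  # the index gives the exact column name
    new_time = next(_clock)
    row[(new_time, item_id)] = qty
    time_by_item[(user_id, item_id)] = new_time

def delete_item(user_id, item_id):
    """Remove the item's column and its index entry, if present."""
    old_time = time_by_item.pop((user_id, item_id), None)
    if old_time is not None:
        del items_by_time[user_id][(old_time, item_id)]

upsert_item("u1", "apple", 1)
upsert_item("u1", "pear", 2)
upsert_item("u1", "apple", 5)  # quantity change: old column removed, new one added
delete_item("u1", "pear")
print(sorted(items_by_time["u1"].items()))
```

Every change is two writes in this sketch, one to the data row and one to the index, so the application has to keep the two in step.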

Anyway, I would prefer storing the item ids as column names in the main
column family and having a second CF, used only for the order-by-date query,
that holds the pair timestamp_itemid. That way you can add other query
strategies later without messing with how you store the item information.

Maybe you can solve it with a secondary index by timestamp too.
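
The preferred layout above could look roughly like this. It is a sketch with in-memory dicts standing in for the two column families (plain Python, not Hector). `items`, `items_by_date` and the function names are hypothetical, and storing the added time next to the quantity is an assumption made here so that the index entry can be found again on delete.

```python
items = {}          # user_id -> {item_id: {"qty": int, "added": int}}
items_by_date = {}  # user_id -> set of (added, item_id) pairs; the time-ordered index

def add_item(user_id, item_id, qty, added):
    """Store the item under its id and record it in the time-ordered index."""
    items.setdefault(user_id, {})[item_id] = {"qty": qty, "added": added}
    items_by_date.setdefault(user_id, set()).add((added, item_id))

def set_qty(user_id, item_id, qty):
    """Quantity changes touch only the main row; the index is unchanged."""
    items[user_id][item_id]["qty"] = qty

def remove_item(user_id, item_id):
    """Delete the item and the index entry derived from its stored added time."""
    item = items[user_id].pop(item_id)
    items_by_date[user_id].discard((item["added"], item_id))

def list_by_date(user_id):
    """Item ids and quantities in the order the items were added."""
    ordered = sorted(items_by_date.get(user_id, set()))
    return [(item_id, items[user_id][item_id]["qty"]) for _, item_id in ordered]

add_item("u1", "pear", 2, added=10)
add_item("u1", "apple", 1, added=20)
add_item("u1", "plum", 4, added=30)
set_qty("u1", "apple", 5)
remove_item("u1", "pear")
print(list_by_date("u1"))
```

Updates and deletes address the item directly by its id, at the cost of a second read when the list has to be shown in date order.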

Guille

