Posted to user@cassandra.apache.org by Kanwar Sangha <ka...@mavenir.com> on 2013/02/06 02:39:42 UTC

DataModel Question

Hi - We are designing Cassandra-based storage for the following use cases:

* Store SMS messages
* Store MMS messages
* Store chat history

What would be the ideal way to design the data model for this kind of application? I am thinking along these lines:

Row key: composite key [PhoneNum:Day]

* Example: 19876543456:05022013

Dynamic column families

* Composite column key for SMS [SMS:MessageId:TimeUUID]
* Composite column key for MMS [MMS:MessageId:TimeUUID]
* Composite column key for the user I am chatting with [UserId:198765432345]. This can have multiple values, since each chat conversation can have many messages. Should this be a super column?


Row key                 Columns
19866666666:05022013    SMS:xxxx:ttttttt | SMS:xxx12:ttttttt | MMS:xxxx:ttttttt | XXXX:1933333333
19877777777:05022013
19878888888:05022013

Thanks,
Kanwar



Re: DataModel Question

Posted by aaron morton <aa...@thelastpickle.com>.
> Go day / phone instead of phone / day; this way you won't have a row key growing forever.
Not sure I understand. 

+1 for month partition.

> When I go offline and come online again, I need to retrieve all pending messages from all my conversations.
You need some sort of token that includes the last timestamp seen by the client. Then make as many queries as necessary to get the missing data.
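A minimal Python sketch of that catch-up loop, assuming day partitions keyed by (phone_number, day) as in the model discussed in this thread. The dict-based store and the names partitions_to_query and fetch_pending are hypothetical stand-ins for "one query per partition", not a driver API.

```python
from datetime import datetime, timedelta


def partitions_to_query(phone, last_seen, now):
    """Day partitions that may hold messages newer than last_seen, oldest first."""
    keys, day = [], last_seen.date()
    while day <= now.date():
        keys.append((phone, day))
        day += timedelta(days=1)
    return keys


def fetch_pending(store, phone, last_seen, now):
    """Return (messages newer than the client's token, updated token).

    `store` is a hypothetical in-memory stand-in for the table: it maps
    (phone, day) to a list of (timestamp, body) pairs sorted by timestamp.
    """
    pending = []
    for key in partitions_to_query(phone, last_seen, now):
        # One query per partition; keep only rows after the token.
        pending.extend(m for m in store.get(key, []) if m[0] > last_seen)
    # Partitions are visited oldest first, so the last row is the newest one seen.
    new_token = pending[-1][0] if pending else last_seen
    return pending, new_token
```

The client keeps new_token and sends it on its next connection. Against the real table, the "newer than the token" filter would be a range on the time-ordered column rather than a filter in application code.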

> > I guess this makes the data model span across many CFs ?
Yes.
Sorry, I had not considered conversations.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/02/2013, at 3:04 AM, Edward Capriolo <ed...@gmail.com> wrote:

> Go day / phone instead of phone / day; this way you won't have a row key growing forever.
> 
> A compromise would be month / phone as the row key, and then use the date time as the first part of a composite column.


Re: DataModel Question

Posted by Edward Capriolo <ed...@gmail.com>.
Go day / phone instead of phone / day; this way you won't have a row key growing forever.

A compromise would be month / phone as the row key, and then use the date time as the first part of a composite column.
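A rough Python sketch of that compromise, assuming a string row key of the form YYYYMM:phone and a composite column name (timestamp, message_id, message_type). The key encoding and the helper names are illustrative assumptions: a dict of dicts stands in for the column family, and sorting the tuple column names mimics how composite columns are ordered component by component.

```python
from datetime import datetime


def month_row_key(phone, ts):
    """Hypothetical row key: month bucket first, then the phone number."""
    return f"{ts:%Y%m}:{phone}"


def add_message(rows, phone, ts, message_id, message_type, body):
    """Store a message in its month row; the column name starts with the date/time."""
    row = rows.setdefault(month_row_key(phone, ts), {})
    row[(ts, message_id, message_type)] = body


def messages_between(rows, phone, start, end):
    """Time slice of one month row (assumes start and end fall in the same month)."""
    row = rows.get(month_row_key(phone, start), {})
    return [row[name] for name in sorted(row) if start <= name[0] < end]
```

With a month bucket, a busy number's row holds at most about a month of messages, while reading any time range inside that month is still a single slice of one row.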

On Thursday, February 7, 2013, Kanwar Sangha <ka...@mavenir.com> wrote:

RE: DataModel Question

Posted by Kanwar Sangha <ka...@mavenir.com>.
Thanks Aaron!

My use case is modeled on Skype, which stores IM + SMS + MMS in one conversation.

I need the following functionality:

* When I go offline and come online again, I need to retrieve all pending messages from all my conversations.
* I should be able to select a contact and view the 'history' of the messages (last 7 days, last 14 days, last 21 days...).
* If I log in to a different device, I should be able to sync at least a "few days" of messages.
* One conversation can have multiple participants.
* Support full sync or delta sync based on the number of messages or on a history window.

I guess this makes the data model span many column families (CFs)?




From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: 06 February 2013 22:20
To: user@cassandra.apache.org
Subject: Re: DataModel Question


Re: DataModel Question

Posted by aaron morton <aa...@thelastpickle.com>.
> 2)      DynamicComposites: I read somewhere that they are not recommended?
You probably won't need them.

Your current model will not sort messages by the time they arrive within a day. The sort order will be based on the message type and then the message ID.

I'm assuming you want to order messages by time, so put the TimeUUID at the start of the composite column name. If you often want to get the most recent messages, use a reversed comparator.
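To illustrate the ordering point, a small Python sketch with made-up columns. Python compares tuples component by component, which is used here as a stand-in for how composite column names are compared; a datetime stands in for the TimeUUID, which Cassandra's TimeUUIDType compares by its embedded time.

```python
from datetime import datetime

# Made-up columns in the original [type:message_id:time] layout.
columns = [
    ("SMS", 2, datetime(2013, 2, 5, 9, 0)),
    ("MMS", 7, datetime(2013, 2, 5, 8, 0)),
    ("SMS", 1, datetime(2013, 2, 5, 10, 0)),
]


def time_first(column):
    """Reorder a column name to [time:message_id:type]."""
    message_type, message_id, ts = column
    return (ts, message_id, message_type)


by_type = sorted(columns)                         # type, then message ID: not arrival order
by_time = sorted(time_first(c) for c in columns)  # arrival order
newest_first = sorted(by_time, reverse=True)      # similar effect to a reversed comparator
```

Note that reverse=True flips every component at once. In a Cassandra comparator the reversal is declared per component (for example a ReversedType on the time component), so typically only the time part would be reversed.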

You could probably also have wider rows if you want to. I'm not sure how many messages kids send a day, but you may get by with weekly partitions.
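A minimal Python sketch of weekly partitions, assuming the bucket is the ISO week (the "2013-W06" format and the helper names are illustrative assumptions). It also shows how the "last 7 / 14 / 21 days" history view requested in this thread maps onto a handful of partitions.

```python
from datetime import date, timedelta


def week_bucket(day):
    """ISO week bucket, e.g. date(2013, 2, 6) -> '2013-W06'."""
    year, week, _ = day.isocalendar()
    return f"{year}-W{week:02d}"


def partition_key(phone, day):
    """Hypothetical partition key: (phone_number, week bucket)."""
    return (phone, week_bucket(day))


def history_partitions(phone, today, days):
    """Weekly partition keys covering the last `days` days up to today, oldest first."""
    keys = []
    for offset in range(days - 1, -1, -1):
        key = partition_key(phone, today - timedelta(days=offset))
        if key not in keys:
            keys.append(key)
    return keys
```

With daily buckets a 21-day history view means up to 21 partition reads; with weekly buckets it touches at most four.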

The CLI model could be:
row_key: <phone_number : day>
column: <time_uuid : message_id : message_type> 

You could also pack extra data using JSON, Protocol Buffers, etc. and store more than just the message in the column value.
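For example, with JSON from Python's standard library (Protocol Buffers would work similarly, with a schema). The extra fields shown (sender, attachment) are hypothetical; the value is encoded to bytes because that is what a column value ultimately stores.

```python
import json


def pack_value(body, **extra):
    """Serialize the message body plus extra fields into one column value (UTF-8 bytes)."""
    return json.dumps({"body": body, **extra}, sort_keys=True).encode("utf-8")


def unpack_value(raw):
    """Inverse of pack_value."""
    return json.loads(raw.decode("utf-8"))
```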

If you are using CQL 3, consider this:

create table messages (
    phone_number      text,
    day               timestamp,
    message_sequence  timeuuid,  -- your timestamp
    message_id        int,
    message_type      text,
    message_body      text,
    PRIMARY KEY ((phone_number, day), message_sequence, message_id)
);

(phone_number, day) is the partition key, the same as the Thrift row key.

message_sequence and message_id are the clustering (grouping) columns: all rows in a partition will be grouped and ordered by these columns.
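A small in-memory Python model of what that primary key means, assuming a datetime as a stand-in for the timeuuid: rows are grouped by the partition key and kept sorted by the clustering columns, so reading the newest messages of a day is a reverse scan of one sorted partition. The table dict and the helper names are hypothetical, not a driver API.

```python
from bisect import insort
from collections import defaultdict
from datetime import date, datetime


def new_table():
    # (phone_number, day) -> rows sorted by (message_sequence, message_id)
    return defaultdict(list)


def insert(table, phone, day, sequence, message_id, message_type, body):
    """Add a row to its partition, keeping the clustering order."""
    insort(table[(phone, day)], (sequence, message_id, message_type, body))


def latest(table, phone, day, limit):
    """Newest `limit` rows of one partition, roughly ORDER BY message_sequence DESC LIMIT n."""
    return table.get((phone, day), [])[::-1][:limit]


if __name__ == "__main__":
    table = new_table()
    day = date(2013, 2, 6)
    insert(table, "19876543456", day, datetime(2013, 2, 6, 9, 0), 2, "SMS", "second")
    insert(table, "19876543456", day, datetime(2013, 2, 6, 8, 0), 1, "SMS", "first")
    insert(table, "19876543456", day, datetime(2013, 2, 6, 10, 0), 3, "MMS", "third")
    print([row[3] for row in latest(table, "19876543456", day, 2)])  # ['third', 'second']
```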

Hope that helps. 




On 7/02/2013, at 1:47 AM, Kanwar Sangha <ka...@mavenir.com> wrote:

> 1)      Version is 1.2
> 2)      DynamicComposites: I read somewhere that they are not recommended?
> 3)      Good point. I need to think about that one.


RE: DataModel Question

Posted by Kanwar Sangha <ka...@mavenir.com>.
1)      Version is 1.2

2)      DynamicComposites: I read somewhere that they are not recommended?

3)      Good point. I need to think about that one.



From: Tamar Fraenkel [mailto:tamar@tok-media.com]
Sent: 06 February 2013 00:50
To: user@cassandra.apache.org
Subject: Re: DataModel Question


Re: DataModel Question

Posted by Tamar Fraenkel <ta...@tok-media.com>.
Hi!
I have a couple of questions regarding your model:

 1. What Cassandra version are you using? I am still working with 1.0 and this seems to make sense, but 1.2 gives you much more power, I think.
 2. Maybe I don't understand your model, but I think you need DynamicComposite columns, as the user columns differ in number of components and maybe in type.
 3. How do you associate the SMS or MMS with the user you are chatting with? Is it done by a separate CF?

Thanks,
Tamar


Tamar Fraenkel
Senior Software Engineer, TOK Media


tamar@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra <mi...@gmail.com> wrote:

> Avoid super columns. If you need Sorted, wide rows then go for Composite
> columns.
>
> -Vivek

Re: DataModel Question

Posted by Vivek Mishra <mi...@gmail.com>.
Avoid super columns. If you need sorted, wide rows, then go for composite columns.
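A minimal Python sketch of that advice, applied to the per-contact chat column from the original post: each super column (keyed by the contact's user ID) becomes the first component of a composite column name, and reading one conversation becomes a slice on that prefix. The dict representation and the helper names are illustrative assumptions.

```python
def flatten(super_columns):
    """{super_column: {sub_column: value}} -> {(super_column, sub_column): value}."""
    return {
        (super_name, sub_name): value
        for super_name, sub_columns in super_columns.items()
        for sub_name, value in sub_columns.items()
    }


def slice_prefix(columns, prefix):
    """Composite columns whose first component equals `prefix`, in sorted order."""
    return [(name, columns[name]) for name in sorted(columns) if name[0] == prefix]
```

The usual reason given for this advice in the 1.x releases is that a super column's sub-columns are not indexed, so reading one sub-column means deserializing the whole super column, whereas a composite prefix can be sliced and paged like any other wide row.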

-Vivek

On Wed, Feb 6, 2013 at 7:09 AM, Kanwar Sangha <ka...@mavenir.com> wrote:


RE: DataModel Question

Posted by Rishabh Agrawal <ri...@impetus.co.in>.
Hello,

Composite keys are always good, and the model looks clean to me. Run a pilot with around 10 GB or more of data, compare it with an RDBMS, and make changes accordingly.

Thanks and Regards
Rishabh Agrawal

From: Kanwar Sangha [mailto:kanwar@mavenir.com]
Sent: Wednesday, February 06, 2013 7:10 AM
To: user@cassandra.apache.org
Subject: DataModel Question
