
Posted to user@cassandra.apache.org by Naresh Yadav <ny...@gmail.com> on 2014/01/27 13:24:09 UTC

Help me on Cassandra Data Modelling

Hi all,

I urgently need help modelling this use case in Cassandra.

I have a concept of tags and tagcombinations.
For example, U.S.A and Pen are two tags, and if they appear together in some
definition then a tagcombination (U.S.A-Pen) is registered for that.

*tags* (U.S.A, Pen, Pencil, India, Shampoo)
*tagcombinations* (U.S.A-Pen, India-Pencil, U.S.A-Pencil, India-Pen,
India-Pen-Shampoo)

- millions of tags
- billions of tagcombinations
- one tagcombination generally has 2-8 tags
- every day we get lakhs (hundreds of thousands) of new tagcombinations to write

Query to support:
In how many tagcombinations does one tag, or a set of tags, appear?
If I query for Pen,India then it should return two tagcombinations
(India-Pen, India-Pen-Shampoo). The query will be fired by the application in
real time.

I am new to Cassandra and need to deliver fast, so please give your inputs.

Thanks
Naresh

Re: Help me on Cassandra Data Modelling

Posted by Thunder Stumpges <th...@gmail.com>.

Hey Naresh,

Unfortunately I don't have any further advice. I keep feeling like you're
looking at a search problem instead of a lookup problem. Perhaps Cassandra
is not the right tool for your need in this case. Perhaps something with a
full-text index type feature would help.

Or perhaps someone more experienced than I could come up with another
design.

Good luck,
Thunder




Re: Help me on Cassandra Data Modelling

Posted by Naresh Yadav <ny...@gmail.com>.

Any inputs on my last email, please?



Re: Help me on Cassandra Data Modelling

Posted by Naresh Yadav <ny...@gmail.com>.

Yes Thunder, you are right. I simplified it by moving the tag search
(partial/exact) into a separate column family, tagcombination, which acts
as an index for all searches based on tags. My original metricresult table
will store tagcombinationid and time in columns; otherwise it was getting
complicated and I was not getting good results.

Yes, I agree with you about duplicating the storage with the tagcombination
column family. If I have a billion real tagcombinations with 8 tags in
each, then I am storing roughly 2^8 partial combinations (every subset of
the 8 tags) for each one to support partial matching, which will make this
a very heavy table. But with individual tags as keys I will not be able to
support search with a set of tags. Please suggest an alternative solution.
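
To put a number on the duplication described above, here is a small
back-of-the-envelope sketch in Python. It assumes every non-empty subset of
a combination's tags becomes its own partition key; the function name is
made up for illustration, and the one-billion figure is simply the example
from this message.

```python
from math import comb

def index_rows_per_combination(num_tags: int) -> int:
    """Rows written for one tagcombination if every non-empty subset
    of its tags is stored as a separate partition key."""
    return sum(comb(num_tags, k) for k in range(1, num_tags + 1))

for n in (2, 4, 8):
    print(n, "tags ->", index_rows_per_combination(n), "index rows")

# A billion combinations with 8 tags each, as in the example above.
total_rows = 1_000_000_000 * index_rows_per_combination(8)
print(f"{total_rows:,} index rows in total")
```

With 8 tags that is 255 rows per combination (2^8 minus the empty set), so
about 255 billion index rows in the worst case, compared with 8 rows per
combination if only single tags are used as keys.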

Also, one of my colleagues suggested a totally different approach, but I
am not able to map it onto Cassandra.
According to him, we store all possible tags as columns and, for each
combination, mark with 1s and 0s which tags appear in that combination.
So the data (TC1 as India, Pencil AND TC2 as India, Pen) will look like:

         India    Pencil    Pen
TC1      1        1         0
TC2      1        0         1

I am not able to design an optimal column family for this in Cassandra. If I
design it as is, then for a search on India, Pen I will select the India and
Pen columns, but that will touch each and every row because I cannot apply a
criterion that matches only the 1s. I believe there can be a better design
that makes use of this good idea.
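
One way to read the 0/1 idea so that a search combines only the queried
tags is to turn the matrix on its side: keep one bitmap per tag, with one
bit per tagcombination, and AND the bitmaps of the tags in the query. The
following is an in-memory Python sketch of that idea only. It is not
Cassandra code, and the names (`TagBitmaps`, `add`, `query`) are made up
for illustration.

```python
from collections import defaultdict

class TagBitmaps:
    """In-memory sketch: one integer bitmap per tag, one bit per combination."""

    def __init__(self):
        self.ids = []                    # bit position -> tagcombination id
        self.bitmaps = defaultdict(int)  # tag -> bitmap of combinations

    def add(self, combination_id, tags):
        position = len(self.ids)
        self.ids.append(combination_id)
        for tag in tags:
            self.bitmaps[tag] |= 1 << position  # mark a 1 for this tag

    def query(self, tags):
        """Ids of combinations that contain all of the given tags."""
        tags = list(tags)
        if not tags:
            return []
        result = -1  # all bits set; narrowed by each AND below
        for tag in tags:
            result &= self.bitmaps.get(tag, 0)
        matches = []
        while result:
            low = result & -result  # isolate the lowest set bit
            matches.append(self.ids[low.bit_length() - 1])
            result ^= low
        return matches

index = TagBitmaps()
index.add("TC1", {"India", "Pencil"})
index.add("TC2", {"India", "Pen"})
print(index.query(["India", "Pen"]))  # ['TC2']
print(index.query(["India"]))         # ['TC1', 'TC2']
```

Whether such bitmaps can be stored and combined efficiently at the scale
of billions of combinations is a separate question that this sketch does
not answer.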

Please help me with this.

Thanks
Naresh




Re: Help me on Cassandra Data Modelling

Posted by Thunder Stumpges <th...@gmail.com>.

Hey Naresh,

You asked a similar question a week or two ago. It looks like you have
simplified your needs quite a bit. Were you able to adjust your
requirements or separate the issue? You had a complicated time dimension
before, as well as a single "query" for multiple AND cases on tags.

....
> c)Give data for Metric=Sales AND Tag=U.S.A
>        O/P : 5 rows
> d)Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>        O/P :1 row"



I agree with Jonathan on the model for this simplified use case. However
looking at how you are storing each partial tag combination as well as
individual tags in the partitioning key, you will be severely duplicating
your storage. You might want to just store individual keys in the
partitioning key.
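
A sketch of what storing only individual tags in the partition key could
look like from the application side: one lookup per queried tag, followed
by an intersection in the client. Plain Python dictionaries stand in for
Cassandra partitions here; the helper names are hypothetical and no driver
API is shown.

```python
from collections import defaultdict

# tag -> set of tagcombination ids; stands in for one partition per tag.
partitions = defaultdict(set)

def write_combination(combination_id, tags):
    """One row per tag: a combination with k tags costs k writes."""
    for tag in tags:
        partitions[tag].add(combination_id)

def find_combinations(tags):
    """Read one partition per queried tag and intersect client-side."""
    id_sets = [partitions.get(tag, set()) for tag in tags]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)

write_combination("India-Pen", ["India", "Pen"])
write_combination("India-Pencil", ["India", "Pencil"])
write_combination("India-Pen-Shampoo", ["India", "Pen", "Shampoo"])

print(sorted(find_combinations(["Pen", "India"])))
# ['India-Pen', 'India-Pen-Shampoo']
```

The trade-off in this sketch is that storage stays at one row per tag per
combination, while the work of combining tags moves to the client, and a
very common tag could produce a very large partition to read.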

Good luck,
Thunder





Re: Help me on Cassandra Data Modelling

Posted by Naresh Yadav <ny...@gmail.com>.

Thanks, Jonathan, for guiding me. I just want to confirm my understanding:

CREATE TABLE tagcombinations (
     partialtags text,              -- one tag, or a set of tags as one string
     tagcombinationid text,
     tagcombinationtags set<text>,  -- all tags of the full combination
     PRIMARY KEY ((partialtags), tagcombinationid)
);
If I need to store two tagcombinations, TC1 as India, Pencil and TC2 as
India, Pen, then the data will be stored as:

partialtags     tagcombinationid   tagcombinationtags
India           TC1                India,Pencil
India           TC2                India,Pen
Pencil          TC1                India,Pencil
Pen             TC2                India,Pen
India,Pencil    TC1                India,Pencil
India,Pen       TC2                India,Pen


I hope I have understood the idea properly; please confirm the design.

Thanks
Naresh



Re: Help me on Cassandra Data Modelling

Posted by Jonathan Lacefield <jl...@datastax.com>.

Hello,

  The trick with this data model is to get to a partition-based and/or
clustering-based access pattern so C* returns results quickly.  In C* you
want to model your tables around your query access patterns, and remember
that writes are cheap and fast in C*.

  So, try something like the following:

  1 table with a partition key = tag string
         Tag string = "Tag" or "set of Tags"
         Cluster on tag combination (probably in descending order)
         This will allow you to select any combination that includes
         "Tag" or "set of Tags"
         This will duplicate data, as you will store 1 tag combination
         in every tag partition, i.e. if a tag combination has 2 parts,
         then you will have 2 rows
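
A minimal sketch of how an application might build the partition keys for
this model, assuming the "set of Tags" key is formed by sorting the tags
and joining them with commas. That canonical form is an assumption for
illustration, not something stated above, and a Python dictionary stands
in for the table.

```python
from itertools import combinations

def partition_keys(tags):
    """Yield one canonical key per non-empty subset of the tags."""
    ordered = sorted(tags)
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            yield ",".join(subset)

def lookup_key(tags):
    """Canonical key for a query; must match the form used on write."""
    return ",".join(sorted(tags))

table = {}  # partition key -> list of tagcombination ids (clustering rows)

def store(combination_id, tags):
    for key in partition_keys(tags):
        table.setdefault(key, []).append(combination_id)

store("TC1", ["India", "Pencil"])
store("TC2", ["India", "Pen"])

print(table[lookup_key(["Pen", "India"])])  # ['TC2']
print(table[lookup_key(["India"])])         # ['TC1', 'TC2']
```

In this sketch every query is a single key lookup, which is the access
pattern the model aims for; the price is paid on write, because each
combination is stored once per subset key rather than once per tag.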

  Hope this helps.

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
<http://www.linkedin.com/in/jlacefield>


<http://www.datastax.com/what-we-offer/products-services/training/virtual-training>

