Posted to user@cassandra.apache.org by Sébastien Druon <sd...@spotuse.com> on 2010/12/09 12:48:23 UTC

N to N relationships

Hello,

For a specific case, we are thinking about representing an N-to-N
relationship with an NxN matrix in Cassandra.
The relations will exist only between a subset of the elements, so the matrix
will mostly contain empty elements.

We have a set of questions concerning this:
- What is the best way to represent this matrix? What would give the best
performance for reads? For writes?
  . a super column family with n column families, with n columns each
  . a column family with n columns and n lines

In the second case, we would need to extract 2 kinds of information:
- all the relations for a line: this should be no specific problem;
- all the relations for a column: in that case we would need an index for
the columns, right? And then get all the lines where the value of the column
in question is not null... Is that the correct way to do it?
When using indexes, say we want to add another element N+1. What impact in
terms of time would it have on the indexing job?
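
To make the second option concrete, here is a minimal sketch in plain Python (not Cassandra client code). A dictionary of dictionaries stands in for a column family with one row per line, and only non-empty cells are stored. The names `matrix`, `relations_for_line` and `relations_for_column` are hypothetical. It illustrates why reading a line is a single lookup, while reading a column without any index means visiting every line.

```python
# Sparse matrix stored row-wise: line id -> {column id: value}.
# Empty cells are simply absent, so a mostly empty matrix stays small.
matrix = {
    "a": {"b": 1, "c": 1},
    "b": {"c": 1},
    "d": {"a": 1, "c": 1},
}


def relations_for_line(line_id):
    """One lookup: everything stored under a single row key."""
    return dict(matrix.get(line_id, {}))


def relations_for_column(column_id):
    """No index available: every line has to be inspected."""
    return {
        line_id: cells[column_id]
        for line_id, cells in matrix.items()
        if column_id in cells
    }


print(relations_for_line("a"))    # {'b': 1, 'c': 1}
print(relations_for_column("c"))  # {'a': 1, 'b': 1, 'd': 1}
```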

Thanks a lot for the answers,

Best regards,

Sébastien Druon

Re: N to N relationships

Posted by Sébastien Druon <sd...@spotuse.com>.
Thanks a lot for the support!

On Thu, 2010-12-09 at 19:50 -0600, Nick Bailey wrote:
> I would also recommend two column families. Storing the key as NxN
> would require you to hit multiple machines to query for an entire row
> or column with RandomPartitioner. Even with OPP you would need to pick
> row or columns to order by and the other would require hitting
> multiple machines.  Two column families avoids this and avoids any
> problems with choosing OPP.

Re: N to N relationships

Posted by Aaron Morton <aa...@thelastpickle.com>.
Re: storing every value twice. Cassandra is not an RDBMS; the goal is not to achieve fifth normal form. The goal is to design your storage schema to support the queries you wish to run.

Storage is cheap, and it's really not a pain to store the values more than once: use the batch_mutate() function. Many designs discussed here involve denormalising the data to support read queries.
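
As a rough illustration of writing one value to both places in a single call, the sketch below builds a nested map of the shape that the Thrift `batch_mutate()` call takes: row key, then column family name, then a list of mutations. It is plain Python and sends nothing to Cassandra. The column family names `Rows` and `Columns`, the helper `build_mutation_map`, and the use of tuples in place of Thrift `Mutation` objects are assumptions made for the example.

```python
from collections import defaultdict


def build_mutation_map(row_id, col_id, value):
    """Return {row key: {column family: [mutations]}} for one matrix cell.

    Each mutation is a (column name, value) tuple standing in for a real
    Thrift Mutation object.
    """
    mutation_map = defaultdict(lambda: defaultdict(list))
    # Copy 1: keyed by matrix row, column name is the matrix column.
    mutation_map[row_id]["Rows"].append((col_id, value))
    # Copy 2: keyed by matrix column, column name is the matrix row.
    mutation_map[col_id]["Columns"].append((row_id, value))
    # Convert to plain dicts so the result is easy to inspect.
    return {key: dict(cfs) for key, cfs in mutation_map.items()}


print(build_mutation_map("r7", "c3", "linked"))
```

The sketch does not address what happens if only part of a batch is applied; that has to be handled by the application.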

Try it and see. If you're not completely satisfied you can have your money back.

Hope that helps.
Aaron
On 13/12/2010, at 7:25 AM, Edward Capriolo <ed...@gmail.com> wrote:

> On Sun, Dec 12, 2010 at 3:20 AM, David Boxenhorn <da...@lookin2.com> wrote:
>> You want to store every value twice? That would be a pain to maintain, and
>> possibly lead to inconsistent data.
> Before secondary indexes the only option was to store the data twice.
> Yes you have to maintain this yourself. The data model only provides
> fast searches on the key. An index is normally a separate entity with
> a different ordering, and it is almost the same here.

Re: N to N relationships

Posted by Edward Capriolo <ed...@gmail.com>.
On Sun, Dec 12, 2010 at 3:20 AM, David Boxenhorn <da...@lookin2.com> wrote:
> You want to store every value twice? That would be a pain to maintain, and
> possibly lead to inconsistent data.
Before secondary indexes the only option was to store the data twice.
Yes you have to maintain this yourself. The data model only provides
fast searches on the key. An index is normally a separate entity with
a different ordering, and it is almost the same here.
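
A hand-maintained index of this kind can be sketched in plain Python as follows. Two dictionaries stand in for two column families: one keyed by the item, one keyed by the value being searched on. The class name `IndexedStore` and its methods are hypothetical, and nothing here talks to Cassandra. The point is only that every write has to update both structures, including removing the stale index entry.

```python
class IndexedStore:
    """Primary data plus a manually maintained reverse lookup."""

    def __init__(self):
        self.data = {}   # key -> value (fast lookup by key)
        self.index = {}  # value -> set of keys (fast lookup by value)

    def put(self, key, value):
        old = self.data.get(key)
        if old is not None:
            # Remove the stale index entry, otherwise the two copies drift apart.
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.data[key] = value
        self.index.setdefault(value, set()).add(key)

    def keys_with_value(self, value):
        return sorted(self.index.get(value, set()))


store = IndexedStore()
store.put("user1", "paris")
store.put("user2", "paris")
store.put("user1", "berlin")  # moves user1 from one index entry to another
print(store.keys_with_value("paris"))   # ['user2']
print(store.keys_with_value("berlin"))  # ['user1']
```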

Re: N to N relationships

Posted by David Boxenhorn <da...@lookin2.com>.
You want to store every value twice? That would be a pain to maintain, and
possibly lead to inconsistent data.

On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey <ni...@riptano.com> wrote:

> I would also recommend two column families. Storing the key as NxN would
> require you to hit multiple machines to query for an entire row or column
> with RandomPartitioner. Even with OPP you would need to pick row or columns
> to order by and the other would require hitting multiple machines.  Two
> column families avoids this and avoids any problems with choosing OPP.

Re: N to N relationships

Posted by Nick Bailey <ni...@riptano.com>.
I would also recommend two column families. Storing each cell under its own
NxN key would require you to hit multiple machines to query for an entire row
or column with RandomPartitioner. Even with the order-preserving partitioner
(OPP) you would need to pick rows or columns to order by, and the other would
require hitting multiple machines. Two column families avoid this and avoid
any problems with choosing OPP.
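
The placement argument can be illustrated with a small, simplified model in Python. It hashes each key with MD5 and takes the result modulo a hypothetical node count. The real RandomPartitioner assigns MD5-based tokens to ranges on a ring, and replication is ignored here, so this is only a sketch of the effect. The function `node_for` and the key formats are assumptions for the example.

```python
import hashlib

NODES = 4  # hypothetical cluster size


def node_for(key):
    """Simplified placement: MD5 of the key, modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NODES


n = 200
# One key per cell ("row@col"): the cells of matrix row 5 get unrelated hashes.
cell_nodes = {node_for(f"5@{col}") for col in range(n)}
# One key per matrix row: the whole row lives under a single key.
row_nodes = {node_for("5")}

print("nodes touched with one key per cell:", len(cell_nodes))
print("nodes touched with one key per row: ", len(row_nodes))
```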

On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton <aa...@thelastpickle.com>wrote:

> Am assuming you have one matrix and you know the dimensions. Also as you
> say the most important queries are to get an entire column or an entire row.
>
> I would consider using a standard CF for the Columns and one for the Rows.
>  The key for each would be the col / row number, each cassandra column name
> would be the id of the other dimension and the value whatever you want.
>
> - when storing the data update both the Column and Row CF
> - reading a whole row/col would be simply reading from the appropriate CF.
> - reading an intersection is a get_slice to either col or row CF using the
> column_names field to identify the other dimension.
>
> You would not need secondary indexes to serve these queries.
>
> Hope that helps.
> Aaron

Re: N to N relationships

Posted by Aaron Morton <aa...@thelastpickle.com>.
I am assuming you have one matrix and you know the dimensions. Also, as you say, the most important queries are to get an entire column or an entire row.

I would consider using a standard CF for the Columns and one for the Rows. The key for each would be the col / row number, each Cassandra column name would be the id of the other dimension, and the value would be whatever you want.

- when storing the data update both the Column and Row CF
- reading a whole row/col would be simply reading from the appropriate CF.
- reading an intersection is a get_slice to either col or row CF using the column_names field to identify the other dimension. 

You would not need secondary indexes to serve these queries. 
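
The scheme above can be sketched in plain Python, with two dictionaries standing in for the two column families. This is only a model of the data layout, not client code: the class `RelationMatrix` and its method names are hypothetical, and a real implementation would issue the corresponding writes and slice reads through a Cassandra client.

```python
from collections import defaultdict


class RelationMatrix:
    def __init__(self):
        # "Rows" CF: key = matrix row id, column names = matrix column ids.
        self.rows_cf = defaultdict(dict)
        # "Columns" CF: key = matrix column id, column names = matrix row ids.
        self.cols_cf = defaultdict(dict)

    def put(self, row, col, value):
        """Store one cell; both column families are updated."""
        self.rows_cf[row][col] = value
        self.cols_cf[col][row] = value

    def get_row(self, row):
        """Whole matrix row: a single key read from the Rows CF."""
        return dict(self.rows_cf.get(row, {}))

    def get_col(self, col):
        """Whole matrix column: a single key read from the Columns CF."""
        return dict(self.cols_cf.get(col, {}))

    def get_cells(self, row, cols):
        """Intersection: read one row key, restricted to named columns."""
        stored = self.rows_cf.get(row, {})
        return {c: stored[c] for c in cols if c in stored}


m = RelationMatrix()
m.put(1, 2, "x")
m.put(1, 5, "y")
m.put(3, 2, "z")
print(m.get_row(1))            # {2: 'x', 5: 'y'}
print(m.get_col(2))            # {1: 'x', 3: 'z'}
print(m.get_cells(1, [5, 9]))  # {5: 'y'}
```

Neither read needs to scan other keys, which is why no secondary index is involved.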

Hope that helps.
Aaron

On 10 Dec, 2010,at 07:02 AM, Sébastien Druon <sd...@spotuse.com> wrote:

I mean if I have secondary indexes. Apparently they are calculated in the background...

Re: N to N relationships

Posted by Sébastien Druon <sd...@spotuse.com>.
I mean if I have secondary indexes. Apparently they are calculated in the
background...

On 9 December 2010 18:33, David Boxenhorn <da...@lookin2.com> wrote:

> What do you mean by indexing?

Re: N to N relationships

Posted by David Boxenhorn <da...@lookin2.com>.
What do you mean by indexing?

On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon <sd...@spotuse.com> wrote:

> Thanks a lot for the answer
>
> What about the indexing when adding a new element? Is it incremental?
>
> Thanks again

Re: N to N relationships

Posted by Sébastien Druon <sd...@spotuse.com>.
Thanks a lot for the answer

What about the indexing when adding a new element? Is it incremental?

Thanks again

On 9 December 2010 14:38, David Boxenhorn <da...@lookin2.com> wrote:

> How about a regular CF where keys are N@N ?
>
> Then, getting a matrix row would be the same cost as getting a matrix
> column (N gets), and it would be very easy to add element N+1.

Re: N to N relationships

Posted by David Boxenhorn <da...@lookin2.com>.
How about a regular CF where keys are N@N?

Then, getting a matrix row would be the same cost as getting a matrix column
(N gets), and it would be very easy to add element N+1.
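
A minimal sketch of this key scheme in plain Python, with one dictionary standing in for the column family. The key format `row@col` follows the suggestion above; the helper names, and the assumption that the element ids are known in advance, are made up for the example. Reading a matrix row and reading a matrix column both cost one get per element, and adding element N+1 needs no change to existing data.

```python
cells = {}  # key "row@col" -> value; absent keys are empty cells


def put(row, col, value):
    cells[f"{row}@{col}"] = value


def get_row(row, element_ids):
    """N gets: one per possible column of this matrix row."""
    found = {}
    for col in element_ids:
        value = cells.get(f"{row}@{col}")
        if value is not None:
            found[col] = value
    return found


def get_col(col, element_ids):
    """N gets as well: one per possible row of this matrix column."""
    found = {}
    for row in element_ids:
        value = cells.get(f"{row}@{col}")
        if value is not None:
            found[row] = value
    return found


elements = ["a", "b", "c"]
put("a", "b", 1)
put("c", "b", 1)
elements.append("d")  # element N+1: just start using the new id
put("d", "a", 1)
print(get_row("a", elements))  # {'b': 1}
print(get_col("b", elements))  # {'a': 1, 'c': 1}
```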

