Posted to user@cassandra.apache.org by Erez Efrati <er...@gmail.com> on 2010/03/24 21:36:13 UTC

Model Question

Hi,

I can't figure out how to model the following using a column family, given
the way columns are sorted (by their name).

Let's say I have a list of users, and for each user I wish to display a list
of all his friends, ordered by the number of messages they have sent him so
far (descending, from most to least).

I can't see how this is going to work, since columns are always sorted by
the name of the column and not by its value. I thought of having a row for
each user, with a column for each friend who emails him. But for the sort to
work, the column name needs to be the number of messages and the value the
friend's user ID. But then, when a friend sends a message to another user,
how do I increment the count of messages he has sent to that user so far?

How can I model this with Cassandra? Is it possible?

Thanks in advance,

Erez Efrati

Re: Model Question

Posted by Benjamin Black <b...@b3k.us>.
Erez,

To make this work you have to make your model fit Cassandra, not the
other way around.  As a rule, you do complex queries either in client
code that processes the results of several simpler queries, or via a CF
(column family) you create to act as an index.  Yes, this means you have
to write data to each index in which it belongs.  Which approach to use
depends entirely on your application and your preferences.


b
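
A minimal sketch of the index-CF approach, written in Python with plain
dictionaries standing in for the two column families (this is not Cassandra
client code). The names VisitIndex, index_name and WIDTH are hypothetical. It
illustrates what "write data to each index in which it belongs" involves when
a count changes: the old index column is deleted and a new one is written.
The read-modify-write shown here is not atomic, so concurrent writers would
need extra care in a real system.

```python
WIDTH = 10  # assumed upper bound on the number of digits in a count


def index_name(count, user_key):
    """Zero-padded count plus user key, so string order matches numeric order."""
    return f"{count:0{WIDTH}d}-{user_key}"


class VisitIndex:
    """In-memory stand-in for one row in a 'counts' CF and one row in an 'index' CF."""

    def __init__(self):
        self.counts = {}  # user key -> current count
        self.index = {}   # index column name -> user key

    def increment(self, user_key):
        old = self.counts.get(user_key, 0)
        new = old + 1
        if old:
            # Remove the stale index column before writing the new one.
            del self.index[index_name(old, user_key)]
        self.index[index_name(new, user_key)] = user_key
        self.counts[user_key] = new
        return new

    def top(self, n):
        # Column names sort lexically; reverse order gives highest count first.
        names = sorted(self.index, reverse=True)[:n]
        return [(self.index[name], int(name.split("-", 1)[0])) for name in names]


idx = VisitIndex()
for user in ["alice", "bob", "alice", "carol", "alice", "bob"]:
    idx.increment(user)
print(idx.top(2))  # [('alice', 3), ('bob', 2)]
```

Every increment costs one read, one delete and two writes in this sketch,
which is the price of keeping the index sorted for cheap reads.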

On Thu, Mar 25, 2010 at 2:09 AM, Erez Efrati <er...@gmail.com> wrote:
> You are correct Chris.
> I am a newbie in this field too.
> I like the Cassandra/NoSQL way and I am trying to see if it can fit my
> model.
> Thanks,
> Erez

Re: Model Question

Posted by Erez Efrati <er...@gmail.com>.
You are correct Chris.

I am a newbie in this field too.
I like the Cassandra/NoSQL way and I am trying to see if it can fit my
model.

Thanks,
Erez

On Thu, Mar 25, 2010 at 11:03 AM, Christopher Brind <
christopher.brind@googlemail.com> wrote:

> Hi,
>
> I wondered if you were alluding to something more complex.   You'd probably
> want to create an index using something along the lines that Peter suggested.
> :)
>
> But I'm a Cassandra / Column DB newbie, so my experience ends just about
> ... here. :)
>
> Cheers,
> Chris
>

Re: Model Question

Posted by Christopher Brind <ch...@googlemail.com>.
Hi,

I wondered if you were alluding to something more complex.   You'd probably
want to create an index using something along the lines that Peter suggested.
:)

But I'm a Cassandra / Column DB newbie, so my experience ends just about ...
here. :)

Cheers,
Chris


On 25 March 2010 08:59, Erez Efrati <er...@gmail.com> wrote:

> Hi Chris,
>
> So, if I get it right, you suggest that I pull all the columns in a single
> row and do the sorting client side?
> The user-friends-messages case was just an example, and maybe not the best
> I could come up with, because I agree that in general not many friends send
> you messages.
>
> What I actually want is to keep track of companies and per-user visit
> counts. Each company can potentially have millions of users. Then, for each
> company, I want to display the users in pages, from the most frequent
> visitor to the least.
> Would you still load all the columns of the company's row and sort them on
> the client?
> And how do I keep updating the visit counts?
>
> Thanks,
> Erez

Re: Model Question

Posted by Erez Efrati <er...@gmail.com>.
Hi Chris,

So, if I get it right, you suggest that I pull all the columns in a single
row and do the sorting client side?
The user-friends-messages case was just an example, and maybe not the best I
could come up with, because I agree that in general not many friends send
you messages.

What I actually want is to keep track of companies and per-user visit counts.
Each company can potentially have millions of users. Then, for each company,
I want to display the users in pages, from the most frequent visitor to the
least.
Would you still load all the columns of the company's row and sort them on
the client?
And how do I keep updating the visit counts?

Thanks,
Erez

On Thu, Mar 25, 2010 at 12:35 AM, Christopher Brind <
christopher.brind@googlemail.com> wrote:

> Hi Erez,
>
> Don't know how many friends a user in your system is likely to have, but
> are they likely to have received so many messages from friends that you
> can't sort them in your client app?
>
> See:
>
> http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collections.html#sort(java.util.List)
>
> Assuming the user has 10,000 friends (I'm sure I don't even know 10,000
> people :) and using Java's Collections.sort, which guarantees O(n log(n))
> performance: let's say it takes 1ms to process each item, so you're looking
> at 40,000ms to do the sort, plus a little overhead to avoid O(n^2 log(n))
> behaviour. That's 40 seconds to sort 10,000 friends...
>
> On Facebook I have 363 friends; that's 929ms + overhead, i.e. around a
> second.  Apparently the average Facebook user has 130 friends:
> http://www.facebook.com/press/info.php?statistics
>
> So I can't imagine the sort taking much more than a second or so, except
> for the most popular users. In practice I would hope for sub-second, easily.
> Does that help?  Or is there something special happening in your system?
>
> Cheers,
> Chris

Re: Model Question

Posted by Christopher Brind <ch...@googlemail.com>.
Hi Erez,

Don't know how many friends a user in your system is likely to have, but are
they likely to have received so many messages from friends that you can't
sort them in your client app?

See:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collections.html#sort(java.util.List)

Assuming the user has 10,000 friends (I'm sure I don't even know 10,000
people :) and using Java's Collections.sort, which guarantees O(n log(n))
performance: let's say it takes 1ms to process each item, so you're looking
at 40,000ms to do the sort, plus a little overhead to avoid O(n^2 log(n))
behaviour. That's 40 seconds to sort 10,000 friends...

On Facebook I have 363 friends; that's 929ms + overhead, i.e. around a
second.  Apparently the average Facebook user has 130 friends:
http://www.facebook.com/press/info.php?statistics

So I can't imagine the sort taking much more than a second or so, except
for the most popular users. In practice I would hope for sub-second, easily.
Does that help?  Or is there something special happening in your system?
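
The figures above can be reproduced with a short Python sketch. It assumes
the estimate is n * log10(n) items at 1ms each (base 10 is an inference: it
is the base that yields 40,000 and 929), and the helper name estimated_ms and
the generated friend data are made up for illustration. The second half shows
the client-side ranking itself on 10,000 random (friend, message count)
pairs.

```python
import math
import random


def estimated_ms(n, ms_per_item=1.0):
    """Rough cost model: n * log10(n) units of work at ms_per_item each."""
    return n * math.log10(n) * ms_per_item


print(round(estimated_ms(10_000)))  # 40000
print(round(estimated_ms(363)))     # 929

# Client-side ranking with made-up data: most messages first.
random.seed(0)
friends = [(f"friend-{i}", random.randint(0, 500)) for i in range(10_000)]
ranked = sorted(friends, key=lambda pair: pair[1], reverse=True)
print(ranked[0][1] >= ranked[-1][1])  # True
```

The 1ms-per-item figure is the message's own assumption; timing the sorted()
call on real data would show what the actual cost is on a given machine.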

Cheers,
Chris



On 24 March 2010 20:36, Erez Efrati <er...@gmail.com> wrote:

> Hi,
>
> I can't figure out how to model the following using a column family, given
> the way columns are sorted (by their name).
>
> Let's say I have a list of users, and for each user I wish to display a
> list of all his friends, ordered by the number of messages they have sent
> him so far (descending, from most to least).
>
> I can't see how this is going to work, since columns are always sorted by
> the name of the column and not by its value. I thought of having a row for
> each user, with a column for each friend who emails him. But for the sort
> to work, the column name needs to be the number of messages and the value
> the friend's user ID. But then, when a friend sends a message to another
> user, how do I increment the count of messages he has sent to that user so
> far?
>
> How can I model this with Cassandra? Is it possible?
>
> Thanks in advance,
>
> Erez Efrati
>

Re: Model Question

Posted by Peter Chang <pe...@gmail.com>.
Do you mean on the client? It really depends on how many items you're
sorting. In terms of raw compute time, client-side sorting will likely always
be faster, but once you take bandwidth into account, a pre-sorted list will
be better for large lists.

Creating 0-padded numbers is pretty straightforward. That's how people sort
number values (stored as strings) in SimpleDB.
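
As a rough illustration of the trade-off, the Python sketch below checks that
plain string comparison on zero-padded counts gives the same order a numeric
comparator would. Here sorted() on integers is only a stand-in for a custom
comparator, and the 6-digit width is an arbitrary assumption. It also shows
the main caveat of padding: the width has to be chosen large enough up front.

```python
counts = [50, 10, 25, 110, 12, 7]

# Order a numeric comparator would produce.
numeric_order = sorted(counts)

# Order produced by plain string comparison on zero-padded names.
lexical_order = sorted(f"{c:06d}" for c in counts)

print([int(s) for s in lexical_order] == numeric_order)  # True

# Caveat: a count wider than the padding sorts in the wrong place.
too_wide = sorted(f"{c:06d}" for c in [200_000, 1_000_000])
print(too_wide)  # ['1000000', '200000']
```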


On Thu, Mar 25, 2010 at 1:32 AM, Colin Vipurs <zo...@gmail.com> wrote:

> Peter,
>
> Do you think 0-padding the entries would be more efficient than just
> implementing your own comparator?

Re: Model Question

Posted by Colin Vipurs <zo...@gmail.com>.
Peter,

Do you think 0-padding the entries would be more efficient than just
implementing your own comparator?

On Wed, Mar 24, 2010 at 10:57 PM, Peter Chang <pe...@gmail.com> wrote:
> If there's not much overhead, I recommend client side as well.
> Otherwise, you can only sort on column name. Therefore, you could create
> some sort of inverted index based on the message count.
> User 1 sent 50 messages.
> User 2 sent 10 messages.
> User 3 sent 25 messages.
> Then store a separate index that looks like:
>   ->  50-User-1-Key
>   ->  25-User-3-Key
>   ->  10-User-2-Key
> You'd also have to 0-pad your count so that numbers are correctly compared
> (12 is less than 110) since you'll have to use some lexical-based sorting.



-- 
Maybe she awoke to see the roommate's boyfriend swinging from the
chandelier wearing a boar's head.

Something which you, I, and everyone else would call "Tuesday", of course.

Re: Model Question

Posted by Erez Efrati <er...@gmail.com>.
I am not clear on how this works when I want to increase the count of
User 1.

Thanks
Erez

On Thu, Mar 25, 2010 at 12:57 AM, Peter Chang <pe...@gmail.com> wrote:

> If there's not much overhead, I recommend client side as well.
>
> Otherwise, you can only sort on column name. Therefore, you could create
> some sort of inverted index based on the message count.
>
> User 1 sent 50 messages.
> User 2 sent 10 messages.
> User 3 sent 25 messages.
>
> Then store a separate index that looks like:
>   ->  50-User-1-Key
>   ->  25-User-3-Key
>   ->  10-User-2-Key
>
> You'd also have to 0-pad your count so that numbers are correctly compared
> (12 is less than 110) since you'll have to use some lexical-based sorting.

Re: Model Question

Posted by Peter Chang <pe...@gmail.com>.
If there's not much overhead, I recommend client side as well.

Otherwise, you can only sort on column name. Therefore, you could create some
sort of inverted index based on the message count.

User 1 sent 50 messages.
User 2 sent 10 messages.
User 3 sent 25 messages.

Then store a separate index that looks like:
  ->  50-User-1-Key
  ->  25-User-3-Key
  ->  10-User-2-Key

You'd also have to 0-pad your count so that numbers are correctly compared
(12 is less than 110) since you'll have to use some lexical-based sorting.
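
A small Python sketch of the padding point, using the three message counts
from this message plus two hypothetical users (User-4 with 110 messages and
User-5 with 12) to reproduce the 12-versus-110 case. The 10-digit WIDTH is an
assumption; it only needs to be wide enough for the largest count expected.

```python
WIDTH = 10  # assumed maximum number of digits a count can have

counts = {"User-1": 50, "User-2": 10, "User-3": 25, "User-4": 110, "User-5": 12}

# Without padding, plain string comparison puts "110-..." before "12-...".
unpadded = sorted(f"{count}-{user}-Key" for user, count in counts.items())
print(unpadded)

# With zero-padding, string order and numeric order agree.
padded = sorted(f"{count:0{WIDTH}d}-{user}-Key" for user, count in counts.items())

# Reading the padded names in reverse gives "most messages first".
for name in reversed(padded):
    print(name)
```

Reading the names in reverse is one way to get the descending order asked
for in the original question.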



On Wed, Mar 24, 2010 at 1:36 PM, Erez Efrati <er...@gmail.com> wrote:

> Hi,
>
> I can't figure out how to model the following using a column family, given
> the way columns are sorted (by their name).
>
> Let's say I have a list of users, and for each user I wish to display a
> list of all his friends, ordered by the number of messages they have sent
> him so far (descending, from most to least).
>
> I can't see how this is going to work, since columns are always sorted by
> the name of the column and not by its value. I thought of having a row for
> each user, with a column for each friend who emails him. But for the sort
> to work, the column name needs to be the number of messages and the value
> the friend's user ID. But then, when a friend sends a message to another
> user, how do I increment the count of messages he has sent to that user so
> far?
>
> How can I model this with Cassandra? Is it possible?
>
> Thanks in advance,
>
> Erez Efrati
>