Posted to user@cassandra.apache.org by Sasha Dolgy <sd...@gmail.com> on 2011/02/15 13:20:31 UTC

online chat scenario

hi everyone,

is anyone using cassandra as a backend repository for storing and serving
online chat information?  are you able to share your design thoughts?  have
you encountered problems with the data structure you've implemented?   i was
playing with some ideas and each time i come back to super columns, which
i'd like to avoid if possible.

i have read elsewhere that people are suggesting MongoDB or redis ... i'm
curious about Cassandra.

kind regards,
-sd

-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: online chat scenario

Posted by Sasha Dolgy <sd...@gmail.com>.
Hi Aaron,

I did come across this:

http://www.juhonkoti.net/2010/09/25/example-how-to-model-your-data-into-nosql-with-cassandra

Was this what you were referring to?  I found this one interesting, and keep
coming back to it, but I have some concerns about whether this is the best way
to achieve the same result.

-sd

On Tue, Feb 15, 2011 at 8:50 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> There was a guy here last year who did something similar and did a nice
> write-up. Cannot find it right now; some googling may help.
>
> Aaron


-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: online chat scenario

Posted by Aaron Morton <aa...@thelastpickle.com>.
There was a guy here last year who did something similar and did a nice write-up. Cannot find it right now; some googling may help.

Aaron


On 16/02/2011, at 2:56 AM, Victor Kabdebon <vi...@gmail.com> wrote:

> Hello Sasha.
> 
> In this sort of real-time application, the way you insert (QUORUM, ONE, etc.) and the way you retrieve are extremely important, because your data may not have had time to propagate to all your nodes. Be sure to use adequate policies for that: write to a certain number of nodes, but don't sacrifice too much time doing so, or you lose the real-time component.
> Here is a presentation of how chat is built at Facebook; it may be useful to you:
> 
> http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
> 
> It's more focused on Erlang, but it might give you ideas on how to deal with that problem (I am not sure that a DB is the best way to deal with that... but it's just my opinion).
> 
> Victor Kabdebon
> http://www.voxnucleus.fr

Re: online chat scenario

Posted by Victor Kabdebon <vi...@gmail.com>.
Hello Sasha.

In this sort of real-time application, the way you insert (QUORUM, ONE,
etc.) and the way you retrieve are extremely important, because your data
may not have had time to propagate to all your nodes. Be sure to use
adequate policies for that: write to a certain number of nodes, but don't
sacrifice too much time doing so, or you lose the real-time component.
Here is a presentation of how chat is built at Facebook; it may be useful
to you:

http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf

It's more focused on Erlang, but it might give you ideas on how to deal with
that problem (I am not sure that a DB is the best way to deal with that...
but it's just my opinion).
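The advice about write and read consistency levels can be made concrete with the usual quorum arithmetic. The sketch below is illustrative only: the helper names are hypothetical (not part of any Cassandra client), only ONE, QUORUM and ALL are modelled, and it assumes a replication factor `rf`. The rule it checks, that reads are guaranteed to overlap the latest acknowledged write when R + W > RF, is the standard Dynamo-style overlap condition.

```python
# Illustrative helpers, not part of any Cassandra client library.
# Only ONE, QUORUM and ALL are modelled; Cassandra defines other levels too.


def replicas_for(level: str, rf: int) -> int:
    """Number of replicas that must acknowledge an operation at `level`."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # strict majority of the rf replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    w = replicas_for(write_level, rf)
    r = replicas_for(read_level, rf)
    return r + w > rf


if __name__ == "__main__":
    rf = 3  # assumed replication factor
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(f"write={w} read={r}: {reads_see_latest_write(w, r, rf)}")
```

With RF=3 this prints False for ONE/ONE and True for the other two pairs. QUORUM on both sides needs only 2 of 3 replicas per request, which is one possible middle ground between a fresh chat history and the latency cost mentioned above.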

Victor Kabdebon
http://www.voxnucleus.fr



2011/2/15 Sasha Dolgy <sd...@gmail.com>

> thanks for the response.  thinking about this, this would not allow for the
> sorting of messages into a chronological order for end user display.  i had
> thought about having each message as its own column against the room or the
> user, but i have had some inconsistencies in retrieving the data.  sometimes
> i get 3 columns, sometimes i get 50... (i think this is because of the
> random partitioner)
>
> i had thought about this structure:
>
> [messages][nickname][message id => message data]
> [chatrooms][room_name][message id]
>
> this way i can pull all messages a user ever posted, not specific to a
> room.  what i haven't been able to do so far is print the timestamp on the
> row or column.  does this have to be explicitly added somewhere or can it be
> returned as part of a 'get' request?
>
> -sd
>
> --
> Sasha Dolgy
> sasha.dolgy@gmail.com
>

Re: online chat scenario

Posted by Sasha Dolgy <sd...@gmail.com>.
thanks for the response.  thinking about this, this would not allow for the
sorting of messages into a chronological order for end user display.  i had
thought about having each message as its own column against the room or the
user, but i have had some inconsistencies in retrieving the data.  sometimes
i get 3 columns, sometimes i get 50... (i think this is because of the
random partitioner)

i had thought about this structure:

[messages][nickname][message id => message data]
[chatrooms][room_name][message id]

this way i can pull all messages a user ever posted, not specific to a
room.  what i haven't been able to do so far is print the timestamp on the
row or column.  does this have to be explicitly added somewhere or can it be
returned as part of a 'get' request?
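The two-column-family layout above stores each message once, under its author, and keeps only message ids in the room row, so reading a room is an index lookup followed by fetches from the messages CF. A minimal in-memory sketch, assuming each column family is modelled as a dict of rows and that a message id starts with a zero-padded microsecond timestamp (a hypothetical id format; a TimeUUID column name would be the collision-free equivalent), so that sorting ids yields posting order. The posting time is also kept explicitly in the message data (`ts`), which is one way to display it without depending on the column's write timestamp.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# In-memory stand-ins for the two column families (assumption: a column
# family is modelled as row key -> {column name -> column value}).
messages: Dict[str, Dict[str, dict]] = defaultdict(dict)    # [messages][nickname][message id => message data]
chatrooms: Dict[str, Dict[str, bytes]] = defaultdict(dict)  # [chatrooms][room_name][message id]


def make_message_id(ts_micros: int, nickname: str) -> str:
    """Hypothetical id format: zero-padded microseconds, so string order is time order."""
    return f"{ts_micros:020d}:{nickname}"


def post(room: str, nickname: str, text: str, ts_micros: Optional[int] = None) -> str:
    """Store the message under its author and index its id under the room."""
    ts = ts_micros if ts_micros is not None else time.time_ns() // 1000
    msg_id = make_message_id(ts, nickname)
    messages[nickname][msg_id] = {"room": room, "text": text, "ts": ts}
    chatrooms[room][msg_id] = b""  # the room row holds only the id; the value is unused
    return msg_id


def room_history(room: str) -> List[Tuple[str, str]]:
    """(nickname, text) pairs for a room, oldest first."""
    history = []
    # Sorting stands in for Cassandra returning a row's columns in comparator order.
    for msg_id in sorted(chatrooms[room]):
        nickname = msg_id.split(":", 1)[1]
        history.append((nickname, messages[nickname][msg_id]["text"]))
    return history
```

Everything a user ever posted, regardless of room, is simply `sorted(messages[nickname])`, which matches the stated goal of this layout.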

-sd


On Tue, Feb 15, 2011 at 2:12 PM, Michal Augustýn
<au...@gmail.com> wrote:

> The schema design depends on the number of chatrooms/users/messages. For
> example, you can have one CF where the key is the chatroom, the column name
> is the username, the column value is the message, and the message time is
> the same as the column timestamp.
> You can add a day timestamp to the chatroom name to avoid large rows.
>
> Augi


-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: online chat scenario

Posted by Michal Augustýn <au...@gmail.com>.
The schema design depends on the number of chatrooms/users/messages. For
example, you can have one CF where the key is the chatroom, the column name
is the username, the column value is the message, and the message time is
the same as the column timestamp.
You can add a day timestamp to the chatroom name to avoid large rows.
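Adding a day stamp to the chatroom name is a row-bucketing scheme: each room gets one row per day, so no single row grows without bound, and a reader fetches the buckets that cover the period it wants. A minimal sketch, assuming UTC days and a `room:YYYYMMDD` key format (both are illustrative choices, not taken from the post):

```python
from datetime import datetime, timedelta, timezone
from typing import List


def bucketed_row_key(chatroom: str, ts: datetime) -> str:
    """Row key for the day bucket containing ts (expects a timezone-aware datetime)."""
    day = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{chatroom}:{day}"


def keys_for_range(chatroom: str, start: datetime, end: datetime) -> List[str]:
    """Every day-bucket row key a reader must fetch to cover [start, end]."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append(f"{chatroom}:{day.strftime('%Y%m%d')}")
        day += timedelta(days=1)
    return keys
```

The bucket size is a trade-off: day buckets keep rows small for busy rooms, but a quiet room's history is spread over many nearly empty rows.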

Augi

2011/2/15 Andrey V. Panov <pa...@gmail.com>

> I have never done it. But I suppose you can use "chatroom name" as the key
> and store messages & nicks in the columns as JSON, with the timestamp as the
> columnName.
>

Re: online chat scenario

Posted by "Andrey V. Panov" <pa...@gmail.com>.
I have never done it. But I suppose you can use "chatroom name" as the key
and store messages & nicks in the columns as JSON, with the timestamp as the
columnName.
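With the chatroom name as the row key and a timestamp as the column name, chronological order comes for free, because a row's columns come back sorted by name (numerically, with a comparator such as LongType). A minimal sketch that models one room row as a Python dict keyed by microsecond timestamp, with the nick and message packed into a JSON value; sorting the keys stands in for Cassandra's column ordering, and in Cassandra the "latest N" read would typically be a reversed column slice with a count. The JSON field names are assumptions. Note that two messages posted in the same room at the same timestamp would overwrite each other; a TimeUUID column name is the usual way around that.

```python
import json
from typing import Dict, List, Tuple


def encode_column(ts_micros: int, nick: str, text: str) -> Tuple[int, str]:
    """One chat line as (column name, column value): timestamp name, JSON value."""
    return ts_micros, json.dumps({"nick": nick, "msg": text})


def latest(row: Dict[int, str], count: int) -> List[dict]:
    """The newest `count` messages of a room row, returned oldest first."""
    if count <= 0:
        return []
    # Sorting integer names mimics a numeric comparator (e.g. LongType).
    newest = sorted(row)[-count:]
    return [json.loads(row[name]) for name in newest]
```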