Posted to user@cassandra.apache.org by Simon Reavely <si...@gmail.com> on 2010/10/03 17:02:02 UTC

Re: Schema question

Two questions:
1. Is this compaction challenge a CPU issue or a disk I/O issue in your case?
2. In other places people have recommended adjusting the defaults to control compaction overhead. Did you adjust anything or experiment with how to control compaction?




Simon Reavely


On Sep 21, 2010, at 8:48 AM, Juho Mäkinen <ju...@gmail.com> wrote:

> Not really. The schema has worked without any problems. We're running
> a five-node Cassandra cluster behind the system (it also has other uses
> besides this particular application: it stores all our blog content and
> a bunch of other data). There are about 120 000 new messages each day
> and the chat window is displayed about 1200 times per second.
> 
> It's worth noting that a Cassandra cluster can't be just one or two
> machines if you want the best availability, because once a node starts
> compacting, its performance suffers severely. Thus you need enough
> nodes that the probability of all nodes compacting at the same time is
> low enough. I've implemented a PHP wrapper around the Thrift API which
> retries an operation against another server until the result can be
> obtained. The library is available on GitHub:
> http://github.com/dynamoid/cassandra-utilities
> 
> We have monitored and stored performance data for each Cassandra
> request we have made. As the results are interesting, I'll be posting
> them in a separate message to this mailing list shortly.
> 
> - Juho Mäkinen
> 
> 
> 
> 
> On Tue, Sep 21, 2010 at 1:20 PM, Simon Reavely <si...@gmail.com> wrote:
>> Thanks for the writeup...good stuff!
>> Any lessons learnt you'd like to share or challenges that persist?
>> 
>> 
>> Simon Reavely
>> 
>> 
>> On Sep 20, 2010, at 6:37 AM, Juho Mäkinen <ju...@gmail.com> wrote:
>> 
>>> We have built a Facebook-style "messenger" into our web site which
>>> uses Cassandra as the storage backend with two column families:
>>> TalkMessages and TalkLastMessages. I've uploaded a screenshot showing
>>> the feature in action to
>>> http://img138.imageshack.us/img138/3807/talkexample.jpg
>>> 
>>> TalkMessages contains each message between two participants. The key
>>> is a string built from the two users' uids: "$smaller_uid:$bigger_uid".
>>> Each column inside this CF contains a single message. The column name
>>> is the message timestamp in microseconds since epoch, stored as
>>> LongType. The column value is a JSON-encoded string containing the
>>> following fields: sender_uid, target_uid, msg.
>>> 
>>> This results in the following structure inside the column family:
>>> 
>>> "2249:9111" => [
>>>  12345678 : { sender_uid : 2249, target_uid : 9111, msg : "Hello, how are you?" },
>>>  12345679 : { sender_uid : 9111, target_uid : 2249, msg : "I'm fine, thanks" }
>>> ]
>>> 
>>> TalkLastMessages is used to quickly fetch a user's talk partners, the
>>> last message sent between the peers, and other similar data. This
>>> lets us fetch everything needed to display a "main view" for all
>>> online friends with just one query to Cassandra. This column family
>>> uses the user uid as its key. Each column represents a talk partner
>>> whom the user has been talking to, and it uses the talk partner's uid
>>> as the column name. The column value is a JSON-packed structure which
>>> contains the following fields:
>>> - last message timestamp: microseconds since epoch when a message was
>>> last sent between these two users.
>>> - unread timestamp: microseconds since epoch when the first unread
>>> message was sent between these two users.
>>> - unread: count of unread messages.
>>> - last message: the last message between these two users.
>>> 
>>> This results in the following structure inside the column family for
>>> these two example users, 2249 and 9111:
>>> 
>>> "2249" => [
>>>  9111 : { last_message_timestamp : 12345679, unread_timestamp : 12345679, unread : 1, last_message : "I'm fine, thanks" }
>>> ],
>>> "9111" => [
>>>  2249 : { last_message_timestamp : 12345679, unread_timestamp : 12345679, unread : 0, last_message : "I'm fine, thanks" }
>>> ]
>>> 
>>> Displaying the chat (this happens on every page load, so it needs to
>>> be fast):
>>> 1) Fetch all columns from TalkLastMessages for the user
>>> 
>>> Displaying message history between two participants:
>>> 1) Fetch the last n columns from TalkMessages for the relevant
>>> "$smaller_uid:$bigger_uid" row.
>>> 
>>> Marking all messages sent by the other participant as read (when you
>>> read the messages):
>>> 1) Get column $sender_uid from row $reader_uid from TalkLastMessages
>>> 2) Update the JSON payload and insert the column back
>>> 
>>> Sending a message involves the following operations:
>>> 1) Insert a new column into TalkMessages
>>> 2) Fetch the relevant column from TalkLastMessages from the $target_uid
>>> row with the $sender_uid column
>>> 3) Update the column's JSON payload and insert it back into TalkLastMessages
>>> 4) Fetch the relevant column from TalkLastMessages from the $sender_uid
>>> row with the $target_uid column
>>> 5) Update the column's JSON payload and insert it back into TalkLastMessages
>>> 
>>> There are also other operations and the actual payload is a bit more complex.
>>> 
>>> I'm happy to answer questions if somebody is interested :)
>>> 
>>> - Juho Mäkinen
>>> 
>>> 
>>> 
>>> On Mon, Sep 20, 2010 at 12:57 PM, Morten Wegelbye Nissen <mw...@monit.dk> wrote:
>>>>  Hello List,
>>>> 
>>>> No matter where you look, you read almost everywhere that the NoSQL
>>>> data schema is completely different from the relational way - and after
>>>> a little insight into Cassandra everyone can second that.
>>>> 
>>>> But I have yet to see some real-life examples of how a real system can
>>>> be modelled. Let's take the example of a system where users can send
>>>> messages to each other. ( Completely imaginary, no one would use
>>>> Cassandra for a mail system :) )
>>>> 
>>>> If one were to create such a system, what CFs would be used? And how
>>>> would you, for example, find all unread messages?
>>>> 
>>>> ./Morten
>>>> 
>> 
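The schema and operations Juho describes above can be sketched in a few lines of Python. This is only an illustration, not the production code (which, per the thread, is PHP over Thrift): plain dictionaries stand in for the two column families, and the function names (talk_key, send_message, mark_read, history) are hypothetical. Clearing unread_timestamp when messages are read is also an assumption; the thread does not say what happens to that field.

```python
import json
from collections import defaultdict

# In-memory stand-ins for the two column families (assumption: a real
# deployment would issue these reads and writes through a Cassandra client).
talk_messages = defaultdict(dict)       # row key -> {timestamp_us: JSON string}
talk_last_messages = defaultdict(dict)  # user uid -> {partner uid: JSON string}


def talk_key(uid_a, uid_b):
    """Row key for TalkMessages: "$smaller_uid:$bigger_uid" (uids are ints here)."""
    low, high = sorted((uid_a, uid_b))
    return f"{low}:{high}"


def _update_last(owner_uid, partner_uid, ts_us, msg, unread_delta):
    """Fetch one TalkLastMessages column, update its JSON, insert it back."""
    raw = talk_last_messages[owner_uid].get(partner_uid)
    state = json.loads(raw) if raw else {"unread": 0, "unread_timestamp": None}
    state["last_message_timestamp"] = ts_us
    state["last_message"] = msg
    if unread_delta:
        if state["unread"] == 0:
            # First unread message: remember when it was sent.
            state["unread_timestamp"] = ts_us
        state["unread"] += unread_delta
    talk_last_messages[owner_uid][partner_uid] = json.dumps(state)


def send_message(sender_uid, target_uid, msg, ts_us):
    # 1) Insert a new column into TalkMessages.
    payload = {"sender_uid": sender_uid, "target_uid": target_uid, "msg": msg}
    talk_messages[talk_key(sender_uid, target_uid)][ts_us] = json.dumps(payload)
    # 2-3) Target's row, sender's column: one more unread message.
    _update_last(target_uid, sender_uid, ts_us, msg, unread_delta=1)
    # 4-5) Sender's row, target's column: unread count unchanged.
    _update_last(sender_uid, target_uid, ts_us, msg, unread_delta=0)


def mark_read(reader_uid, sender_uid):
    """Reset the unread counter in the reader's row for this sender."""
    raw = talk_last_messages[reader_uid].get(sender_uid)
    if raw is None:
        return
    state = json.loads(raw)
    state["unread"] = 0
    state["unread_timestamp"] = None  # assumption, see the note above
    talk_last_messages[reader_uid][sender_uid] = json.dumps(state)


def history(uid_a, uid_b, n):
    """Last n messages between two users, oldest first."""
    row = talk_messages.get(talk_key(uid_a, uid_b), {})
    return [json.loads(row[ts]) for ts in sorted(row)[-n:]]
```

Each read-modify-write in _update_last is two round trips and is not atomic, so two messages sent at the same moment could overwrite each other's counter update. The thread does not say how, or whether, the real system guards against that.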

Re: Schema question

Posted by Edward Capriolo <ed...@gmail.com>.
On Sun, Oct 3, 2010 at 11:02 AM, Simon Reavely <si...@gmail.com> wrote:
> Two questions:
> 1. Is this compaction challenge a CPU issue or a disk I/O issue in your case?
> 2. In other places people have recommended adjusting the defaults to control compaction overhead. Did you adjust anything or experiment with how to control compaction?
You can adjust the compaction thread priority or increase the memtable
sizes, but if you have a high write volume, "Judgement day is
inevitable". I mean, "compaction is inevitable". If you go from 2 to 4
nodes and the other settings stay the same, you will get less intensive
compaction, as the data per node is lower.
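
Since compaction cannot be avoided, the client can at least route around a node that is busy compacting, which is the idea behind the retry wrapper Juho mentions earlier in the thread. Below is a minimal Python sketch of that idea. It is not the code from his PHP library: the names with_failover and AllNodesFailed are hypothetical, and `operation` stands for whatever call your client library makes against a single node.

```python
import random


class AllNodesFailed(Exception):
    """Raised when the operation failed on every node that was tried."""


def with_failover(nodes, operation, rng=random):
    """Call operation(node) on the nodes in random order until one succeeds.

    Returns the first successful result. If every node fails, raises
    AllNodesFailed carrying a list of (node, exception) pairs.
    """
    order = list(nodes)
    rng.shuffle(order)  # spread load instead of always hitting the same node first
    errors = []
    for node in order:
        try:
            return operation(node)
        except Exception as exc:
            # Assumption: a real wrapper would catch only timeout or
            # unavailable errors and let programming errors propagate.
            errors.append((node, exc))
    raise AllNodesFailed(errors)
```

A call such as with_failover(["node1", "node2", "node3"], fetch_row) would return as soon as any one node answers, where fetch_row is a hypothetical function that takes a node address. A sensible per-node timeout matters here, since a compacting node is more likely to be slow than to fail outright.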