Posted to user@cassandra.apache.org by aaron morton <aa...@thelastpickle.com> on 2011/04/01 00:41:55 UTC

Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)

It does not have a yaml file, so I'm assuming it's using the default RandomPartitioner.
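
For what it's worth, here is a rough sketch of why RandomPartitioner spreads rows evenly. It hashes each row key with MD5 and uses the result as a token on a ring of 0 to 2**127. The Python below only approximates that idea and is not Cassandra's code; token_for, evenly_spaced_tokens and owner_of are made-up names, and it assumes a hypothetical 4-node ring with evenly spaced tokens.

```python
import bisect
import hashlib
from typing import List

RING_SIZE = 2 ** 127  # assumed token range for this sketch: 0 .. 2**127


def token_for(key: bytes) -> int:
    """Approximate a RandomPartitioner token: the MD5 digest of the key,
    read as a signed big-endian integer, then made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def evenly_spaced_tokens(node_count: int) -> List[int]:
    """Node tokens spaced evenly around the ring (hypothetical cluster)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def owner_of(key: bytes, node_tokens: List[int]) -> int:
    """Index of the first node whose token is >= the key's token,
    wrapping around to node 0 past the last token."""
    idx = bisect.bisect_left(node_tokens, token_for(key))
    return idx % len(node_tokens)


if __name__ == "__main__":
    tokens = evenly_spaced_tokens(4)
    counts = [0] * len(tokens)
    # Sequential keys still land on different nodes because of the hash.
    for i in range(10000):
        counts[owner_of(str(i).encode(), tokens)] += 1
    print(counts)
```

The point is only that sequential or time-based keys do not create a hot node under this kind of partitioner, at the cost of losing ordered range scans over row keys.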

Aaron

On 1 Apr 2011, at 04:51, Drew Kutcharian wrote:

> Thanks Aaron,
> 
> I have already checked out Twissandra. I was mainly looking to see how Secondary Indexes can be used and how they affect Data Modeling. There doesn't seem to be a lot of coverage on them.
> 
> In addition, I couldn't tell what kind of Partitioner Twissandra is using and why.
> 
> cheers,
> 
> Drew
> 
> 
> On Mar 31, 2011, at 5:53 AM, aaron morton wrote:
> 
>> Drew, 
>> 	The Twissandra project is a Twitter clone built on Cassandra; it may give you some insight into how things can be modelled: https://github.com/thobbs/twissandra
>> 
>> 	If you are just starting then consider something like...
>> 
>> 	- CF to hold the user, their data and their network links  
>> 	- standard CF to hold a blog entry, key is a timestamp 
>> 	- standard CF to hold blog comments, each comment as a single column where the name is a long timestamp 
>> 	- standard CF to hold the blogs for a user, key is the user id and each column is the blog key 
>> 
>> That's not a great schema but it's a simple starting point you can build on and refine using things like secondary indexes and doing more/less in the same CF. 
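
To make the four CFs above a bit more concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for column families (row key -> {column name: value}). It is not Cassandra client code, and every name in it (users, blog_entries, blog_comments, user_blogs and the helper functions) is made up for illustration. Cassandra keeps the columns in a row sorted by name; the sketch gets the same ordering by sorting on read.

```python
from collections import defaultdict

# Each dict stands in for one column family: row key -> {column name: value}.
users = {}                         # user id -> user data plus network links
blog_entries = {}                  # blog key (a timestamp) -> entry columns
blog_comments = defaultdict(dict)  # blog key -> {comment timestamp: text}
user_blogs = defaultdict(dict)     # user id -> {blog key: ""}


def add_user(user_id, name):
    users[user_id] = {"name": name, "follows": set()}


def follow(user_id, other_id):
    """Record a network link from one user to another."""
    users[user_id]["follows"].add(other_id)


def add_blog(user_id, blog_key, title, body):
    blog_entries[blog_key] = {"user_id": user_id, "title": title, "body": body}
    # The column name carries the information; the value is left empty.
    user_blogs[user_id][blog_key] = ""


def add_comment(blog_key, comment_ts, text):
    blog_comments[blog_key][comment_ts] = text


def blogs_for_user(user_id):
    """Blog keys for a user, newest first (reverse order of column names)."""
    return sorted(user_blogs.get(user_id, {}), reverse=True)


def comments_for_blog(blog_key):
    """Comment texts for a blog, oldest first (order of column names)."""
    row = blog_comments.get(blog_key, {})
    return [row[ts] for ts in sorted(row)]
```

With timestamps as column names, "new blogs first" and "old comments first" both fall out of the column ordering, so neither query needs a scan across rows.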
>> 
>> Good luck. 
>> Aaron
>> 
>> On 30 Mar 2011, at 15:13, Drew Kutcharian wrote:
>> 
>>> I'm pretty new to Cassandra and I would like to get your advice on modeling. The object model of the project that I'm working on will be pretty close to Blogger, Tumblr, etc. (or any other blogging website):
>>> Users can each have many Blogs, and each Blog can have many Comments. How would you model this efficiently, considering the following requirements:
>>> 
>>> 1) Be able to directly link to a User
>>> 2) Be able to directly link to a Blog
>>> 3) Be able to query and get all the Blogs for a User ordered by time created descending (new blogs first)
>>> 4) Be able to query and get all the Comments for each Blog ordered by time created ascending (old comments first)
>>> 5) Be able to link different Users to each other, as a network.
>>> 6) Have a well distributed hash so we don't end up with "hot" nodes, while the rest of the nodes are idle
>>> 7) It would be nice to show a User how many Blogs they have or how many comments are on a Blog, without iterating through the whole dataset.
>>> NEW: 8) Be able to query for the most recently added Blogs. For example, Blogs added today, this week, this month, etc.
>>> 
>>> The target Cassandra version is 0.8, so that we can use Secondary Indexes. The goal is to be very efficient, so no Text keys. We were thinking of using time-based 64-bit IDs, generated with Snowflake.
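
For reference, a time-based 64-bit ID of the kind mentioned here can be sketched as below. This is a simplified illustration and not the Snowflake service itself: it assumes the commonly described layout of 41 bits of milliseconds since a custom epoch, 10 bits of worker id and 12 bits of per-millisecond sequence. The class name SnowflakeLikeIds and the injectable clock parameter are made up for this example.

```python
import threading
import time

EPOCH_MS = 1288834974657  # custom epoch used by Twitter; any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeIds:
    """Generates roughly time-ordered 64-bit integers (illustrative only)."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id does not fit in %d bits" % WORKER_BITS)
        self.worker_id = worker_id
        self.clock = clock          # returns the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the sequence counter.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted; wait for the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Because the timestamp sits in the high bits, IDs from one worker sort by creation time, which is what makes them usable as column names for time-ordered queries.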
>>> 
>>> Thanks,
>>> 
>>> Drew
>> 
>