Posted to user@cassandra.apache.org by Sam Hodgson <ho...@hotmail.com> on 2011/01/25 00:46:42 UTC

Schema Question

Hi all,

I'm brand new to Cassandra. I'm migrating a large forum site from MySQL and would be grateful if anyone could give me some basic pointers on schema design, or recommend any documentation.

The example used in http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model is very close to, if not exactly, what I need for my main CF:
<!--
    ColumnFamily: BlogEntries
    This is where all the blog entries will go:

    Row Key => post's slug (the seo friendly portion of the uri)
    Column Name: an attribute for the entry (title, body, etc)
    Column Value: value of the associated attribute

    Access: grab an entry by slug (always fetch all Columns for Row)

    fyi: tags is a denormalization... its a comma separated list of tags.
    im not using json in order to not interfere with our
    notation but obviously you could use anything as long as your app
    knows how to deal w/ it

    BlogEntries : { // CF
        i-got-a-new-guitar : { // row key - the unique "slug" of the entry.
            title: This is a blog entry about my new, awesome guitar,
            body: this is a cool entry. etc etc yada yada
            author: Arin Sarkissian  // a row key into the Authors CF
            tags: life,guitar,music  // comma sep list of tags (basic denormalization)
            pubDate: 1250558004      // unixtime for publish date
            slug: i-got-a-new-guitar
        },
        // all other entries
        another-cool-guitar : {
            ...
            tags: guitar,
            slug: another-cool-guitar
        },
        scream-is-the-best-movie-ever : {
            ...
            tags: movie,horror,
            slug: scream-is-the-best-movie-ever
        }
    }
-->
<ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
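As a rough illustration of that layout, here is a minimal sketch using plain Python dicts. It is not Cassandra client code: the helper names `get_entry` and `get_tags` are made up for this example, and it only models the "one row per slug, fetch all columns" access pattern described in the comment above.

```python
# Plain-Python model of the BlogEntries layout; NOT a Cassandra client call.
# Outer key = row key (the slug); inner dict = column name -> column value.
blog_entries = {
    "i-got-a-new-guitar": {
        "title": "This is a blog entry about my new, awesome guitar",
        "body": "this is a cool entry. etc etc yada yada",
        "author": "Arin Sarkissian",   # a row key into the Authors CF
        "tags": "life,guitar,music",   # denormalized comma-separated list
        "pubDate": "1250558004",       # unixtime for publish date
        "slug": "i-got-a-new-guitar",
    },
}


def get_entry(column_family, slug):
    """Fetch every column for one row key, or None if the row is absent."""
    row = column_family.get(slug)
    return dict(row) if row is not None else None


def get_tags(entry):
    """Split the comma-separated tags column, dropping empty pieces."""
    return [tag for tag in entry["tags"].split(",") if tag]


entry = get_entry(blog_entries, "i-got-a-new-guitar")
print(entry["title"], get_tags(entry))
```

The application has to do the tag parsing itself, since the store only sees an opaque column value.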

How well would this scale? Say you are storing 5 million posts and expect that number to keep growing:
would it be better to segment them into several column families, and if so, to what extent?

I could create a column family to store the posts for each category, but I'd end up with thousands of CFs.
That said, the data would then be stored already sorted for querying and presenting.
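To show what "stored sorted per category" would mean in practice, here is a small in-memory sketch in plain Python. The category names, the `add_post` and `latest` helpers, and two of the three timestamps are hypothetical. It only models one sorted grouping per category; how that should map onto column families in Cassandra is exactly the open question.

```python
import bisect
from collections import defaultdict

# category name -> list of (pubDate, slug) tuples, kept sorted by pubDate.
# Category names and two of the timestamps below are made up for illustration.
posts_by_category = defaultdict(list)


def add_post(category, pub_date, slug):
    """Insert a post into its category while keeping the list sorted."""
    bisect.insort(posts_by_category[category], (pub_date, slug))


def latest(category, n):
    """Return up to n slugs for a category, newest first."""
    entries = posts_by_category.get(category, [])
    start = max(len(entries) - n, 0)
    return [slug for _, slug in reversed(entries[start:])]


add_post("guitars", 1250558004, "i-got-a-new-guitar")
add_post("guitars", 1250558500, "another-cool-guitar")
add_post("movies", 1250558300, "scream-is-the-best-movie-ever")
print(latest("guitars", 2))
```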

My DB is very write-heavy and growing fast, so Cassandra sounds like the best solution.
Any advice is greatly appreciated!

Thanks

Sam