
Posted to user@cassandra.apache.org by Chintana Wilamuna <ch...@gmail.com> on 2011/10/17 11:40:59 UTC

One CF vs several CFs

Hi,

Does anyone have an idea about the pros/cons of modeling your data
in the following two ways? The first is to write all your data within
a single CF. Using the infamous blog example:

Posts = { // CF
	slug-1: { // key to the row inside CF
		title: "...",
		body: "...",
		tag1: "...",
		tag2: "...",
		...
		tagN: "...",
		comment1: "...",
		comment2: "...",
		commentN: "..."
	},

	slug-2: {
		title: "...",
		body: "...",
		tag1: "...",
		tag2: "...",
		...
		tagN: "...",
		comment1: "...",
		comment2: "...",
		commentN: "..."
	}
}

Using this model, one has to do slice queries to retrieve the tags or
comments for a given blog post, but all tag and comment info for a
particular post is available with a single read.
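
A minimal sketch of what those slice queries amount to, in plain Python
rather than a real Cassandra client. It assumes the columns of a row are
ordered by name and that tag and comment columns share the "tag" and
"comment" name prefixes shown above. The helpers slice_columns and
slice_prefix are hypothetical, for illustration only.

```python
from bisect import bisect_left, bisect_right


def slice_columns(row, start, finish):
    """Return the (name, value) pairs whose names fall in [start, finish]."""
    names = sorted(row)  # stand-in for a row already stored in name order
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]


def slice_prefix(row, prefix):
    """Slice every column whose name starts with the given prefix."""
    return slice_columns(row, prefix, prefix + "\uffff")


# One row of the single-CF layout (illustrative values only)
post = {
    "title": "...",
    "body": "...",
    "tag1": "cassandra",
    "tag2": "modeling",
    "comment1": "first",
    "comment2": "second",
}

tags = slice_prefix(post, "tag")          # tag columns only
comments = slice_prefix(post, "comment")  # comment columns only
```

With plain string names, "comment10" sorts before "comment2", so column
names would need zero padding or a different comparator to come back in
posting order.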

The other method is to break tags and comments out into their own CFs.

Posts = { // CF
	slug-1: { // key to the row inside CF
		title: "...",
		body: "..."
	},
	
	slug-2: {
		title: "...",
		body: "..."
	}
}

Tags = { // CF
	tag1: { // key to the row inside CF
		timestamp1: slug-1,
		timestamp2: slug-2
	},

	tag2: {
		timestamp1: slug-2
	}
}

Comments = { // CF
	slug-1: { // key to the row inside CF
		timestamp1: "comment1 ...",
		timestamp2: "comment2 ...",
		timestamp3: "comment3 ..."
	},

	slug-2: {
		timestamp1: "comment1 ..."
	}
}

Here, you have to do a couple of queries when you're trying to get
tags and comments for a particular post.
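
As a rough sketch of that read path, here are the three CFs as plain
Python dicts (not a real client; integer keys stand in for the
timestamps, and load_post and posts_for_tag are hypothetical helper
names):

```python
# In-memory stand-ins for the three CFs; integer keys stand in for timestamps.
POSTS = {
    "slug-1": {"title": "...", "body": "..."},
    "slug-2": {"title": "...", "body": "..."},
}
TAGS = {
    "tag1": {1: "slug-1", 2: "slug-2"},
    "tag2": {1: "slug-2"},
}
COMMENTS = {
    "slug-1": {1: "comment1 ...", 2: "comment2 ...", 3: "comment3 ..."},
    "slug-2": {1: "comment1 ..."},
}


def posts_for_tag(tag):
    """One read on Tags: the slugs carrying this tag, oldest first."""
    return [slug for _, slug in sorted(TAGS.get(tag, {}).items())]


def load_post(slug):
    """Assemble one post from separate reads against each CF."""
    post = dict(POSTS[slug])  # read 1: the Posts row
    # read 2: the Comments row for the same key, in timestamp order
    post["comments"] = [text for _, text in sorted(COMMENTS.get(slug, {}).items())]
    # Tags is keyed by tag, not by slug, so going from a post to its tags
    # means looking at every tag row in this sketch.
    post["tags"] = sorted(t for t, cols in TAGS.items() if slug in cols.values())
    return post
```

In this layout the Tags CF answers "which posts have this tag?" with a
single row read, while the reverse lookup is the expensive direction.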

Is the answer "it depends", or is there an inherent inefficiency
associated with method 1 regardless of the data you're trying to
model?

Thanks in advance,

    -Chintana

-- 
blog: engwar.com/
photos: flickr.com/photos/chintana
linkedin: linkedin.com/in/engwar
facebook: facebook.com/chintana
twitter: twitter.com/std_err

Re: One CF vs several CFs

Posted by aaron morton <aa...@thelastpickle.com>.

It depends on what your workload is and how you want to read the data. 

If you want to get all the data for an article every time, and the number of comments is not huge, go with option 1.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18/10/2011, at 2:45 AM, Konstantin Naryshkin wrote:

> Method 1 may also result in very wide rows if you have lots and lots of tags and comments. That can be a drastic inefficiency for Cassandra (but again, it depends on your data).

Re: One CF vs several CFs

Posted by Konstantin Naryshkin <ko...@a-bb.net>.

Method 1 may also result in very wide rows if you have lots and lots of tags
and comments. That can be a drastic inefficiency for Cassandra (but again,
it depends on your data).
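
To make the row-width point concrete, a small sketch that only counts
columns per row under each layout. The function names and the example
counts are hypothetical, and it says nothing about where a row becomes
"too wide" in practice.

```python
def columns_in_single_cf_row(n_tags, n_comments):
    """Method 1: title + body + one column per tag + one per comment."""
    return 2 + n_tags + n_comments


def columns_in_split_rows(n_comments):
    """Method 2: widths of the Posts row and the Comments row for one post."""
    return {"Posts": 2, "Comments": n_comments}


# Hypothetical post with 5 tags and 10,000 comments
single = columns_in_single_cf_row(5, 10_000)
split = columns_in_split_rows(10_000)
```

Splitting keeps the Posts row small, but the Comments row for a busy
post still grows with the number of comments.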
