Posted to user@cassandra.apache.org by Rakesh Rajan <ra...@gmail.com> on 2010/05/01 13:34:31 UTC

Design Query

I am evaluating Cassandra to implement activity streams. We currently have
over 1,000,000 feeds with more than 320,000,000 total entries (~320 entries
per feed), implemented using Redis. I would like to hear from the community
on how to use Cassandra to solve the following cases:

   1. Ability to fetch entries by applying a few filters (e.g. show me only
   likes from a given user). This would include range queries to support
   pagination, so it would mean indices on a few columns such as the feed id,
   feed type, etc.
   2. We have around 3 machines with 4 GB RAM each for this purpose and are
   thinking of using a replication factor of 2. Would 4 GB * 3 be enough for
   Cassandra for this kind of data? I read that Cassandra does not keep all
   the data in memory, but I want to be sure that we have the right server
   config to handle this data using Cassandra.

Thanks,
Rakesh

Re: Design Query

Posted by Jonathan Ellis <jb...@gmail.com>.

On Sat, May 1, 2010 at 6:34 AM, Rakesh Rajan <ra...@gmail.com> wrote:
> I am evaluating cassandra to implement activity streams. We currently have
> over 1000000 feeds with total entries exceeding 320000000 implemented using
> redis ( ~320 entries / feed). Would like hear from the community on how to
> use cassandra to solve the following cases:
>
> Ability to fetch entries by applying a few filters ( like show me only likes
> from a given user). This would include range query to support pagination. So
> this would mean indices on a few columns like the feed id, feed type etc.

Sounds like you've got it: you need to denormalize in your app into
other column families (CFs) for anything that you need "filtered"
server-side.  Everything else you have to filter client-side.
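
A minimal sketch of that denormalization, as a toy in-memory model rather
than Cassandra's actual API. The ColumnFamily class, the add_entry and
page helpers, and the choice of (feed id, type, user) as the row key of
the filtered CF are all assumptions made for illustration. Each entry is
written twice: once to a CF holding the whole feed and once to a CF whose
row contains only the entries matching one filter, so a filtered,
paginated read becomes a plain range slice over a single row.

```python
import bisect
from collections import defaultdict


class ColumnFamily:
    """Toy stand-in for a column family: each row keeps its columns sorted by name."""

    def __init__(self):
        self._names = defaultdict(list)  # row key -> sorted column names
        self._values = {}                # (row key, column name) -> value

    def insert(self, row_key, column_name, value):
        if (row_key, column_name) not in self._values:
            bisect.insort(self._names[row_key], column_name)
        self._values[(row_key, column_name)] = value

    def get_slice(self, row_key, start=None, count=10):
        """Return up to `count` columns whose name is strictly greater than `start`."""
        names = self._names.get(row_key, [])
        begin = 0 if start is None else bisect.bisect_right(names, start)
        return [(n, self._values[(row_key, n)]) for n in names[begin:begin + count]]


# Hypothetical layout: one CF for the full feed, one denormalized CF per filter.
feed_entries = ColumnFamily()            # row key: feed id
feed_entries_by_filter = ColumnFamily()  # row key: (feed id, entry type, user)


def add_entry(feed_id, timestamp, entry_type, user, body):
    """Write the entry to both CFs; the application does the denormalizing."""
    entry = {"type": entry_type, "user": user, "body": body}
    feed_entries.insert(feed_id, timestamp, entry)
    feed_entries_by_filter.insert((feed_id, entry_type, user), timestamp, entry)


def page(cf, row_key, page_size):
    """Yield pages of columns, resuming each slice after the last column seen."""
    start = None
    while True:
        columns = cf.get_slice(row_key, start=start, count=page_size)
        if not columns:
            return
        yield columns
        start = columns[-1][0]


add_entry("feed1", 1, "like", "alice", "a")
add_entry("feed1", 2, "comment", "bob", "b")
add_entry("feed1", 3, "like", "alice", "c")
add_entry("feed1", 4, "like", "bob", "d")
add_entry("feed1", 5, "like", "alice", "e")

# "Show me only likes from alice", two entries per page.
pages = list(page(feed_entries_by_filter, ("feed1", "like", "alice"), 2))
```

The cost of this layout is one extra write per filter for every entry; any
filter without its own CF has to be applied client-side to slices of
feed_entries.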

> We have around 3 machines with 4GB RAM for this purpose and thinking of
> having replication factor 2. Would 4GB * 3 be enough for cassandra for this
> kind of data? I read that cassandra does not keep all the data in memory but
> want to be sure that we have the right server config to handle this data
> using cassandra.

Depends on how much of the data is "hot."  Cassandra does not require
all data to be in memory, but of course if you request data faster
than the disk can keep up then that will be your bottleneck.
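
A rough back-of-the-envelope sketch of that reasoning. The entry count,
node count, replication factor and RAM size come from the question; the
average entry size and the hot fraction are pure assumptions and should
be replaced with measured values. It also assumes data is spread evenly
across nodes and ignores storage overhead and JVM heap needs.

```python
def per_node_bytes(total_entries, avg_entry_bytes, replication_factor, nodes):
    """Approximate data held per node, assuming an even spread across nodes."""
    return total_entries * avg_entry_bytes * replication_factor / nodes


TOTAL_ENTRIES = 320_000_000       # from the question
NODES = 3                         # from the question
REPLICATION_FACTOR = 2            # from the question
RAM_PER_NODE_BYTES = 4 * 1024**3  # 4 GB per machine, from the question

AVG_ENTRY_BYTES = 500             # ASSUMPTION: not stated in the thread
HOT_FRACTION = 0.10               # ASSUMPTION: share of data read frequently

on_disk = per_node_bytes(TOTAL_ENTRIES, AVG_ENTRY_BYTES, REPLICATION_FACTOR, NODES)
hot = on_disk * HOT_FRACTION

print(f"data per node:     {on_disk / 1e9:.1f} GB")
print(f"hot data per node: {hot / 1e9:.1f} GB")
print(f"hot set fits in RAM: {hot < RAM_PER_NODE_BYTES}")
```

If the hot set does not fit in memory, reads of hot data go to disk and
disk throughput becomes the limit.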

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Design Query

Posted by Dorin Dragutoiu <dd...@gmail.com>.

2. I used the same configuration (3 machines with 4 GB RAM) and got an
out-of-memory error on compaction every time it tried to compact 4 x
128 MB sstables. I tried different configurations, including Java opts,
with the same result. When I used a machine with 16 GB of RAM, everything
worked like a charm.

On 04.05.2010 12:28, vineet daniel wrote:
>
> As you haven't specified all the details of your filters and data
> layout (structure), what I can suggest at a very high level is that
> you create a separate CF for each filter.

Re: Design Query

Posted by vineet daniel <vi...@gmail.com>.

As you haven't specified all the details of your filters and data layout
(structure), what I can suggest at a very high level is that you create a
separate CF for each filter.

