Posted to user@cassandra.apache.org by Joe Van Dyk <jo...@fixieconsulting.com> on 2009/10/02 00:31:20 UTC

cassandra as permanent datastore

Hi,

How stupid would it be to use Cassandra as a permanent datastore?

Say I have a service that tracks clicks on ads running on other sites.
I'd need to keep track of who clicked what, when, and where, and run
reports on it.  Cassandra is attractive because of the built-in
replication and the high write availability.



-- 
Joe Van Dyk
http://fixieconsulting.com

Re: cassandra as permanent datastore

Posted by Mark Robson <ma...@gmail.com>.
2009/10/1 Joe Van Dyk <jo...@fixieconsulting.com>

> Hi,
>
> How stupid would it be to use cassandra as a permanent datastore?
>
> Say I have a service that tracks clicks on ads running on other sites.
>  I'd need to keep track of who clicked what when and where.  And run
> reports on it.  Cassandra is attractive because of the built-in
> replication and the high write availability
>


Not stupid at all, really.

The individual clicks won't be interesting to anyone, so you'll want to
summarise the data after some time (say, daily). You can store the
summaries in something that allows for easier reporting, and put only the
individual clicks in Cassandra.

Or you could even store the summaries in Cassandra.
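
As a rough illustration of the daily summarising step, here is a minimal
Python sketch. The record fields (ts, ad_id, site, user) and the function
name summarise_daily are assumptions made for the example; nothing in it
talks to Cassandra or any other store.

```python
from collections import Counter
from datetime import datetime, timezone


def summarise_daily(clicks):
    """Count clicks per (UTC day, ad_id, site) from raw click records.

    Each click is a dict with hypothetical fields: "ts" (Unix seconds),
    "ad_id", "site" and "user".
    """
    summary = Counter()
    for click in clicks:
        day = datetime.fromtimestamp(click["ts"], tz=timezone.utc).date()
        summary[(day.isoformat(), click["ad_id"], click["site"])] += 1
    return summary


if __name__ == "__main__":
    raw_clicks = [
        {"ts": 1254355200, "ad_id": "ad-1", "site": "example.com", "user": "u1"},
        {"ts": 1254358800, "ad_id": "ad-1", "site": "example.com", "user": "u2"},
        {"ts": 1254441600, "ad_id": "ad-1", "site": "example.com", "user": "u1"},
    ]
    for key, count in sorted(summarise_daily(raw_clicks).items()):
        print(key, count)
```

The summary rows are small and keyed by day, so they could go into whatever
store is most convenient for reporting.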

However, I'd say don't use Cassandra until your data gets too big, or your
write load too heavy, for one machine, which typically means more than 3 TB
or more than 1000 inserts/sec.

Until you get there, a single (potentially replicated) MySQL instance would
do it and be far easier to program against.

One of the things Cassandra doesn't have right now is range_remove. I
suggested it, and it shouldn't be hard to implement. A range remove is
pretty much vital for audit data (I'm assuming you're using the ordered
partitioner); otherwise you'd have to execute an individual remove command
for every record, which would be a pain and probably very inefficient: much
more workload than the inserts.
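
To make the difference concrete, here is a toy in-memory Python model of a
store whose keys are kept in sorted order. It is only a sketch: OrderedStore
and its methods are hypothetical names, not Cassandra's API, and the
date-prefixed key layout is an assumption. It shows why a range remove over
ordered keys is a single slice operation, while per-record removes need one
call per key.

```python
import bisect


class OrderedStore:
    """Toy stand-in for a store that keeps its row keys sorted."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # key -> value

    def insert(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def remove(self, key):
        """Per-record delete: one call for every key to be dropped."""
        if key in self._rows:
            del self._rows[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def range_remove(self, start, end):
        """Delete every key in [start, end) and return how many were removed."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            del self._rows[key]
        del self._keys[lo:hi]
        return hi - lo

    def keys(self):
        return list(self._keys)


if __name__ == "__main__":
    store = OrderedStore()
    # Hypothetical key layout: date prefix first, so keys sort by time.
    for key in ["20090901:a", "20090915:b", "20091001:c"]:
        store.insert(key, {"clicked": True})
    removed = store.range_remove("20090901", "20091001")
    print(removed, store.keys())
```

With date-prefixed keys, expiring a month of audit records is one call with
the month's start and end prefixes.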

Mark

Re: cassandra as permanent datastore

Posted by Joe Stump <jo...@joestump.net>.
For click tracking data I might look at Hadoop as well. It can handle
the writes, replication, etc., and has mechanisms for crunching large
datasets built in (e.g. MapReduce).

That being said, you're working with data where, for the most part, it
won't be the end of the world if you lose a click here or there.
Cassandra, of course, will handle the HA writes and replication for you.

--Joe



Re: cassandra as permanent datastore

Posted by Jonathan Ellis <jb...@gmail.com>.
I would say, not stupid at all, with the caveat that it sounds like
you will want to use Hadoop for reporting and we are still working on
Hadoop support.  But that is probably a man-week or two of work, just
waiting for someone to need it badly enough to write it. :)

-Jonathan
