Posted to user@cassandra.apache.org by Bill Hastings <bl...@gmail.com> on 2010/08/18 00:59:49 UTC

Map/Reduce over Cassandra

Hi All

How does M/R perform on Cassandra compared to running it on HDFS?
Does anyone have numbers they can share? Specifically, how much data was the
M/R job run against, and what was the throughput? Any information
would be very helpful.

-- 
Cheers
Bill

Re: Map/Reduce over Cassandra

Posted by Drew Dahlke <dr...@bronto.com>.

Hey Bill,

A few months ago we ran an experiment with 5 Hadoop nodes pulling from
4 Cassandra nodes. The job read 1 column family with 8 small columns
and just dumped the raw data to HDFS. It was cycling through around 17K
map tasks per sec. The machines weren't being taxed too hard, so I'm
sure there's some concurrency tuning we could have done to speed that
up. Unfortunately we don't have that same data on HDFS yet, so I can't
really give a direct comparison.
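
For a rough sense of scale, the figures above can be broken down per node.
This is only a back-of-the-envelope sketch: it assumes the ~17K/sec figure
is an aggregate across the whole job and that load was spread evenly, neither
of which was measured. The function name is made up for illustration.

```python
# Back-of-the-envelope breakdown of the reported numbers.
# ASSUMPTION: ~17K/sec is the aggregate rate for the whole job,
# and work was spread evenly across nodes (not measured).

def per_node_rate(total_per_sec: float, nodes: int) -> float:
    """Return the average rate per node for an evenly spread load."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_per_sec / nodes

TOTAL_PER_SEC = 17_000   # reported aggregate rate (approximate)
HADOOP_NODES = 5         # nodes doing the reading
CASSANDRA_NODES = 4      # nodes serving the data

if __name__ == "__main__":
    print("per Hadoop node:   ", per_node_rate(TOTAL_PER_SEC, HADOOP_NODES))
    print("per Cassandra node:", per_node_rate(TOTAL_PER_SEC, CASSANDRA_NODES))
```

Averages like these say nothing about where the bottleneck was, only what
each node would have handled if the load had been perfectly balanced.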

Hope that helps. I'm curious what others have seen as well.
