Posted to user@cassandra.apache.org by Mark <st...@gmail.com> on 2010/07/27 17:14:25 UTC

Cassandra vs MongoDB

Can someone quickly explain the differences between the two? Other than
the fact that MongoDB supports ad-hoc querying, I don't know what's
different. It also appears (using Google Trends) that MongoDB seems to
be growing while Cassandra is dying off. Is this the case?

Thanks for the help

Re: Cassandra vs MongoDB

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Having participated in the design of a few of these systems being mentioned,
I'll chime in here and point out that the combination of Flume and Hive
makes CDH3 very useful for log processing and that use case is directly in
the wheelhouse of the system, especially for large collections of log files
(as search logs tend to be).

On Wed, Jul 28, 2010 at 2:59 PM, Jeremy Hanna <je...@gmail.com> wrote:

> > "As a result, we designed and built Flume...
> > (I wonder if this could deliver into Cassandra :) )
>
>
> Yes - apparently it's pretty easy to do - I was thinking of doing it but
> haven't found the time yet.
>
> https://issues.cloudera.org//browse/FLUME-20
>
> On Jul 28, 2010, at 4:30 PM, Aaron Morton wrote:
>
> >
> >> If you are looking to store web logs and then do ad hoc queries you
> >> might/should be using Hadoop (depending on how big your logs are)
> >
> > I agree, take a look at Cloudera Hadoop 3 (CDH3); they include an app
> > called Flume for moving data...
> >
> > "As a result, we designed and built Flume. Flume is a distributed service
> > that makes it very easy to collect and aggregate your data into a persistent
> > store such as HDFS. Flume can read data from almost any source – log files,
> > Syslog packets, the standard output of any Unix process – and can deliver it
> > to a batch processing system like Hadoop or a real-time data store like
> > HBase. All this can be configured dynamically from a single, central
> > location – no more tedious configuration file editing and process
> > restarting. Flume will collect the data from wherever existing applications
> > are storing it, and whisk it away for further analysis and processing."
> >
> > (I wonder if this could deliver into Cassandra :) )
> >
> > If it's straight log file processing Hadoop may be a better fit.
> >
> > Aaron
>
>

Re: Cassandra vs MongoDB

Posted by Jeremy Hanna <je...@gmail.com>.
> "As a result, we designed and built Flume...
> (I wonder if this could deliver into Cassandra :) )


Yes - apparently it's pretty easy to do - I was thinking of doing it but haven't found the time yet.

https://issues.cloudera.org//browse/FLUME-20

On Jul 28, 2010, at 4:30 PM, Aaron Morton wrote:

> 
>> If you are looking to store web logs and then do ad hoc queries you might/should be using Hadoop (depending on how big your logs are)
>  
> I agree, take a look at Cloudera Hadoop 3 (CDH3); they include an app called Flume for moving data...
> 
> "As a result, we designed and built Flume. Flume is a distributed service that makes it very easy to collect and aggregate your data into a persistent store such as HDFS. Flume can read data from almost any source – log files, Syslog packets, the standard output of any Unix process – and can deliver it to a batch processing system like Hadoop or a real-time data store like HBase. All this can be configured dynamically from a single, central location – no more tedious configuration file editing and process restarting. Flume will collect the data from wherever existing applications are storing it, and whisk it away for further analysis and processing."
> 
> (I wonder if this could deliver into Cassandra :) )
> 
> If it's straight log file processing Hadoop may be a better fit.
> 
> Aaron


Re: Cassandra vs MongoDB

Posted by Aaron Morton <aa...@thelastpickle.com>.
> If you are looking to store web logs and then do ad hoc queries you might/should be using Hadoop (depending on how big your logs are)
 
I agree, take a look at Cloudera Hadoop 3 (CDH3); they include an app called Flume for moving data...

"As a result, we designed and built Flume. Flume is a distributed service that makes it very easy to collect and aggregate your data into a persistent store such as HDFS. Flume can read data from almost any source – log files, Syslog packets, the standard output of any Unix process – and can deliver it to a batch processing system like Hadoop or a real-time data store like HBase. All this can be configured dynamically from a single, central location – no more tedious configuration file editing and process restarting. Flume will collect the data from wherever existing applications are storing it, and whisk it away for further analysis and processing."

(I wonder if this could deliver into Cassandra :) )

If it's straight log file processing Hadoop may be a better fit.

Aaron 

Re: Cassandra vs MongoDB

Posted by Joseph Stein <cr...@gmail.com>.
If you are looking to store web logs and then do ad hoc queries you
might/should be using Hadoop (depending on how big your logs are)

While MongoDB has MapReduce (built in) it is there to simulate SQL GROUP BY
and not for large scale analytics by any means.
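
To make the GROUP BY comparison concrete, here is a minimal sketch in plain
Python of what a map/reduce-style group-and-count looks like. It is not
MongoDB's API; the map_reduce helper, the log entries, and the field names
are all made up for illustration.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical search-log entries (field names are assumptions).
search_logs = [
    {"term": "cassandra", "hits": 12},
    {"term": "mongodb", "hits": 7},
    {"term": "cassandra", "hits": 3},
]


def emit_term(entry):
    # Map step: emit one (term, 1) pair per log entry.
    yield entry["term"], 1


def count(term, ones):
    # Reduce step: add up the emitted 1s for a single term.
    return sum(ones)


# Similar in spirit to: SELECT term, COUNT(*) FROM search_logs GROUP BY term
counts = map_reduce(search_logs, emit_term, count)
```

This works nicely for small in-memory groupings; it says nothing about how
such a job behaves across a large cluster, which is where Hadoop comes in.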

MongoDB uses a global read/write lock per operation. General and
index-assisted reads are ultra-fast in Mongo, but a bigger map/reduce or
group call will block other requests until complete, possibly causing
traffic to back up. Because of that global lock, *all writes block*, too.
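
A toy model of why that hurts, assuming every operation has to take the same
single lock and they are served one at a time. This is a simplification for
illustration, not MongoDB's actual locking code, and the operation names and
durations are invented.

```python
def completion_times(operations):
    """Return when each operation finishes if all arrive at t=0 and run
    one after another under a single shared lock (times in milliseconds)."""
    clock_ms = 0
    finished = {}
    for name, duration_ms in operations:
        clock_ms += duration_ms      # nothing else runs while the lock is held
        finished[name] = clock_ms
    return finished


# Hypothetical workload: one long map/reduce queued ahead of two cheap requests.
operations = [
    ("big_map_reduce", 30000),
    ("small_write", 10),
    ("indexed_read", 1),
]
finished = completion_times(operations)
```

In this model the 10 ms write does not complete until 30010 ms, because it
sits behind the long-running call for the whole time the lock is held.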

Cassandra is much more durable, but from an architecture perspective the
choice of key store vs. document store is still worth weighing (on
smaller-traffic systems that do not need big-data scale & durability).

If you have lots of data then MongoDB will eventually become a consistent
problem.

Here is a nice article on MongoDB in a larger-scale implementation, with
some conclusions, which also talks about Cassandra, Redis & CouchDB:
http://www.mikealrogers.com/2010/07/mongodb-performance-durability/

MongoDB has made a lot of improvements over time, but Cassandra is also
*VERY* active and continues to deliver great features, and it is driven
not by a corporation but rather by the community.

MongoDB was started and is backed by a company that makes money using the
open source model. Cassandra, by contrast, started out solving a difficult
problem at Facebook, was then supported completely as open source, and only
THEN had a company (Riptano) pop up later to support it, making its money
using the open source model... I say this to express that the drivers of the
2 servers & open source projects/communities are different.

You might see Google Trends for MongoDB going up because folks jump into it
because of the marketing and then have issues and try to find solutions =8^)

Now, I am not bashing MongoDB by any means; it is a good database (so is
MySQL), but it is all about use cases AND the implementation/use/load. Apply
the right solution to the problem it fits in all respects!

For logs (speaking with my architect hat on) I see no reason why you would
want to hold that in a document structure, but at the same time you might not
have that many logs, so you could get a lot of benefit from MongoDB M/R and
such... But honestly, if it is less than 1TB you might be fine JUST using
MySQL.

It is all relative.

Lastly, and back to Hadoop, Cassandra has a nice implementation so that when
you load your data into Cassandra you can pull it out to MapReduce it:
http://allthingshadoop.com/2010/04/24/running-hadoop-mapreduce-with-cassandra-nosql/

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/

On Tue, Jul 27, 2010 at 4:05 PM, Mark <st...@gmail.com> wrote:

> On 7/27/10 12:42 PM, Dave Gardner wrote:
>
>> There are quite a few differences. Ultimately it depends on your use
>> case! For example Mongo has a limit on the maximum "document" size of
>> 4MB, whereas with Cassandra you are not really limited to the volume
>> of data/columns per-row (I think there may be a limit of 2GB perhaps;
>> basically none)
>>
>> Another point re: search volumes is that mongo has been actively
>> promoting over the last few months. I recently attended an excellent
>> conference day in London which was very cheap; tickets probably didn't
>> cover the costs. I guess this is part of their strategy. Eg: encourage
>> adoption.
>>
>> Dave
>>
>> On Tuesday, July 27, 2010, Jonathan Shook<js...@gmail.com>  wrote:
>>
>>
>>> Also, google trends is only a measure of what terms people are
>>> searching for. To equate this directly to growth would be misleading.
>>>
>>> On Tue, Jul 27, 2010 at 12:27 PM, Drew Dahlke <dr...@bronto.com>
>>> wrote:
>>>
>>>
>>>> There's a good post on stackoverflow comparing the two
>>>> http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
>>>>
>>>> It seems to me that both projects have pretty vibrant communities behind
>>>> them.
>>>>
>>>> On Tue, Jul 27, 2010 at 11:14 AM, Mark<st...@gmail.com>
>>>>  wrote:
>>>>
>>>>
>>>>> Can someone quickly explain the differences between the two? Other than
>>>>> the fact that MongoDB supports ad-hoc querying I don't know what's
>>>>> different. It also appears (using google trends) that MongoDB seems to be
>>>>> growing while Cassandra is dying off. Is this the case?
>>>>>
>>>>> Thanks for the help
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
> Well my initial use case would be to store our search logs and perform
> some ad-hoc querying which I know is a win for Mongo. However I don't think
> I fully understand how to build indexes in Cassandra so maybe it's just an
> issue of ignorance. I know going forward though we would be expanding it to
> house our per item translations.
>



--

Re: Cassandra vs MongoDB

Posted by Mark <st...@gmail.com>.
On 7/27/10 12:42 PM, Dave Gardner wrote:
> There are quite a few differences. Ultimately it depends on your use
> case! For example Mongo has a limit on the maximum "document" size of
> 4MB, whereas with Cassandra you are not really limited to the volume
> of data/columns per-row (I think there may be a limit of 2GB perhaps;
> basically none)
>
> Another point re: search volumes is that mongo has been actively
> promoting over the last few months. I recently attended an excellent
> conference day in London which was very cheap; tickets probably didn't
> cover the costs. I guess this is part of their strategy. Eg: encourage
> adoption.
>
> Dave
>
> On Tuesday, July 27, 2010, Jonathan Shook<js...@gmail.com>  wrote:
>    
>> Also, google trends is only a measure of what terms people are
>> searching for. To equate this directly to growth would be misleading.
>>
>> On Tue, Jul 27, 2010 at 12:27 PM, Drew Dahlke <dr...@bronto.com> wrote:
>>      
>>> There's a good post on stackoverflow comparing the two
>>> http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
>>>
>>> It seems to me that both projects have pretty vibrant communities behind them.
>>>
>>> On Tue, Jul 27, 2010 at 11:14 AM, Mark<st...@gmail.com>  wrote:
>>>        
>>>> Can someone quickly explain the differences between the two? Other than the
>>>> fact that MongoDB supports ad-hoc querying I don't know what's different. It
>>>> also appears (using google trends) that MongoDB seems to be growing while
>>>> Cassandra is dying off. Is this the case?
>>>>
>>>> Thanks for the help
>>>>
>>>>          
>>>        
>>      
Well my initial use case would be to store our search logs and perform
some ad-hoc querying, which I know is a win for Mongo. However, I don't
think I fully understand how to build indexes in Cassandra, so maybe it's
just an issue of ignorance. I know going forward though we would be
expanding it to house our per-item translations.
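
On the index question, one common approach is to maintain the index yourself
as a second set of rows keyed by the value you want to search on. Here is a
minimal sketch of that idea, with plain Python dicts standing in for two
column families. It is not a Cassandra client API, and the names, log ids,
and fields are all hypothetical.

```python
from collections import defaultdict

# Plain dicts stand in for two column families (an assumption for illustration).
search_logs = {}                    # row key: log id      -> columns
logs_by_term = defaultdict(dict)    # row key: search term -> {log id: ""}


def insert_log(log_id, term, user):
    """Write the log row and its index entry together."""
    search_logs[log_id] = {"term": term, "user": user}
    logs_by_term[term][log_id] = ""   # the column name carries the data


def find_by_term(term):
    """Read the index row for the term, then fetch each referenced log row."""
    log_ids = sorted(logs_by_term.get(term, {}))
    return [search_logs[log_id] for log_id in log_ids]


insert_log("20100727-0001", "red shoes", "u1")
insert_log("20100727-0002", "blue hat", "u2")
insert_log("20100727-0003", "red shoes", "u3")
matches = find_by_term("red shoes")
```

The trade-off is that every query pattern needs its own index rows, written
at insert time, rather than being answered ad hoc.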

Re: Cassandra vs MongoDB

Posted by Dave Gardner <da...@imagini.net>.
There are quite a few differences. Ultimately it depends on your use
case! For example Mongo has a limit on the maximum "document" size of
4MB, whereas with Cassandra you are not really limited to the volume
of data/columns per-row (I think there may be a limit of 2GB perhaps;
basically none)
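
As a rough illustration of what a per-document cap means in practice, here
is a sketch that checks an encoded size against the 4MB figure. It uses JSON
as a stand-in for MongoDB's own encoding, so the byte counts are only
approximate, and the function names and sample documents are hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 4 * 1024 * 1024   # the 4MB figure mentioned above


def encoded_size(document):
    """Approximate a document's stored size by its UTF-8 JSON encoding."""
    return len(json.dumps(document).encode("utf-8"))


def fits_in_one_document(document, limit=MAX_DOCUMENT_BYTES):
    """True if the encoded document stays within the size limit."""
    return encoded_size(document) <= limit


# Hypothetical documents: a tiny one and one carrying a 5MB payload.
small = {"term": "cassandra", "results": 10}
large = {"term": "cassandra", "payload": "x" * (5 * 1024 * 1024)}

small_ok = fits_in_one_document(small)
large_ok = fits_in_one_document(large)
```

Anything that fails the check would need to be split across several
documents or stored somewhere else.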

Another point re: search volumes is that mongo has been actively
promoting over the last few months. I recently attended an excellent
conference day in London which was very cheap; tickets probably didn't
cover the costs. I guess this is part of their strategy. Eg: encourage
adoption.

Dave

On Tuesday, July 27, 2010, Jonathan Shook <js...@gmail.com> wrote:
> Also, google trends is only a measure of what terms people are
> searching for. To equate this directly to growth would be misleading.
>
> On Tue, Jul 27, 2010 at 12:27 PM, Drew Dahlke <dr...@bronto.com> wrote:
>> There's a good post on stackoverflow comparing the two
>> http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
>>
>> It seems to me that both projects have pretty vibrant communities behind them.
>>
>> On Tue, Jul 27, 2010 at 11:14 AM, Mark <st...@gmail.com> wrote:
>>> Can someone quickly explain the differences between the two? Other than the
>>> fact that MongoDB supports ad-hoc querying I don't know what's different. It
>>> also appears (using google trends) that MongoDB seems to be growing while
>>> Cassandra is dying off. Is this the case?
>>>
>>> Thanks for the help
>>>
>>
>

Re: Cassandra vs MongoDB

Posted by Jonathan Shook <js...@gmail.com>.
Also, Google Trends is only a measure of what terms people are
searching for. To equate this directly to growth would be misleading.

On Tue, Jul 27, 2010 at 12:27 PM, Drew Dahlke <dr...@bronto.com> wrote:
> There's a good post on stackoverflow comparing the two
> http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
>
> It seems to me that both projects have pretty vibrant communities behind them.
>
> On Tue, Jul 27, 2010 at 11:14 AM, Mark <st...@gmail.com> wrote:
>> Can someone quickly explain the differences between the two? Other than the
>> fact that MongoDB supports ad-hoc querying I don't know what's different. It
>> also appears (using google trends) that MongoDB seems to be growing while
>> Cassandra is dying off. Is this the case?
>>
>> Thanks for the help
>>
>

Re: Cassandra vs MongoDB

Posted by Drew Dahlke <dr...@bronto.com>.
There's a good post on stackoverflow comparing the two
http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra

It seems to me that both projects have pretty vibrant communities behind them.

On Tue, Jul 27, 2010 at 11:14 AM, Mark <st...@gmail.com> wrote:
> Can someone quickly explain the differences between the two? Other than the
> fact that MongoDB supports ad-hoc querying I don't know what's different. It
> also appears (using google trends) that MongoDB seems to be growing while
> Cassandra is dying off. Is this the case?
>
> Thanks for the help
>

Re: Cassandra vs MongoDB

Posted by Benjamin Black <b...@b3k.us>.
They have approximately nothing in common.  And, no, Cassandra is
definitely not dying off.

On Tue, Jul 27, 2010 at 8:14 AM, Mark <st...@gmail.com> wrote:
> Can someone quickly explain the differences between the two? Other than the
> fact that MongoDB supports ad-hoc querying I don't know what's different. It
> also appears (using google trends) that MongoDB seems to be growing while
> Cassandra is dying off. Is this the case?
>
> Thanks for the help
>