Posted to user@couchdb.apache.org by Jan Fajerski <fa...@informatik.hu-berlin.de> on 2012/08/02 12:12:57 UTC

Map Reduce Implementation

Hi,
I am researching map reduce implementations for distributed database systems.
Is there a paper or documentation on how this is done in CouchDB? Or is the
source code the answer? If so, would you be so kind as to point me to a good
starting point in the source code?

Many thanks in advance,
Jan

Re: Map Reduce Implementation

Posted by Robert Newson <rn...@apache.org>.
Oops, this slipped by me earlier.

The engine isn't relevant; CouchDB supports pluggable view servers.

CouchDB currently builds a view group sequentially (though different view groups build concurrently), but after the BigCouch merge this will change to be parallel (up to the number of shards of your database, which is configurable at database creation time).
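
As a rough illustration of the per-shard parallelism described above, here is a minimal Python sketch. It is not CouchDB or BigCouch code: the hashing scheme, the function names (`shard_of`, `build_shard_index`, `build_view`) and the use of threads as stand-ins for per-shard indexers are all assumptions made for the example. It only shows that a view built from q independently indexed shards can be merged into the same sorted rows as a sequential build.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq
import zlib


def row_order(row):
    """Sort rows by (key, doc id), ignoring the emitted value."""
    return (row[0], row[1])


def shard_of(doc_id, q):
    """Assign a document to one of q shards by hashing its id (illustrative scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % q


def build_shard_index(docs, map_fn):
    """Build one shard's index: run map_fn over its docs and sort the emitted rows."""
    rows = [(key, doc["_id"], value) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=row_order)
    return rows


def build_view(docs, map_fn, q):
    """Index q shards concurrently, then merge the sorted shard indexes."""
    shards = [[] for _ in range(q)]
    for doc in docs:
        shards[shard_of(doc["_id"], q)].append(doc)
    with ThreadPoolExecutor(max_workers=q) as pool:
        indexes = list(pool.map(lambda shard: build_shard_index(shard, map_fn), shards))
    return list(heapq.merge(*indexes, key=row_order))


def by_type(doc):
    """Hypothetical map function: emit the document type with a count of 1."""
    yield doc["type"], 1


if __name__ == "__main__":
    docs = [{"_id": "doc%d" % i, "type": "even" if i % 2 == 0 else "odd"} for i in range(10)]
    for row in build_view(docs, by_type, q=4):
        print(row)
```

With q=1 this degenerates to the sequential build; larger q only changes how the work is split, not the resulting rows.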

B.


On 3 Aug 2012, at 19:10, Jens Alfke wrote:

> 
> On Aug 3, 2012, at 1:54 AM, Jan Fajerski <fa...@informatik.hu-berlin.de> wrote:
> 
>> What Javascript engine is used,
> 
> SpiderMonkey.
> 
>> how is workload distributed (if at all) 
> 
> I don't believe it is. Erlang itself is good at multiprocessing, but it runs SpiderMonkey in a separate process. I don't know whether it spawns more than one of these processes. In any case there is none of the fully-distributed mapping the way Google does it where map requests get farmed out to multiple computers. CouchDB doesn't use M/R as a way to process ridiculously large data sets, it uses it as a way to flexibly query semi-structured data.
> 
> [Disclaimer: I'm not an expert on the implementation of CouchDB.]
> 
>> May the dev list be a better place to ask such questions?
> 
> Most likely, yes.
> 
> —Jens


Re: Map Reduce Implementation

Posted by Jens Alfke <je...@couchbase.com>.
On Aug 3, 2012, at 1:54 AM, Jan Fajerski <fa...@informatik.hu-berlin.de> wrote:

> What Javascript engine is used,

SpiderMonkey.

> how is workload distributed (if at all) 

I don't believe it is. Erlang itself is good at multiprocessing, but it runs SpiderMonkey in a separate process. I don't know whether it spawns more than one of these processes. In any case there is none of the fully-distributed mapping the way Google does it where map requests get farmed out to multiple computers. CouchDB doesn't use M/R as a way to process ridiculously large data sets, it uses it as a way to flexibly query semi-structured data.
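
To make the "flexibly query semi-structured data" point concrete, here is a minimal Python sketch of the idea. It is not how CouchDB is implemented (CouchDB stores view rows in a B-tree and runs map functions in a view server); the documents, `map_fn`, `build_index`, `query` and `reduce_count` are hypothetical names chosen for illustration. The map function turns documents of varying shape into sorted key/value rows, and a query is a key-range lookup over those rows, optionally followed by a reduce.

```python
import bisect

# Semi-structured documents: not every document has the same fields.
DOCS = [
    {"_id": "p1", "type": "post", "tags": ["couchdb", "erlang"]},
    {"_id": "p2", "type": "post", "tags": ["couchdb"]},
    {"_id": "c1", "type": "comment", "text": "nice"},
]


def map_fn(doc):
    """Emit one (tag, 1) row per tag of each post; other documents emit nothing."""
    if doc.get("type") == "post":
        for tag in doc.get("tags", []):
            yield tag, 1


def build_index(docs, map_fn):
    """Run the map function over all documents and sort rows by (key, doc id)."""
    rows = [(key, doc["_id"], value) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query(index, startkey, endkey):
    """Return all rows whose key lies in the inclusive range [startkey, endkey]."""
    keys = [row[0] for row in index]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return index[lo:hi]


def reduce_count(rows):
    """Reduce step: sum the emitted values of the selected rows."""
    return sum(value for _, _, value in rows)


if __name__ == "__main__":
    index = build_index(DOCS, map_fn)
    rows = query(index, "couchdb", "couchdb")
    print(rows, reduce_count(rows))
```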

[Disclaimer: I'm not an expert on the implementation of CouchDB.]

> May the dev list be a better place to ask such questions?

Most likely, yes.

—Jens

Re: Map Reduce Implementation

Posted by Jan Fajerski <fa...@informatik.hu-berlin.de>.
Thanks Mathias,
it is a good starting point. What I am more interested in, though, are the
technical details: What JavaScript engine is used, how does the system deal with
failures, how is workload distributed (if at all), and other such questions.
Might the dev list be a better place to ask such questions?

Best Jan

On Fri, Aug 03, 2012 at 07:47:52AM +0200, Mathias Leppich wrote:
> Hi Jan,
> 
> I think the Definitive Guide [1] gives a good understanding of what's behind CouchDB's views, the indexes that are incrementally generated by MapReduce functions written in JavaScript (or any other language).
> 
> [1] http://guide.couchdb.org/draft/views.html
> 
> - mathias
> 
> On Aug 2, 2012, at 12:12 , Jan Fajerski wrote:
> 
> > Hi,
> > I am researching map reduce implementations for distributed database systems.
> > Is there a paper or documentation on how this is done in CouchDB? Or is the
> > source code the answer? If so would you be so kind to point me to a good start
> > in the source code?
> > 
> > Many thanks in advance,
> > Jan
> 

Re: Map Reduce Implementation

Posted by Mathias Leppich <ml...@muhqu.de>.
Hi Jan,

I think the Definitive Guide [1] gives a good understanding of what's behind CouchDB's views, the indexes that are incrementally generated by MapReduce functions written in JavaScript (or any other language).
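
As a rough picture of what "incrementally generated" means, here is a minimal Python sketch. It is an illustration under assumptions, not CouchDB's implementation: the class `IncrementalView`, its in-memory dictionary, and the `(seq, doc)` change format are hypothetical. The idea it shows is that the index remembers the last change sequence it has processed and, on each update, re-runs the map function only for documents changed since then.

```python
class IncrementalView:
    """Toy view index that only re-maps documents changed since the last update."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc id -> list of (key, value) rows that doc emitted
        self.last_seq = 0      # highest change sequence already indexed

    def update(self, changes):
        """Apply (seq, doc) changes in increasing seq order; return how many were processed."""
        processed = 0
        for seq, doc in changes:
            if seq <= self.last_seq:
                continue  # already indexed, skip without calling map_fn
            # Drop the rows from the previous revision of this document, if any.
            self.rows_by_doc.pop(doc["_id"], None)
            if not doc.get("_deleted"):
                self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))
            self.last_seq = seq
            processed += 1
        return processed

    def rows(self):
        """Return all rows sorted by (key, doc id)."""
        rows = [
            (key, doc_id, value)
            for doc_id, emitted in self.rows_by_doc.items()
            for key, value in emitted
        ]
        return sorted(rows, key=lambda row: (row[0], row[1]))


def by_type(doc):
    """Hypothetical map function: emit the document type with a count of 1."""
    yield doc["type"], 1


if __name__ == "__main__":
    view = IncrementalView(by_type)
    changes = [(1, {"_id": "a", "type": "x"}), (2, {"_id": "b", "type": "y"})]
    view.update(changes)
    changes.append((3, {"_id": "a", "_deleted": True}))
    view.update(changes)  # only the change with seq 3 is processed here
    print(view.rows())
```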

[1] http://guide.couchdb.org/draft/views.html

- mathias

On Aug 2, 2012, at 12:12 , Jan Fajerski wrote:

> Hi,
> I am researching map reduce implementations for distributed database systems.
> Is there a paper or documentation on how this is done in CouchDB? Or is the
> source code the answer? If so would you be so kind to point me to a good start
> in the source code?
> 
> Many thanks in advance,
> Jan