Posted to user@couchdb.apache.org by Krzysztof Kulewski <kk...@student.uw.edu.pl> on 2008/05/01 00:52:45 UTC

map-reduce question: map done on many servers

Hello, 

I have a question that I couldn't find an answer to in the documentation
available online. Please help:

Let's suppose that I have one CouchDB server storing text documents. I want
to run a map/reduce over them. The map step takes a very long time per
document, so I want to distribute the maps across 20 map servers (which may
be unreliable), and then have someone else do the reduce step.

As far as I can see, "CouchDB delegates computation of Views to external query
servers." But I want to use 20 separate machines for the map step to
get a 20x speed-up.

How can I distribute the map step? Is there an elegant solution out of the
box?

BR,
Krzysztof
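The scheme described above can be sketched as follows: independent maps are fanned out to 20 workers, followed by a single reduce. This is only an illustration under stated assumptions. The word-count functions `slow_map` and `reduce_counts` are hypothetical. Local threads stand in for the 20 remote map servers. In Python, threads only overlap work when each map call waits on I/O, such as a network request to a remote machine; they do not speed up CPU-bound work in one process.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def slow_map(doc):
    """Hypothetical stand-in for an expensive per-document map function."""
    return [(word, 1) for word in doc["text"].split()]


def reduce_counts(acc, pairs):
    """Fold one document's emitted (key, value) pairs into the running totals."""
    for key, value in pairs:
        acc[key] = acc.get(key, 0) + value
    return acc


def map_reduce(docs, workers=20):
    # Map phase: every document is mapped independently, so up to `workers`
    # documents can be in flight at once (threads stand in for machines).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(slow_map, docs))
    # Reduce phase: a single consumer combines all emitted pairs.
    return reduce(reduce_counts, mapped, {})
```

Because the documents are mapped independently, the map phase can speed up by at most about the number of workers. The reduce step still runs in one place.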

Re: map-reduce question: map done on many servers

Posted by Jan Lehnardt <ja...@prima.de>.
On May 1, 2008, at 17:54, Krzysztof Kulewski wrote:
> couch_query_servers.erl is working in such a way:
> % send command and get a response.
> prompt(Port, Json) ->
>   writeline(Port, cjson:encode(Json)),
>   read_json(Port).
> so to obtain many tasks at once in this javascript (or whatever)  
> view server I have also to pass more than one request before actually
> reading the output. Then I can distribute the load in view server.
> Am I right?

Yes, you are. Some things actually need to change
on CouchDB's side before you can do this. Sorry if my
earlier statements were misleading :) Patches are welcome, too!

Cheers
Jan
--




Re: map-reduce question: map done on many servers

Posted by Krzysztof Kulewski <kk...@student.uw.edu.pl>.
couch_query_servers.erl works like this:

% send command and get a response.
prompt(Port, Json) ->
    writeline(Port, cjson:encode(Json)),
    read_json(Port). 

so, to have many tasks in progress at once in this JavaScript (or whatever) view server,
I would also have to send more than one request before actually reading the output.
Only then could I distribute the load inside the view server.

Am I right? 
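The difference being asked about can be illustrated with a small Python sketch. Python is used only because the view server may be written in any language. `prompt` mirrors the Erlang `prompt/2` above: it writes one JSON line, then blocks until one reply arrives. `prompt_many` writes several requests before reading any replies, which is what a view server would need in order to spread work across machines. The child process is a hypothetical toy that doubles a number; it does not speak CouchDB's real protocol.

```python
import json
import subprocess
import sys

# Hypothetical toy "view server": answers each {"n": ...} line with n * 2.
CHILD = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    sys.stdout.write(json.dumps(req["n"] * 2) + "\n")
    sys.stdout.flush()
"""


def start_server():
    return subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )


def prompt(proc, msg):
    """Synchronous, like prompt/2: send one request, then wait for its reply."""
    proc.stdin.write(json.dumps(msg) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())


def prompt_many(proc, msgs):
    """Pipelined: send every request first, then read replies in request order.

    Keep batches small: with large batches both processes can block on full
    pipe buffers (a real implementation would read replies on another thread).
    """
    for msg in msgs:
        proc.stdin.write(json.dumps(msg) + "\n")
    proc.stdin.flush()
    return [json.loads(proc.stdout.readline()) for _ in msgs]
```

The toy child still answers one request at a time. The point is only that the pipelined form hands the server several requests at once. A real server could pass those requests to different workers, provided it returns the replies in request order.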


Re: map-reduce question: map done on many servers

Posted by Jan Lehnardt <ja...@apache.org>.
Hello Krzysztof,
On May 1, 2008, at 00:52, Krzysztof Kulewski wrote:
> How to distribute this map part? Is there any pretty solution out of  
> the box?

This is not yet possible out of the box. You could probably
achieve this by writing your own view server component
that distributes the map requests that CouchDB sends in.
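A minimal sketch of the distribution step follows. It assumes each map server is wrapped in a Python callable `worker(funs, doc)` that returns the map results for one document, or raises on failure (for example `ConnectionError`). Because the map servers may be unreliable, the hypothetical `make_dispatcher` rotates the starting worker for each request and fails over to the next worker on error. While CouchDB waits for each reply before sending the next document, this mainly buys fault tolerance rather than parallel speed-up.

```python
import itertools


class AllWorkersFailed(Exception):
    """Raised when every map server failed for one document (hypothetical)."""


def make_dispatcher(workers):
    """Return dispatch(funs, doc) that tries workers round-robin with failover.

    `workers` is a list of callables worker(funs, doc) -> results, each a
    stand-in for a possibly unreliable remote map server.
    """
    if not workers:
        raise ValueError("need at least one worker")
    starts = itertools.cycle(range(len(workers)))

    def dispatch(funs, doc):
        first = next(starts)  # spread requests over the servers
        errors = []
        for offset in range(len(workers)):
            worker = workers[(first + offset) % len(workers)]
            try:
                return worker(funs, doc)
            except Exception as exc:  # e.g. ConnectionError or a timeout
                errors.append(exc)
        raise AllWorkersFailed(errors)

    return dispatch
```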

For a reference view server see:
http://svn.apache.org/viewvc/incubator/couchdb/trunk/share/server/main.js?revision=645661&view=markup

You don't need to write it in JavaScript: couch.ini lets
you specify a daemon written in any language. The only
thing you need to make sure of is that its interface to
CouchDB behaves the same as the original view server's.
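To illustrate what "behaving the same" involves, here is a minimal Python sketch of the request loop. It reads one JSON command per line from stdin and writes exactly one JSON reply per line to stdout. It assumes the command shapes we understand the reference main.js to use, which should be checked against main.js:

- `["reset"]` and `["add_fun", source]` are answered with `true`.
- `["map_doc", doc]` is answered with one list of `[key, value]` pairs per registered function.

The `run_map` hook is hypothetical; it is where the remote map servers would be called. The error reply shape is also an assumption, and `reduce`/`rereduce` are not handled.

```python
import json
import sys


def make_handler(run_map):
    """Build a command handler for a line-based view server.

    run_map(fun_source, doc) is a hypothetical hook returning the list of
    [key, value] pairs one map function emits for one document.
    """
    funs = []

    def handle(cmd):
        op = cmd[0]
        if op == "reset":  # forget all registered map functions
            funs.clear()
            return True
        if op == "add_fun":  # register one map function's source text
            funs.append(cmd[1])
            return True
        if op == "map_doc":  # one result list per registered function
            return [run_map(src, cmd[1]) for src in funs]
        # Assumed error shape; compare with what main.js actually returns.
        return {"error": "unknown_command", "reason": op}

    return handle


def serve(handle, infile=sys.stdin, outfile=sys.stdout):
    """Write exactly one JSON reply line per request line and flush it,
    since CouchDB blocks on reading the reply before sending the next command."""
    for line in infile:
        if line.strip():
            outfile.write(json.dumps(handle(json.loads(line))) + "\n")
            outfile.flush()
```

A script calling `serve(make_handler(...))` with a real `run_map` could then be registered in couch.ini in place of the JavaScript server.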

Cheers
Jan
--