Posted to user@couchdb.apache.org by Krzysztof Kulewski <kk...@student.uw.edu.pl> on 2008/05/01 00:52:45 UTC
map-reduce question: map done on many servers
Hello,
I have a question that I couldn't find an answer to in the docs available
online. Please help:
Let's suppose that I have one CouchDB server storing text documents. I want
to run some map-reduce on it. The map part takes a very long time per
document, so I want to distribute the maps across 20 map servers (which, by
the way, may be unreliable). Then somebody will do the reduce part.
As far as I can see, "CouchDB delegates computation of Views to external query
servers." But I want to use 20 separate machines for the map part to
obtain a 20x speed-up.
How can I distribute this map part? Is there a neat solution out of the
box?
BR,
Krzysztof
Re: map-reduce question: map done on many servers
Posted by Jan Lehnardt <ja...@prima.de>.
On May 1, 2008, at 17:54, Krzysztof Kulewski wrote:
> couch_query_servers.erl is working in such a way:
> % send command and get a response.
> prompt(Port, Json) ->
> writeline(Port, cjson:encode(Json)),
> read_json(Port).
> So to obtain many tasks at once in this JavaScript (or whatever)
> view server, I would also have to pass more than one request before
> actually reading the output. Then I could distribute the load in the
> view server.
> Am I right?
Yes, you are. Some things actually need to be changed
on CouchDB's side to let you do this. Sorry if my earlier
statements were misleading :) Patches are welcome, too!
Cheers
Jan
--
>
>
> Jan Lehnardt wrote:
>> Hello Krzysztof,
>> On May 1, 2008, at 00:52, Krzysztof Kulewski wrote:
>>> Hello,
>>> I have a question for which I couldn't find solution on docs
>>> available online. Please help:
>>> Lets suppose that I have one couchdb server storing text
>>> documents. I want to do some map reduce on it. Map part take very
>>> long time per one doc, so I want to distribute maps between 20
>>> map servers (btw. maybe unreliable). And then somebody will do
>>> the reduce part.
>>> As I can see, "CouchDB delegates computation of Views to external
>>> query servers." But I want to use 20 separate machines for doing
>>> map part to obtain 20x speed-up.
>>> How to distribute this map part? Is there any pretty solution out
>>> of the box?
>> This is not yet possible out of the box. You could probably
>> achieve this by writing your own view server component
>> that distributes the map requests that CouchDB sends in. For a
>> reference view server see:
>> http://svn.apache.org/viewvc/incubator/couchdb/trunk/share/server/main.js?revision=645661&view=markup
>> You don't need to write it in JavaScript (the couch.ini lets
>> you specify a daemon written in any language). The only
>> thing you need to make sure is that the interface to
>> CouchDB behaves the same as the original view server.
>> Cheers
>> Jan
>> --
>
Re: map-reduce question: map done on many servers
Posted by Krzysztof Kulewski <kk...@student.uw.edu.pl>.
couch_query_servers.erl works like this:
% send command and get a response.
prompt(Port, Json) ->
    writeline(Port, cjson:encode(Json)),
    read_json(Port).
So to obtain many tasks at once in this JavaScript (or whatever) view server,
I would also have to pass more than one request before actually reading the
output. Then I could distribute the load in the view server.
Am I right?
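To make the difference concrete, here is a minimal Python sketch, not CouchDB code. prompt mirrors the quoted Erlang function: it writes one request and blocks until the reply arrives. prompt_many is a hypothetical pipelined variant that writes several requests before reading anything. The echoing child process is only a stand-in for a real view server, and all the names here (ECHO_SERVER, open_port, prompt_many) are assumptions made for illustration.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a view server: echoes every JSON line back.
ECHO_SERVER = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


def open_port():
    """Start the stand-in server with line-buffered text pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", ECHO_SERVER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


def writeline(port, obj):
    """Send one JSON value as a single line."""
    port.stdin.write(json.dumps(obj) + "\n")
    port.stdin.flush()


def read_json(port):
    """Read one line and decode it as JSON."""
    return json.loads(port.stdout.readline())


def prompt(port, obj):
    """Like the Erlang prompt/2: one request, then wait for its reply."""
    writeline(port, obj)
    return read_json(port)


def prompt_many(port, objs):
    """Pipelined variant: send every request first, then read replies in order.

    Only safe for small batches; a large batch could fill the OS pipe
    buffers and deadlock unless replies are read concurrently.
    """
    for obj in objs:
        writeline(port, obj)
    return [read_json(port) for _ in objs]
```

With prompt, the view server never sees more than one outstanding document, so it has nothing to spread across machines. With something like prompt_many it would hold a whole batch and could hand the documents to different workers, as long as it returns the replies in request order.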
Jan Lehnardt wrote:
> Hello Krzysztof,
> On May 1, 2008, at 00:52, Krzysztof Kulewski wrote:
>> Hello,
>> I have a question for which I couldn't find solution on docs available
>> online. Please help:
>> Lets suppose that I have one couchdb server storing text documents. I
>> want to do some map reduce on it. Map part take very long time per one
>> doc, so I want to distribute maps between 20 map servers (btw. maybe
>> unreliable). And then somebody will do the reduce part.
>> As I can see, "CouchDB delegates computation of Views to external query
>> servers." But I want to use 20 separate machines for doing map part to
>> obtain 20x speed-up.
>> How to distribute this map part? Is there any pretty solution out of the
>> box?
>
> This is not yet possible out of the box. You could probably
> achieve this by writing your own view server component
> that distributes the map requests that CouchDB sends in.
>
> For a reference view server see:
> http://svn.apache.org/viewvc/incubator/couchdb/trunk/share/server/main.js?revision=645661&view=markup
>
> You don't need to write it in JavaScript (the couch.ini lets
> you specify a daemon written in any language). The only
> thing you need to make sure is that the interface to
> CouchDB behaves the same as the original view server.
>
> Cheers
> Jan
> --
Re: map-reduce question: map done on many servers
Posted by Jan Lehnardt <ja...@apache.org>.
Hello Krzysztof,
On May 1, 2008, at 00:52, Krzysztof Kulewski wrote:
> Hello,
> I have a question for which I couldn't find solution on docs
> available online. Please help:
> Lets suppose that I have one couchdb server storing text documents.
> I want to do some map reduce on it. Map part take very long time per
> one doc, so I want to distribute maps between 20 map servers (btw.
> maybe unreliable). And then somebody will do the reduce part.
> As I can see, "CouchDB delegates computation of Views to external
> query servers." But I want to use 20 separate machines for doing map
> part to obtain 20x speed-up.
> How to distribute this map part? Is there any pretty solution out of
> the box?
This is not yet possible out of the box. You could probably
achieve this by writing your own view server component
that distributes the map requests that CouchDB sends in.
For a reference view server see:
http://svn.apache.org/viewvc/incubator/couchdb/trunk/share/server/main.js?revision=645661&view=markup
You don't need to write it in JavaScript (the couch.ini lets
you specify a daemon written in any language). The only
thing you need to make sure is that the interface to
CouchDB behaves the same as the original view server.
Cheers
Jan
--
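A minimal sketch of the kind of custom view server suggested above, written in Python since any language is allowed. It assumes a line-based exchange with one JSON value per line in each direction. It also assumes the command names reset, add_fun and map_doc; these come from the later-documented query server protocol and should be checked against main.js at the linked revision. ViewServer, slow_map and the thread pool standing in for remote map machines are hypothetical names for illustration.

```python
import json
import sys
from concurrent.futures import ThreadPoolExecutor


def slow_map(doc):
    """Hypothetical expensive map step: emits one [key, value] pair per doc."""
    text = doc.get("text", "")
    return [[doc.get("_id"), len(text.split())]]


class ViewServer:
    def __init__(self, workers=20):
        # The pool stands in for remote map machines (assumption).
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.funs = []

    def handle(self, cmd):
        """Dispatch one decoded command and return the value to send back."""
        name, args = cmd[0], cmd[1:]
        if name == "reset":
            self.funs = []
            return True
        if name == "add_fun":
            # A real server would compile the function source in args[0];
            # this sketch always registers the same Python stand-in.
            self.funs.append(slow_map)
            return True
        if name == "map_doc":
            doc = args[0]
            futures = [self.pool.submit(fun, doc) for fun in self.funs]
            # One list of emitted pairs per registered function, in order.
            return [future.result() for future in futures]
        raise ValueError("unknown command: %r" % (name,))

    def serve(self, inp=sys.stdin, out=sys.stdout):
        """Read one JSON command per line and answer with one JSON line."""
        for line in inp:
            if not line.strip():
                continue
            out.write(json.dumps(self.handle(json.loads(line))) + "\n")
            out.flush()
```

To run it as a daemon, call ViewServer().serve() at the bottom of the script. Because each map_doc request carries a single document, the pool in this sketch can only run the registered map functions for that one document in parallel; spreading different documents across workers would require more than one request to be in flight at a time.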