Posted to common-user@hadoop.apache.org by akhil1988 <ak...@gmail.com> on 2009/06/08 02:22:40 UTC

Implementing Client-Server architecture using MapReduce

Hi All,

I am porting a machine learning application to Hadoop using MapReduce. The
architecture of the application is as follows:
1. A number of server processes are run, each of which takes around 2-3
minutes to start and then remains as a daemon waiting for a client to request
a connection. During startup these server processes are trained on the
training dataset.

2. A client is then run which connects to the servers and processes or tests
whatever data it wants to. The client is essentially our job, which will be
converted to Hadoop's MapReduce model.

Now, since each server takes a good amount of time to start, we want each of
these server processes to be pre-running on all the tasktrackers (all nodes),
so that when a MapReduce (client) task comes to a node, the server is already
running and the client just uses it for its purpose. The server process keeps
running, waiting for the next map task that may be assigned to that node.


In other words, a server process is started once on each node and waits for a
connection from a client. When clients (implemented as map tasks) come to
that node, they connect to the server, do their processing, and leave (or
finish).
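To make the intended flow concrete, here is a minimal Python sketch. The stub daemon, the port handling, and the one-line-per-record protocol are illustrative stand-ins for our real trained server, not its actual interface:

```python
import socket
import socketserver
import threading

# Stub daemon standing in for the pre-trained server: it upper-cases each
# line it receives (the real server would answer with model output).
class StubHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:
            self.wfile.write(line.upper())

def start_stub_daemon():
    # Port 0 lets the OS pick a free port here; the real daemon would
    # listen on a fixed, well-known port on every node.
    srv = socketserver.ThreadingTCPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv, srv.server_address[1]

def map_record(record, port):
    # What each map task would do: connect to the already-running local
    # daemon, send one input record, and emit the daemon's answer.
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(record.encode() + b"\n")
        return sock.makefile().readline().strip()

if __name__ == "__main__":
    srv, port = start_stub_daemon()
    print(map_record("label this sentence", port))  # -> LABEL THIS SENTENCE
    srv.shutdown()
```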

Can you please tell me how I should go about starting the server on each
node? If I am not clear, please ask any questions. Any help in this regard
will be greatly appreciated.

Thank You!
Akhil

-- 
View this message in context: http://www.nabble.com/Implementing-CLient-Server-architecture-using-MapReduce-tp23916757p23916757.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Implementing Client-Server architecture using MapReduce

Posted by Matei Zaharia <ma...@cloudera.com>.
Hi Akhil,

Are you looking for an easy way to launch a command on all nodes? Hadoop
comes with a script called bin/slaves.sh which runs a command on each slave.
For example, if you run bin/slaves.sh hostname, you will see the hostname of
each slave printed. To launch a server process, create a script that starts
the process in the background using nohup, copy it to all the slaves, and
run it with bin/slaves.sh.
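Concretely, the launch could look something like this. This is a rough sketch of a deployment, not a tested recipe: the script name, the /opt/myapp path, and the conf/slaves location are illustrative and depend on your Hadoop layout:

```shell
# start-model-server.sh (on each slave) would contain something like:
#   nohup /opt/myapp/model-server --train-dir /data/training \
#       > /var/log/model-server.log 2>&1 &

# Copy the start script to every slave listed in conf/slaves.
for host in $(cat conf/slaves); do
  scp start-model-server.sh "$host":/opt/myapp/
done

# Run it everywhere; nohup inside the script keeps the daemon alive
# after the ssh session opened by slaves.sh exits.
bin/slaves.sh /opt/myapp/start-model-server.sh

# Sanity check from the example above: print each slave's hostname.
bin/slaves.sh hostname
```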

For your application, though, it sounds like it might be simpler to train on
the data with a separate MapReduce job and write the training results to the
filesystem. New jobs can then grab the appropriate result file (assuming,
for example, one file per node, or some other partitioning).
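In that setup, each map task would load the trained parameters once at startup instead of retraining. Roughly, in Python (a sketch with a made-up JSON model format and a trivial linear scorer; in real Hadoop this loading would happen in the mapper's configure()/setup, reading from HDFS or the DistributedCache):

```python
import json
import os
import tempfile

def write_model(path, weights):
    # Stands in for the output of the separate training MapReduce job.
    with open(path, "w") as f:
        json.dump(weights, f)

class Mapper:
    def __init__(self, model_path):
        # One-time setup: load the pre-trained parameters from the
        # shared filesystem instead of retraining on every job.
        with open(model_path) as f:
            self.weights = json.load(f)

    def map(self, features):
        # Trivial linear scorer standing in for the real trained model.
        return sum(self.weights.get(k, 0.0) * v for k, v in features.items())

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.json")
    write_model(path, {"bias": 1.0, "x": 2.0})
    mapper = Mapper(path)
    print(mapper.map({"bias": 1.0, "x": 3.0}))  # -> 7.0
```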

Matei

On Mon, Jun 8, 2009 at 10:20 AM, akhil1988 <ak...@gmail.com> wrote:

>
>
> Can anyone help me with this issue? I have an account on the cluster, but I
> cannot go and start the server process on each tasktracker myself.
>
> Akhil
>
> akhil1988 wrote:
> > [original message quoted in full -- snipped]
>

Re: Implementing Client-Server architecture using MapReduce

Posted by akhil1988 <ak...@gmail.com>.

Can anyone help me with this issue? I have an account on the cluster, but I
cannot go and start the server process on each tasktracker myself.

Akhil

akhil1988 wrote:
> [original message quoted in full -- snipped]

-- 
View this message in context: http://www.nabble.com/Implementing-CLient-Server-architecture-using-MapReduce-tp23916757p23928505.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.