Posted to mapreduce-dev@hadoop.apache.org by mohak gupta <gu...@gmail.com> on 2012/01/01 08:29:25 UTC

modify data distribution in jobconf

Hi,

As part of my project I need to modify the data distribution layer in the
job conf so as to achieve the following:

1) control which worker nodes should be started, based on the input data
given to them.

2) keep the other worker nodes in some kind of sleep state.

3) based on the output emitted by the worker nodes and the data
distributed, allow other worker nodes to start.

4) perform this in a loop until the desired output is achieved.

Basically, I wish to control which worker nodes perform the map and reduce
functions based on the data they have received.

Could you please tell me whether this can be achieved, and what the
tradeoffs involved are? Any help is really appreciated.

Regards,
Mohak Gupta
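
[Editor's note: iteration until convergence is usually handled outside Hadoop, in a driver program that resubmits a job per round and checks a convergence condition between rounds, rather than by starting or stopping worker nodes. A minimal sketch of that driver-loop pattern, in plain Java; `runRound` stands in for a real job submission such as `JobClient.runJob(conf)`, and the threshold value is illustrative:]

```java
// Driver-side iteration: run one MapReduce round per loop pass until the
// residual read back from the round drops below a convergence threshold.
public class IterativeDriver {
    static final double THRESHOLD = 0.01;  // illustrative convergence bound

    // Stand-in for one MapReduce round; a real driver would submit a job,
    // wait for it, and read this value from job counters or output files.
    static double runRound(double residual) {
        return residual / 2.0;  // stub: halve the residual so the loop terminates
    }

    // Resubmit rounds until convergence or until maxRounds is exhausted.
    public static double iterate(double initialResidual, int maxRounds) {
        double residual = initialResidual;
        for (int round = 0; round < maxRounds && residual > THRESHOLD; round++) {
            residual = runRound(residual);
        }
        return residual;
    }

    public static void main(String[] args) {
        System.out.println(iterate(1.0, 20));
    }
}
```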

Re: modify data distribution in jobconf

Posted by Prashant Sharma <pr...@imaginea.com>.
Mohak,

I assume that by "worker nodes" you mean the child JVMs which are spawned by
the tasktrackers. It is still not clear what you are trying to achieve,
though; I'd say do a little more research.

You might want to check this out:
http://blog.imaginea.com/hadoop-a-short-guide/ (Take a look at the Map-Reduce
part.)
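
[Editor's note: for context, in Hadoop 1.x each TaskTracker spawns a separate child JVM per map or reduce task, and the number of tasks that may run concurrently on a node is configurable. A typical mapred-site.xml fragment, with illustrative values:]

```xml
<!-- Maximum map tasks run concurrently by one TaskTracker (per node). -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<!-- Maximum reduce tasks run concurrently by one TaskTracker (per node). -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<!-- JVM options passed to each spawned child task JVM. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```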

-P


On Mon, Jan 2, 2012 at 12:56 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> I'm not sure what you are trying to achieve here.
>
> Hadoop MapReduce works by *trying* to schedule tasks on nodes on which
> data is 'close', either node-local/rack-local.
>
> We don't try to 'start'/'stop' nodes. If that is what you are trying to
> do, you need to look for something else.
>
> Arun
>
> On Dec 31, 2011, at 11:29 PM, mohak gupta wrote:
>
> > hi
> >
> > as part of my project I need to modify the data distribution layer in job
> > conf so as to achieve the following :
> >
> > 1) control which worker nodes should be  started based on the input data
> > given to them.
> >
> > 2) keep other worker nodes in some kind of sleep state.
> >
> > 3) based on the output emitted by the worker nodes and the data
> distributed
> > allow other worker nodes to start .
> >
> > 4) Perform this in a looping structure till the output is achieved.
> >
> > basically I wish to control which worker nodes perform map and reduce
> > functions based on the data they have received.
> >
> > Could you please help me by suggesting if this could be achieved and also
> > what are the tradeoffs involved, Any help is really appreciated
> >
> > regards
> > Mohak Gupta
>
>

Re: modify data distribution in jobconf

Posted by Arun C Murthy <ac...@hortonworks.com>.
I'm not sure what you are trying to achieve here.

Hadoop MapReduce works by *trying* to schedule tasks on nodes on which data is 'close', either node-local/rack-local.

We don't try to 'start'/'stop' nodes. If that is what you are trying to do, you need to look for something else.
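
[Editor's note: the locality preference Arun describes is driven by the host list each InputSplit reports (`InputSplit.getLocations()`); the scheduler then prefers node-local assignment, falling back to rack-local and then arbitrary placement. A simplified sketch of that selection logic; this is not the actual JobTracker code, and `pickSplit` and its data structures are illustrative (rack-locality is omitted for brevity):]

```java
import java.util.List;

// Simplified picture of locality-aware task assignment: a tracker asking
// for work gets a split whose preferred hosts include it, if one remains;
// otherwise it gets any remaining split.
public class LocalityScheduler {
    // A split's path plus the hosts holding its data (as HDFS would report).
    record Split(String path, List<String> hosts) {}

    // Returns the index of the split to assign to trackerHost, or -1 if none left.
    static int pickSplit(List<Split> splits, boolean[] assigned, String trackerHost) {
        // First pass: prefer a node-local split.
        for (int i = 0; i < splits.size(); i++) {
            if (!assigned[i] && splits.get(i).hosts().contains(trackerHost)) return i;
        }
        // Fallback: any remaining split, regardless of locality.
        for (int i = 0; i < splits.size(); i++) {
            if (!assigned[i]) return i;
        }
        return -1;
    }
}
```

The point being: you influence *placement preferences* through the hosts your splits report, but the framework never powers nodes up or down in response.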

Arun

On Dec 31, 2011, at 11:29 PM, mohak gupta wrote:

> hi
> 
> as part of my project I need to modify the data distribution layer in job
> conf so as to achieve the following :
> 
> 1) control which worker nodes should be  started based on the input data
> given to them.
> 
> 2) keep other worker nodes in some kind of sleep state.
> 
> 3) based on the output emitted by the worker nodes and the data distributed
> allow other worker nodes to start .
> 
> 4) Perform this in a looping structure till the output is achieved.
> 
> basically I wish to control which worker nodes perform map and reduce
> functions based on the data they have received.
> 
> Could you please help me by suggesting if this could be achieved and also
> what are the tradeoffs involved, Any help is really appreciated
> 
> regards
> Mohak Gupta