Posted to mapreduce-user@hadoop.apache.org by Luiz Carlos Muniz <lc...@gmail.com> on 2012/03/29 02:25:05 UTC

Send a map to all nodes

Hi,

Is there any way to ensure the execution of a map on all nodes of a
cluster, in such a way that each node runs the map once and only once? That
is, I would like to use Hadoop to execute a method on every node in the
cluster, without the possibility of the method executing twice on the same
node even if another node fails.

I have already set mapred.tasktracker.map.tasks.maximum to 1 and
mapred.max.jobs.per.node to 1, but even so, if a node fails, another node
that has already run a map runs it again to cover for the one that failed.
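
For reference, here is the first of those settings in mapred-site.xml form,
together with the standard speculative-execution switch (speculative
execution can also duplicate work, so it is shown for completeness). Note
that none of these settings prevent the framework from rescheduling a
failed task's work on another node:

<!-- mapred-site.xml (sketch) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value> <!-- at most one concurrent map task per TaskTracker -->
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value> <!-- no duplicate speculative attempts -->
</property>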

Luiz Carlos Melo Muniz


Re: Send a map to all nodes

Posted by Radim Kolar <hs...@filez.com>.
YARN in Hadoop 0.23.1 can do this.

Re: Send a map to all nodes

Posted by Samir Eljazovic <sa...@gmail.com>.
Hi Luiz,
You should consider Storm <https://github.com/nathanmarz/storm> or
S4 <http://incubator.apache.org/s4/> for your purpose. In Storm you can
create a topology that runs your algorithm on all nodes, roughly as in the
sketch below.
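
For what it's worth, a minimal sketch of that idea, assuming the string-ID
backtype.storm API (the names in the sketch itself, NodeTaskSpout and
runNodeLocalAlgorithm, are made up). One caveat: Storm spreads executors
across workers, but it does not strictly guarantee one per physical node
either.

import java.util.Map;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.utils.Utils;

public class PerNodeTopology {

    // open() runs once in every spout instance, so with one instance
    // per worker the algorithm runs roughly once per node.
    public static class NodeTaskSpout extends BaseRichSpout {
        @Override
        public void open(Map conf, TopologyContext context,
                         SpoutOutputCollector collector) {
            runNodeLocalAlgorithm(); // hypothetical: your per-node work
        }

        @Override
        public void nextTuple() {
            Utils.sleep(1000); // nothing to emit; avoid busy-spinning
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams
        }

        private void runNodeLocalAlgorithm() {
            System.out.println("node-local work goes here");
        }
    }

    public static void main(String[] args) throws Exception {
        int numNodes = Integer.parseInt(args[0]); // cluster size, supplied by you

        TopologyBuilder builder = new TopologyBuilder();
        // One spout instance per expected node.
        builder.setSpout("node-task", new NodeTaskSpout(), numNodes);

        Config conf = new Config();
        conf.setNumWorkers(numNodes); // ideally one worker JVM per supervisor
        StormSubmitter.submitTopology("per-node-task", conf,
                builder.createTopology());
    }
}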

HTH

Samir

Re: Send a map to all nodes

Posted by Luiz Carlos Muniz <lc...@gmail.com>.
Do not worry about this.

My problem is simply to run an algorithm on all nodes in a grid. As I have
realized, Hadoop does not serve this purpose, and I am already studying
alternatives. If you have any suggestions I will be grateful.


Luiz Carlos Melo Muniz

Re: Send a map to all nodes

Posted by Harsh J <ha...@cloudera.com>.
Luiz,

Though it is possible to 'hint' this by tweaking the InputSplits
passed from the job, the default schedulers of Hadoop do not make any
such guarantees, so this isn't possible unless you write your own
complete scheduler, an exercise that wouldn't suit production
deployments unless you also test that scheduler intensively against
other types of workloads.
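
To make the 'hint' concrete, here is a minimal sketch (the class and
configuration-key names are hypothetical) of an InputFormat that emits one
zero-length split per hostname listed in the job configuration, each
carrying a single location hint. The schedulers treat split locations as
preferences only, so this nudges one map toward each node but cannot
guarantee it, and a failed attempt is still free to be rescheduled
elsewhere.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class OneSplitPerNodeInputFormat extends InputFormat<Text, NullWritable> {

    // Hypothetical configuration key holding a comma-separated host list.
    public static final String HOSTS_KEY = "example.pernode.hosts";

    // A zero-length split whose only payload is a location hint.
    public static class NodeSplit extends InputSplit implements Writable {
        private String host = "";
        public NodeSplit() { } // needed for deserialization
        public NodeSplit(String host) { this.host = host; }
        @Override public long getLength() { return 0L; }
        @Override public String[] getLocations() { return new String[] { host }; }
        @Override public void write(DataOutput out) throws IOException { out.writeUTF(host); }
        @Override public void readFields(DataInput in) throws IOException { host = in.readUTF(); }
    }

    @Override
    public List<InputSplit> getSplits(JobContext context) {
        List<InputSplit> splits = new ArrayList<InputSplit>();
        for (String host : context.getConfiguration().getStrings(HOSTS_KEY, new String[0])) {
            splits.add(new NodeSplit(host)); // one map task per listed host
        }
        return splits;
    }

    @Override
    public RecordReader<Text, NullWritable> createRecordReader(InputSplit split,
            TaskAttemptContext context) {
        final Text host = new Text(((NodeSplit) split).host);
        // Feed the mapper exactly one record: the intended hostname.
        return new RecordReader<Text, NullWritable>() {
            private boolean done = false;
            @Override public void initialize(InputSplit s, TaskAttemptContext c) { }
            @Override public boolean nextKeyValue() { if (done) return false; done = true; return true; }
            @Override public Text getCurrentKey() { return host; }
            @Override public NullWritable getCurrentValue() { return NullWritable.get(); }
            @Override public float getProgress() { return done ? 1.0f : 0.0f; }
            @Override public void close() { }
        };
    }
}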

Why do you even need such a thing? For processing purposes or
otherwise? I'm hoping it's not a monitoring sort of hack you're trying
to do.

-- 
Harsh J