Posted to common-user@hadoop.apache.org by Delip Rao <de...@gmail.com> on 2008/12/08 07:25:33 UTC

Run Map-Reduce multiple times

Hi,

I need to run my map-reduce routines for several iterations so that
the output of an iteration becomes the input to the next iteration. Is
there a standard pattern to do this instead of calling
JobClient.runJob() in a loop?

Thanks,
Delip

Re: Run Map-Reduce multiple times

Posted by Delip Rao <de...@gmail.com>.
Thanks Jason for pointing out ChainMapper. Although it's not
directly useful for the problem in this email, it's an awesome way to
pipeline several mappers within a single job. Quite useful if you have
multiple pre-processing steps. For archival purposes, here's a link
with a good example.

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html

Coming back to the subject of this email: yes, I did something similar
to what Chris and Xiance noted below. Turns out Mahout does the same
thing.

Cheers,
Delip

On Tue, Dec 23, 2008 at 6:24 PM, Jason Venner <ja...@attributor.com> wrote:
> In 0.19 there is a chaining facility. I haven't looked at it yet, but it may
> provide an alternative to the rather standard pattern of looping.
>
> You may also want to check what Mahout is doing, as it is a common problem in
> that space.
>
> Delip Rao wrote:
>>
>> Thanks Chris! I ended up doing something similar too.
>>
>> On Mon, Dec 8, 2008 at 2:29 AM, Chris Dyer <re...@umd.edu> wrote:
>>
>>>
>>> Hey Delip-
>>> mapreduce doesn't really have any particular support for iterative
>>> algorithms.  You just have to put a loop in the control program and
>>> set the output path from the previous iteration to be the input path
>>> in the next iteration.  This at least lets you control whether you
>>> decide to keep around results of intermediate iterations or erase
>>> them...
>>> -Chris
>>>
>>> On Mon, Dec 8, 2008 at 1:25 AM, Delip Rao <de...@gmail.com> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I need to run my map-reduce routines for several iterations so that
>>>> the output of an iteration becomes the input to the next iteration. Is
>>>> there a standard pattern to do this instead of calling
>>>> JobClient.runJob() in a loop?
>>>>
>>>> Thanks,
>>>> Delip
>>>>
>>>>
>

Re: Run Map-Reduce multiple times

Posted by Jason Venner <ja...@attributor.com>.
In 0.19 there is a chaining facility. I haven't looked at it yet, but it
may provide an alternative to the rather standard pattern of looping.

You may also want to check what Mahout is doing, as it is a common
problem in that space.

Delip Rao wrote:
> Thanks Chris! I ended up doing something similar too.
>
> On Mon, Dec 8, 2008 at 2:29 AM, Chris Dyer <re...@umd.edu> wrote:
>   
>> Hey Delip-
>> mapreduce doesn't really have any particular support for iterative
>> algorithms.  You just have to put a loop in the control program and
>> set the output path from the previous iteration to be the input path
>> in the next iteration.  This at least lets you control whether you
>> decide to keep around results of intermediate iterations or erase
>> them...
>> -Chris
>>
>> On Mon, Dec 8, 2008 at 1:25 AM, Delip Rao <de...@gmail.com> wrote:
>>     
>>> Hi,
>>>
>>> I need to run my map-reduce routines for several iterations so that
>>> the output of an iteration becomes the input to the next iteration. Is
>>> there a standard pattern to do this instead of calling
>>> JobClient.runJob() in a loop?
>>>
>>> Thanks,
>>> Delip
>>>
>>>       

Re: Run Map-Reduce multiple times

Posted by Delip Rao <de...@gmail.com>.
Thanks Chris! I ended up doing something similar too.

On Mon, Dec 8, 2008 at 2:29 AM, Chris Dyer <re...@umd.edu> wrote:
> Hey Delip-
> mapreduce doesn't really have any particular support for iterative
> algorithms.  You just have to put a loop in the control program and
> set the output path from the previous iteration to be the input path
> in the next iteration.  This at least lets you control whether you
> decide to keep around results of intermediate iterations or erase
> them...
> -Chris
>
> On Mon, Dec 8, 2008 at 1:25 AM, Delip Rao <de...@gmail.com> wrote:
>> Hi,
>>
>> I need to run my map-reduce routines for several iterations so that
>> the output of an iteration becomes the input to the next iteration. Is
>> there a standard pattern to do this instead of calling
>> JobClient.runJob() in a loop?
>>
>> Thanks,
>> Delip
>>
>

Re: Run Map-Reduce multiple times

Posted by Chris Dyer <re...@umd.edu>.
Hey Delip-
mapreduce doesn't really have any particular support for iterative
algorithms.  You just have to put a loop in the control program and
set the output path from the previous iteration to be the input path
in the next iteration.  This at least lets you control whether you
decide to keep around results of intermediate iterations or erase
them...
-Chris

On Mon, Dec 8, 2008 at 1:25 AM, Delip Rao <de...@gmail.com> wrote:
> Hi,
>
> I need to run my map-reduce routines for several iterations so that
> the output of an iteration becomes the input to the next iteration. Is
> there a standard pattern to do this instead of calling
> JobClient.runJob() in a loop?
>
> Thanks,
> Delip
>
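
For the archive, the loop Chris describes can be sketched roughly as
below. This is only an illustration under stated assumptions, not code
from anyone in this thread: IterativeDriver, JobStep, runIterations and
the "iter-N" directory naming are all made-up names. JobStep stands in
for whatever configures a JobConf for one pass and calls
JobClient.runJob() on it, so the sketch runs without Hadoop on the
classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class IterativeDriver {

    /**
     * Hypothetical stand-in for one MapReduce pass. A real implementation
     * would build a JobConf, set its input and output paths, and submit it
     * with JobClient.runJob().
     */
    public interface JobStep {
        void run(String inputPath, String outputPath, int iteration) throws IOException;
    }

    /**
     * Runs the step a fixed number of times, feeding the output path of each
     * iteration in as the input path of the next. Returns every output path
     * in order, so the caller can decide which intermediate results to keep
     * and which to erase.
     */
    public static List<String> runIterations(String initialInput, String workDir,
                                             int iterations, JobStep step) throws IOException {
        List<String> outputs = new ArrayList<>();
        String input = initialInput;
        for (int i = 0; i < iterations; i++) {
            // Assumed naming scheme: one fresh output directory per iteration.
            String output = workDir + "/iter-" + i;
            step.run(input, output, i);
            outputs.add(output);
            input = output; // previous output becomes the next input
        }
        return outputs;
    }

    public static void main(String[] args) throws IOException {
        // The lambda only logs the paths; swap in a real job submission here.
        List<String> outputs = runIterations("data/input", "data/work", 3,
                (in, out, i) -> System.out.println("iteration " + i + ": " + in + " -> " + out));
        System.out.println("final output: " + outputs.get(outputs.size() - 1));
    }
}
```

Since the driver gets back the full list of output paths, it can remove
everything except the last one once the loop finishes (on HDFS, for
example, with FileSystem.delete(path, true)), or keep them all for
debugging. A convergence test could replace the fixed iteration count,
but that depends on the algorithm.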

Re: Run Map-Reduce multiple times

Posted by "Xiance SI(司宪策)" <ad...@gmail.com>.
I've faced the same problem, and I just wrote a loop to do iterative
MapReduce manually.

Xiance

On Mon, Dec 8, 2008 at 2:25 PM, Delip Rao <de...@gmail.com> wrote:

> Hi,
>
> I need to run my map-reduce routines for several iterations so that
> the output of an iteration becomes the input to the next iteration. Is
> there a standard pattern to do this instead of calling
> JobClient.runJob() in a loop?
>
> Thanks,
> Delip
>