Posted to common-user@hadoop.apache.org by Delip Rao <de...@gmail.com> on 2008/12/08 07:25:33 UTC
Run Map-Reduce multiple times
Hi,
I need to run my map-reduce routines for several iterations so that
the output of an iteration becomes the input to the next iteration. Is
there a standard pattern to do this instead of calling
JobClient.runJob() in a loop?
Thanks,
Delip
Re: Run Map-Reduce multiple times
Posted by Delip Rao <de...@gmail.com>.
Thanks, Jason, for pointing out ChainMapper. Although it's not
directly useful for the problem in this email, it's an awesome way to
pipeline several mappers. Quite useful if you have multiple
pre-processing steps. For archival purposes, here's a link with a good
example:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
Coming back to the subject of this email: yes, I did something similar
to what Chris and Xiance noted below. Turns out Mahout does the same
thing.
Cheers,
Delip
On Tue, Dec 23, 2008 at 6:24 PM, Jason Venner <ja...@attributor.com> wrote:
> In 0.19 there is a chaining facility. I haven't looked at it yet, but it may
> provide an alternative to the rather standard pattern of looping.
>
> You may also want to check what Mahout is doing, as it is a common problem in
> that space.
Re: Run Map-Reduce multiple times
Posted by Jason Venner <ja...@attributor.com>.
In 0.19 there is a chaining facility. I haven't looked at it yet, but it
may provide an alternative to the rather standard pattern of looping.
You may also want to check what Mahout is doing, as it is a common
problem in that space.
Delip Rao wrote:
> Thanks Chris! I ended up doing something similar too.
Re: Run Map-Reduce multiple times
Posted by Delip Rao <de...@gmail.com>.
Thanks Chris! I ended up doing something similar too.
On Mon, Dec 8, 2008 at 2:29 AM, Chris Dyer <re...@umd.edu> wrote:
> Hey Delip-
> mapreduce doesn't really have any particular support for iterative
> algorithms. You just have to put a loop in the control program and
> set the output path from the previous iteration to be the input path
> in the next iteration. This at least lets you control whether you
> decide to keep around results of intermediate iterations or erase
> them...
> -Chris
Re: Run Map-Reduce multiple times
Posted by Chris Dyer <re...@umd.edu>.
Hey Delip-
mapreduce doesn't really have any particular support for iterative
algorithms. You just have to put a loop in the control program and
set the output path from the previous iteration to be the input path
in the next iteration. This at least lets you control whether you
decide to keep around results of intermediate iterations or erase
them...
-Chris
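The loop described above could be sketched roughly as follows. This is a minimal illustration, not code from anyone in this thread: `IterativeDriver`, `JobRunner`, `Cleaner`, and the `iter-N` directory naming are all hypothetical. In a real driver using the old `org.apache.hadoop.mapred` API, the `JobRunner` would presumably set the paths on a `JobConf` with `FileInputFormat.setInputPaths` and `FileOutputFormat.setOutputPath` and then call `JobClient.runJob`, and the `Cleaner` would call `FileSystem.delete`. Both are left as plain interfaces here so the path chaining is visible without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a driver loop for iterative MapReduce: each iteration reads the
 * directory that the previous iteration wrote. All names are hypothetical.
 */
final class IterativeDriver {

    /** Stand-in for configuring and submitting one job and waiting for it. */
    interface JobRunner {
        void run(String inputDir, String outputDir);
    }

    /** Stand-in for deleting a directory that is no longer needed. */
    interface Cleaner {
        void delete(String dir);
    }

    /**
     * Runs the job the given number of times and returns the output
     * directory of every iteration, in order.
     *
     * When keepIntermediate is false, the input of an iteration is deleted
     * once that iteration has finished. The caller's initial input is never
     * deleted, and the final output is always kept.
     */
    static List<String> runIterations(String initialInput, String workDir,
                                      int iterations, boolean keepIntermediate,
                                      JobRunner job, Cleaner cleaner) {
        List<String> outputs = new ArrayList<>();
        String input = initialInput;
        for (int i = 1; i <= iterations; i++) {
            String output = workDir + "/iter-" + i;
            job.run(input, output);      // assumed to block until the job completes
            outputs.add(output);
            if (!keepIntermediate && i > 1) {
                cleaner.delete(input);   // previous iteration's output, now consumed
            }
            input = output;              // next iteration reads what this one wrote
        }
        return outputs;
    }
}
```

A convergence check (for example, comparing a counter between iterations) could replace the fixed iteration count, but that depends on the algorithm and is not covered in this thread.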
On Mon, Dec 8, 2008 at 1:25 AM, Delip Rao <de...@gmail.com> wrote:
> Hi,
>
> I need to run my map-reduce routines for several iterations so that
> the output of an iteration becomes the input to the next iteration. Is
> there a standard pattern to do this instead of calling
> JobClient.runJob() in a loop?
>
> Thanks,
> Delip
>
Re: Run Map-Reduce multiple times
Posted by "Xiance SI(司宪策)" <ad...@gmail.com>.
I've faced the same problem, and I just wrote a loop to do iterative
MapReduce manually.
Xiance
On Mon, Dec 8, 2008 at 2:25 PM, Delip Rao <de...@gmail.com> wrote:
> Hi,
>
> I need to run my map-reduce routines for several iterations so that
> the output of an iteration becomes the input to the next iteration. Is
> there a standard pattern to do this instead of calling
> JobClient.runJob() in a loop?
>
> Thanks,
> Delip
>