Posted to mapreduce-user@hadoop.apache.org by Raghava Mutharaju <m....@gmail.com> on 2010/02/03 07:04:57 UTC

avoiding data redistribution in iterative mapreduce

Hi all,

      I need to run a map reduce task repeatedly in order to achieve the
desired result. Is it possible to avoid having the data set redistributed
(divided into chunks and distributed) at the beginning of each iteration,
i.e. once the distribution occurs the first time, the map nodes would work on
the same chunks in every iteration? Can this be done? I only have brief
experience with MapReduce, and my impression is that the input data set is
redistributed every time.

Thank you.

Regards,
Raghava.

Re: avoiding data redistribution in iterative mapreduce

Posted by Raghava Mutharaju <m....@gmail.com>.
Hi,

    No problem, I am thankful that someone has replied to my question.

Regarding the known location -- can/will it be HDFS, or some distributed
key-value store?

Regards,
Raghava.


Re: avoiding data redistribution in iterative mapreduce

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
AFAIK no. I'm not sure how much of a task it is to write a HOD-like scheduler, or if it's even feasible given the new architecture of a single managing JobTracker (JT) talking directly to the TaskTrackers (TTs). Probably someone more familiar with the scheduler architecture can help you better.
What I was trying to suggest with serialization was: write the initial mapper data to a known location, and instead of streaming from the split, ignore it and read from there.
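
(As an illustration only, a minimal sketch of that idea on the old
org.apache.hadoop.mapred API -- the cache path "/iter-cache/part-<n>" and the
Text key/value types are assumptions, not anything the framework prescribes:)

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // First-iteration mapper that, besides emitting its <k,v> pairs as
    // usual, also serializes them to a fixed HDFS location so that later
    // iterations can read them back without consulting the input splits.
    public class CachingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private SequenceFile.Writer writer;

      public void configure(JobConf job) {
        try {
          FileSystem fs = FileSystem.get(job);
          // "mapred.task.partition" is this map task's index, so each
          // task writes its own part file under the (assumed) cache dir.
          String part = job.get("mapred.task.partition", "0");
          writer = SequenceFile.createWriter(fs, job,
              new Path("/iter-cache/part-" + part), Text.class, Text.class);
        } catch (IOException e) {
          throw new RuntimeException("could not open cache file", e);
        }
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Text outKey = new Text(key.toString());
        writer.append(outKey, value);   // cache for the next iteration
        output.collect(outKey, value);  // normal first-round output
      }

      public void close() throws IOException {
        writer.close();
      }
    }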
Sorry for the delayed response,

Amogh




Re: avoiding data redistribution in iterative mapreduce

Posted by Raghava Mutharaju <m....@gmail.com>.
Hi,

     So is it not possible to avoid redistribution in this case? If that is
the case, can a custom scheduler be written -- would it be an easy task?

Regards,
Raghava.


Re: avoiding data redistribution in iterative mapreduce

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
>>Will there be a re-assignment of Map & Reduce nodes by the Master?
In general, with the available schedulers, I believe so. If it weren't the case, and I submitted a second job needing a different or additional set of inputs, the data locality considerations would be somewhat hampered, right? When we had HOD, this was certainly possible.

Amogh



Re: avoiding data redistribution in iterative mapreduce

Posted by Raghava Mutharaju <m....@gmail.com>.
Hi Amogh,

       Thank you for the reply.

>>> What you need, I believe, is “just run on whatever map has”.
            You got that right :). An example of such a sequential program
would be Bubble Sort, which needs several iterations to reach the end result,
and in each iteration it works on the previous output (the partially sorted
list) rather than the initial input. The same thing should happen in my case.

>>> If you are using an exclusive private cluster, you can probably localize
>>> <k,v> from the first iteration and use dummy input data (to ensure the same
>>> number of mapper tasks as the first round, and use custom classes of
>>> MapRunner, RecordReader to not read data from the supplied input).

          Yes, it would be a local cluster, the one at my university. If we
set the number of map tasks, would it not be honored in each iteration? As
mentioned in the documentation, I think I need to use JobClient to control
the number of iterations.
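
(For what it's worth, a minimal sketch of what setting the number of map tasks
looks like on the old JobConf API; the class name is a placeholder:)

    import org.apache.hadoop.mapred.JobConf;

    public class ConfHint {
      public static JobConf configure() {
        JobConf jobConf = new JobConf();
        // setNumMapTasks() is only a hint: InputFormat.getSplits() makes
        // the final call, so the same value each round does not by itself
        // guarantee the same splits or the same nodes.
        jobConf.setNumMapTasks(16);
        return jobConf;
      }
    }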


>>> But how can you ensure that you always get the same nodes to run your
>>> map reduce job on a shared cluster?

           while (!done) {
               JobClient.runJob(jobConf);
               // <<Do something to check termination condition>>
           }

If I write something like that in the code, wouldn't each map node run on the
same data chunk it already has, every time? Or will there be a re-assignment
of Map & Reduce nodes by the Master?
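
(Spelled out, such a driver might look like the sketch below; the user-defined
counter group/name "iter"/"CONVERGED" used for the termination check is an
assumed convention for illustration, as are the path arguments:)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    // Iterative driver: resubmits the job until the computation signals
    // convergence through a user-defined counter (group "iter", name
    // "CONVERGED" -- a made-up convention the job itself must follow).
    public class IterativeDriver {
      public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);   // initial data set
        boolean done = false;
        int iteration = 0;
        while (!done) {
          JobConf jobConf = new JobConf(IterativeDriver.class);
          jobConf.setJobName("iteration-" + iteration);
          // mapper/reducer classes for the actual computation go here
          FileInputFormat.setInputPaths(jobConf, input);
          Path output = new Path(args[1] + "/iter-" + iteration);
          FileOutputFormat.setOutputPath(jobConf, output);

          RunningJob job = JobClient.runJob(jobConf); // blocks until done
          done = job.getCounters()
                    .findCounter("iter", "CONVERGED").getValue() > 0;
          input = output;  // next round consumes the previous round's output
          iteration++;
        }
      }
    }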


Regards,
Raghava.


Re: avoiding data redistribution in iterative mapreduce

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
If each of your sequential iterations is a map+reduce job, then no.
The lifetime of a split is confined to a single map reduce job. The split is actually a reference to the data, which is used to schedule the job as close as possible to that data. The record reader then uses the same object to pass the <k,v> pairs in the split.
What you need, I believe, is "just run on whatever the map has". If you are using an exclusive private cluster, you can probably localize the <k,v> from the first iteration and use dummy input data (to ensure the same number of mapper tasks as the first round, and use custom classes of MapRunner, RecordReader to not read data from the supplied input). But how can you ensure that you always get the same nodes to run your map reduce job on a shared cluster?
Please correct me if I misunderstood your question.
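
(A rough sketch of the "don't read from the supplied input" half of that idea,
again on the old mapred API; the cache path "/iter-cache/part-<n>" matches the
hypothetical location used in the mapper sketch earlier on this page and is
not anything Hadoop defines:)

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;

    // RecordReader that ignores the split handed to it and instead replays
    // the <k,v> pairs a previous iteration cached at a known HDFS location.
    // A custom InputFormat would construct one of these per dummy split,
    // passing the split's index as "part".
    public class CachedRecordReader implements RecordReader<Text, Text> {

      private final SequenceFile.Reader reader;

      public CachedRecordReader(JobConf job, int part) throws IOException {
        FileSystem fs = FileSystem.get(job);
        reader = new SequenceFile.Reader(
            fs, new Path("/iter-cache/part-" + part), job);
      }

      public boolean next(Text key, Text value) throws IOException {
        return reader.next(key, value);  // false at end of cached data
      }

      public Text createKey()    { return new Text(); }
      public Text createValue()  { return new Text(); }
      public long getPos() throws IOException { return reader.getPosition(); }
      public float getProgress() { return 0.0f; }
      public void close() throws IOException { reader.close(); }
    }

The matching dummy InputFormat would just return as many splits as there are
cached part files.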

Amogh

