You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-dev@hadoop.apache.org by jian yi <ey...@gmail.com> on 2010/02/08 08:25:52 UTC

Map-Balance-Reduce draft

Two targets:
1. Solving the skew problem
2. Regarding a task as a timeslice to improve on scheduler, switching a job
to another job by timeslice.

In MR (Map-Reduce) model, reducings are not balanced, because the scale of
partitiones are unbalanced. How to balance? We can control the size of
partition, rehash the bigger parition and combine to the specified size. If
a key has many values, it's necessary to execute mapreduce twice.The
following is the model digram:
[image:
?ui=2&view=att&th=126ac73d6290bd76&attid=0.1&disp=attd&realattid=ii_126ac73d6290bd76&zw]
Scheduler can regard a task as a timeslice similarly OS scheduler.
If a split is bigger than a specified size, it will be splitted again. If a
split is smaller than a specified size, it will be combined with others, we
can name the combining procedure regroup. The combining is logic, it's not
necessay to combine these smaller splits to a disk file, which will not
affect the performance.The target is that every task spent same time
running.
[image:
?ui=2&view=att&th=126ac741bba5c355&attid=0.1&disp=attd&realattid=ii_126ac741bba5c355&zw]

Re: Map-Balance-Reduce draft

Posted by jian yi <ey...@gmail.com>.

Hi Todd,

I think you have mistaken my meaning. Splitting should achieve two goals:
1. All tasks are balanced
2. The bytes of a task are in the specified range

There are two cases:
1.A single key has too many values
2.Many different keys are hashed to a same partition

By analogy we cut a cake, not only every piece is evenly, but also the piece
size are in the specified range. There are several advantages:
1.The skew problem is soloved, becase every task is balanced
2. Scheduler can switch jobs by a task, becase the bytes are specified for
every task

In MR, a task looks like a sleep(N) call and N is not determinated, and M
maybe is very larger.
In MBR, a task looks like a sleep(M) call and M is determinated, and M is
not larger.

For lots of  applications, a parallel scheduler is necessary based on
priority. Exclusive priority is also necessary for emergency.

Regards
Jian Yi

2010/2/10 Todd Lipcon <to...@cloudera.com>

> Hi Jian,
>
> Are you essentially proposing "work stealing"? That is to say, if a map
> task
> is taking longer than others, another map task will be able to grab some of
> its work from it after the job is already under way?
>
> -Todd
>
> On Tue, Feb 9, 2010 at 1:23 AM, jian yi <ey...@gmail.com> wrote:
>
> > Just like cutting a cake, we will cut it again if the first is not
> > balanced.
> > If we can control the size of every task, system will become more simple
> > and
> > optimizing is possible.
> >
> > 2010/2/9 jian yi <ey...@gmail.com>
> >
> > > Hi Alan,
> > >
> > > In my opinion, MBR solves the skew problem with minimum cost. It is
> > differ
> > > with combiner. My English is poor, but I do my best to express myself
> > > clearly.
> > >
> > > In MBR, we can set the size of a task, both map task and reduce task.
> For
> > > example, 120~150MB input for a task, the motive is that we can control
> a
> > > task's run-time to keep the size of every task is almost equal, which
> is
> > the
> > > precondition that a task can be regarded as a timeslice switched.
> > >
> > > By hashing output of map to more splits, we can regroup smaller splits
> to
> > a
> > > new split which size is specified and hash bigger spits to more small
> > > splits, until the size of all splits is within the specified range.
> > >
> > > Balance interface is like map and reduce interface. Balance interface
> > shoud
> > > be implemented when a single key has too many values, because the key
> > will
> > > be hashed to more than one splits. In the case, we can't get the final
> > > results in a MBR session, it is necessary to start a next MBR session.
> > For a
> > > same key, we can hash it with key+value, the action will be telled to
> > > balance interface.
> > >
> > > Regards
> > > Jian Yi
> > >
> > > 2010/2/9 Alan Gates <ga...@yahoo-inc.com>
> > >
> > > Jian,
> > >>
> > >> Sorry if any of my questions or comments would have been answered by
> the
> > >> diagrams, but apache lists don't allow attachments, so I can't see
> your
> > >> diagrams.
> > >>
> > >> If I understand correctly, your suggestion for balancing is to apply
> > >> reduce on subsets of the hashed data, and then run reduce again on
> this
> > >> reduced data set.  Is that correct?  If so, how does this differ from
> > the
> > >> combiner?  Second, some aggregation operations truly aren't algebraic
> > (that
> > >> is, they cannot be distributed across multiple iterations of reduce).
> > An
> > >> example of this is session analysis, where the algorithm truly needs
> to
> > see
> > >> all operations together to analyze the user session.  How do you
> propose
> > to
> > >> handle that case?
> > >>
> > >> Alan.
> > >>
> > >>
> > >> On Feb 7, 2010, at 11:25 PM, jian yi wrote:
> > >>
> > >>  Two targets:
> > >>> 1. Solving the skew problem
> > >>> 2. Regarding a task as a timeslice to improve on scheduler, switching
> a
> > >>> job to another job by timeslice.
> > >>>
> > >>> In MR (Map-Reduce) model, reducings are not balanced, because the
> scale
> > >>> of partitiones are unbalanced. How to balance? We can control the
> size
> > of
> > >>> partition, rehash the bigger parition and combine to the specified
> > size. If
> > >>> a key has many values, it's necessary to execute mapreduce twice.The
> > >>> following is the model digram:
> > >>>
> > >>> Scheduler can regard a task as a timeslice similarly OS scheduler.
> > >>> If a split is bigger than a specified size, it will be splitted
> again.
> > If
> > >>> a split is smaller than a specified size, it will be combined with
> > others,
> > >>> we can name the combining procedure regroup. The combining is logic,
> > it's
> > >>> not necessay to combine these smaller splits to a disk file, which
> will
> > not
> > >>> affect the performance.The target is that every task spent same time
> > >>> running.
> > >>>
> > >>>
> > >>
> > >
> >
>

Re: Map-Balance-Reduce draft

Posted by Todd Lipcon <to...@cloudera.com>.

Hi Jian,

Are you essentially proposing "work stealing"? That is to say, if a map task
is taking longer than others, another map task will be able to grab some of
its work from it after the job is already under way?

-Todd

On Tue, Feb 9, 2010 at 1:23 AM, jian yi <ey...@gmail.com> wrote:

> Just like cutting a cake, we will cut it again if the first is not
> balanced.
> If we can control the size of every task, system will become more simple
> and
> optimizing is possible.
>
> 2010/2/9 jian yi <ey...@gmail.com>
>
> > Hi Alan,
> >
> > In my opinion, MBR solves the skew problem with minimum cost. It is
> differ
> > with combiner. My English is poor, but I do my best to express myself
> > clearly.
> >
> > In MBR, we can set the size of a task, both map task and reduce task. For
> > example, 120~150MB input for a task, the motive is that we can control a
> > task's run-time to keep the size of every task is almost equal, which is
> the
> > precondition that a task can be regarded as a timeslice switched.
> >
> > By hashing output of map to more splits, we can regroup smaller splits to
> a
> > new split which size is specified and hash bigger spits to more small
> > splits, until the size of all splits is within the specified range.
> >
> > Balance interface is like map and reduce interface. Balance interface
> shoud
> > be implemented when a single key has too many values, because the key
> will
> > be hashed to more than one splits. In the case, we can't get the final
> > results in a MBR session, it is necessary to start a next MBR session.
> For a
> > same key, we can hash it with key+value, the action will be telled to
> > balance interface.
> >
> > Regards
> > Jian Yi
> >
> > 2010/2/9 Alan Gates <ga...@yahoo-inc.com>
> >
> > Jian,
> >>
> >> Sorry if any of my questions or comments would have been answered by the
> >> diagrams, but apache lists don't allow attachments, so I can't see your
> >> diagrams.
> >>
> >> If I understand correctly, your suggestion for balancing is to apply
> >> reduce on subsets of the hashed data, and then run reduce again on this
> >> reduced data set.  Is that correct?  If so, how does this differ from
> the
> >> combiner?  Second, some aggregation operations truly aren't algebraic
> (that
> >> is, they cannot be distributed across multiple iterations of reduce).
> An
> >> example of this is session analysis, where the algorithm truly needs to
> see
> >> all operations together to analyze the user session.  How do you propose
> to
> >> handle that case?
> >>
> >> Alan.
> >>
> >>
> >> On Feb 7, 2010, at 11:25 PM, jian yi wrote:
> >>
> >>  Two targets:
> >>> 1. Solving the skew problem
> >>> 2. Regarding a task as a timeslice to improve on scheduler, switching a
> >>> job to another job by timeslice.
> >>>
> >>> In MR (Map-Reduce) model, reducings are not balanced, because the scale
> >>> of partitiones are unbalanced. How to balance? We can control the size
> of
> >>> partition, rehash the bigger parition and combine to the specified
> size. If
> >>> a key has many values, it's necessary to execute mapreduce twice.The
> >>> following is the model digram:
> >>>
> >>> Scheduler can regard a task as a timeslice similarly OS scheduler.
> >>> If a split is bigger than a specified size, it will be splitted again.
> If
> >>> a split is smaller than a specified size, it will be combined with
> others,
> >>> we can name the combining procedure regroup. The combining is logic,
> it's
> >>> not necessay to combine these smaller splits to a disk file, which will
> not
> >>> affect the performance.The target is that every task spent same time
> >>> running.
> >>>
> >>>
> >>
> >
>

Re: Map-Balance-Reduce draft

Posted by jian yi <ey...@gmail.com>.

Just like cutting a cake, we will cut it again if the first is not balanced.
If we can control the size of every task, system will become more simple and
optimizing is possible.

2010/2/9 jian yi <ey...@gmail.com>

> Hi Alan,
>
> In my opinion, MBR solves the skew problem with minimum cost. It is differ
> with combiner. My English is poor, but I do my best to express myself
> clearly.
>
> In MBR, we can set the size of a task, both map task and reduce task. For
> example, 120~150MB input for a task, the motive is that we can control a
> task's run-time to keep the size of every task is almost equal, which is the
> precondition that a task can be regarded as a timeslice switched.
>
> By hashing output of map to more splits, we can regroup smaller splits to a
> new split which size is specified and hash bigger spits to more small
> splits, until the size of all splits is within the specified range.
>
> Balance interface is like map and reduce interface. Balance interface shoud
> be implemented when a single key has too many values, because the key will
> be hashed to more than one splits. In the case, we can't get the final
> results in a MBR session, it is necessary to start a next MBR session. For a
> same key, we can hash it with key+value, the action will be telled to
> balance interface.
>
> Regards
> Jian Yi
>
> 2010/2/9 Alan Gates <ga...@yahoo-inc.com>
>
> Jian,
>>
>> Sorry if any of my questions or comments would have been answered by the
>> diagrams, but apache lists don't allow attachments, so I can't see your
>> diagrams.
>>
>> If I understand correctly, your suggestion for balancing is to apply
>> reduce on subsets of the hashed data, and then run reduce again on this
>> reduced data set.  Is that correct?  If so, how does this differ from the
>> combiner?  Second, some aggregation operations truly aren't algebraic (that
>> is, they cannot be distributed across multiple iterations of reduce).   An
>> example of this is session analysis, where the algorithm truly needs to see
>> all operations together to analyze the user session.  How do you propose to
>> handle that case?
>>
>> Alan.
>>
>>
>> On Feb 7, 2010, at 11:25 PM, jian yi wrote:
>>
>>  Two targets:
>>> 1. Solving the skew problem
>>> 2. Regarding a task as a timeslice to improve on scheduler, switching a
>>> job to another job by timeslice.
>>>
>>> In MR (Map-Reduce) model, reducings are not balanced, because the scale
>>> of partitiones are unbalanced. How to balance? We can control the size of
>>> partition, rehash the bigger parition and combine to the specified size. If
>>> a key has many values, it's necessary to execute mapreduce twice.The
>>> following is the model digram:
>>>
>>> Scheduler can regard a task as a timeslice similarly OS scheduler.
>>> If a split is bigger than a specified size, it will be splitted again. If
>>> a split is smaller than a specified size, it will be combined with others,
>>> we can name the combining procedure regroup. The combining is logic, it's
>>> not necessay to combine these smaller splits to a disk file, which will not
>>> affect the performance.The target is that every task spent same time
>>> running.
>>>
>>>
>>
>

Re: Map-Balance-Reduce draft

Posted by jian yi <ey...@gmail.com>.

Hi Alan,

In my opinion, MBR solves the skew problem with minimum cost. It is differ
with combiner. My English is poor, but I do my best to express myself
clearly.

In MBR, we can set the size of a task, both map task and reduce task. For
example, 120~150MB input for a task, the motive is that we can control a
task's run-time to keep the size of every task is almost equal, which is the
precondition that a task can be regarded as a timeslice switched.

By hashing output of map to more splits, we can regroup smaller splits to a
new split which size is specified and hash bigger spits to more small
splits, until the size of all splits is within the specified range.

Balance interface is like map and reduce interface. Balance interface shoud
be implemented when a single key has too many values, because the key will
be hashed to more than one splits. In the case, we can't get the final
results in a MBR session, it is necessary to start a next MBR session. For a
same key, we can hash it with key+value, the action will be telled to
balance interface.

Regards
Jian Yi

2010/2/9 Alan Gates <ga...@yahoo-inc.com>

> Jian,
>
> Sorry if any of my questions or comments would have been answered by the
> diagrams, but apache lists don't allow attachments, so I can't see your
> diagrams.
>
> If I understand correctly, your suggestion for balancing is to apply reduce
> on subsets of the hashed data, and then run reduce again on this reduced
> data set.  Is that correct?  If so, how does this differ from the combiner?
>  Second, some aggregation operations truly aren't algebraic (that is, they
> cannot be distributed across multiple iterations of reduce).   An example of
> this is session analysis, where the algorithm truly needs to see all
> operations together to analyze the user session.  How do you propose to
> handle that case?
>
> Alan.
>
>
> On Feb 7, 2010, at 11:25 PM, jian yi wrote:
>
>  Two targets:
>> 1. Solving the skew problem
>> 2. Regarding a task as a timeslice to improve on scheduler, switching a
>> job to another job by timeslice.
>>
>> In MR (Map-Reduce) model, reducings are not balanced, because the scale of
>> partitiones are unbalanced. How to balance? We can control the size of
>> partition, rehash the bigger parition and combine to the specified size. If
>> a key has many values, it's necessary to execute mapreduce twice.The
>> following is the model digram:
>>
>> Scheduler can regard a task as a timeslice similarly OS scheduler.
>> If a split is bigger than a specified size, it will be splitted again. If
>> a split is smaller than a specified size, it will be combined with others,
>> we can name the combining procedure regroup. The combining is logic, it's
>> not necessay to combine these smaller splits to a disk file, which will not
>> affect the performance.The target is that every task spent same time
>> running.
>>
>>
>

Re: Map-Balance-Reduce draft

Posted by Alan Gates <ga...@yahoo-inc.com>.

Jian,

Sorry if any of my questions or comments would have been answered by  
the diagrams, but apache lists don't allow attachments, so I can't see  
your diagrams.

If I understand correctly, your suggestion for balancing is to apply  
reduce on subsets of the hashed data, and then run reduce again on  
this reduced data set.  Is that correct?  If so, how does this differ  
from the combiner?  Second, some aggregation operations truly aren't  
algebraic (that is, they cannot be distributed across multiple  
iterations of reduce).   An example of this is session analysis, where  
the algorithm truly needs to see all operations together to analyze  
the user session.  How do you propose to handle that case?

Alan.

On Feb 7, 2010, at 11:25 PM, jian yi wrote:

> Two targets:
> 1. Solving the skew problem
> 2. Regarding a task as a timeslice to improve on scheduler,  
> switching a job to another job by timeslice.
>
> In MR (Map-Reduce) model, reducings are not balanced, because the  
> scale of partitiones are unbalanced. How to balance? We can control  
> the size of partition, rehash the bigger parition and combine to the  
> specified size. If a key has many values, it's necessary to execute  
> mapreduce twice.The following is the model digram:
>
> Scheduler can regard a task as a timeslice similarly OS scheduler.
> If a split is bigger than a specified size, it will be splitted  
> again. If a split is smaller than a specified size, it will be  
> combined with others, we can name the combining procedure regroup.  
> The combining is logic, it's not necessay to combine these smaller  
> splits to a disk file, which will not affect the performance.The  
> target is that every task spent same time running.
>