
Posted to mapreduce-dev@hadoop.apache.org by Venu Gopala Rao <ve...@huawei.com> on 2011/09/08 15:36:27 UTC

Rationale behind Event based design of Next Gen Mapreduce components.

Hi All,

I am going through the Next Gen MapReduce code base. Unlike MRv1, all the
components work based on event dispatching/consumption. Sometimes the list
of events is quite long :) .

I would like to understand: is there any specific reason behind this
event-based design?

Regards

Venu

RE: Rationale behind Event based design of Next Gen Mapreduce components.

Posted by Venu Gopala Rao <ve...@huawei.com>.

Thanks, Sharad and Vinod.

-----Original Message-----
From: Sharad [mailto:sharad.apache@gmail.com] 
Sent: Thursday, September 08, 2011 8:11 PM
To: mapreduce-dev@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; venugopalarao.kotha@huawei.com
Subject: Re: Rationale behind Event based design of Next Gen Mapreduce components.


Re: Rationale behind Event based design of Next Gen Mapreduce components.

Posted by Sharad <sh...@gmail.com>.

Thanks, Vinod. Just to add: it also gives better parallelism and hence
scalability. OTOH, with a synchronous model with so many moving parts, it
is really very hard to design and maintain fine-grained locking.
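To make the locking point concrete, here is a minimal Java sketch of a single-threaded asynchronous dispatcher. It assumes that each component's state is touched only by its dispatcher thread. All names (`DispatcherSketch`, `SimpleAsyncDispatcher`, `EventType`, `Event`) are hypothetical and are not the YARN API. Producers only enqueue an event and return, so they never wait on a component's lock. The handler keeps a plain, unsynchronized list because events are handled one at a time on one thread.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class DispatcherSketch {

    // Hypothetical event types for illustration; not YARN's actual event classes.
    enum EventType { TASK_LAUNCHED, TASK_DONE, SHUTDOWN }

    static final class Event {
        final EventType type;
        final String taskId;
        Event(EventType type, String taskId) { this.type = type; this.taskId = taskId; }
    }

    // Producers only enqueue events; a single worker thread runs every handler,
    // so state owned by the handlers needs no locks.
    static final class SimpleAsyncDispatcher {
        private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        private final Map<EventType, Consumer<Event>> handlers = new EnumMap<>(EventType.class);
        private final Thread worker = new Thread(this::loop, "dispatcher");

        // Register all handlers before start(); the map itself is not thread-safe.
        void register(EventType type, Consumer<Event> handler) { handlers.put(type, handler); }

        void start() { worker.start(); }

        // Safe to call from any thread; returns without touching component state.
        void dispatch(Event e) { queue.add(e); }

        // Enqueue a poison pill and wait until every earlier event has been handled.
        void stopAndJoin() throws InterruptedException {
            queue.add(new Event(EventType.SHUTDOWN, null));
            worker.join();
        }

        private void loop() {
            try {
                while (true) {
                    Event e = queue.take();
                    if (e.type == EventType.SHUTDOWN) return;
                    Consumer<Event> h = handlers.get(e.type);
                    if (h != null) h.accept(e); // events without a handler are ignored
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Several producer threads fire events concurrently; only the dispatcher
    // thread ever mutates the plain ArrayList below.
    static List<String> runDemo() {
        List<String> finished = new ArrayList<>();
        SimpleAsyncDispatcher d = new SimpleAsyncDispatcher();
        d.register(EventType.TASK_DONE, e -> finished.add(e.taskId));
        d.start();
        try {
            List<Thread> producers = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                String id = "task_" + i;
                Thread t = new Thread(() -> {
                    d.dispatch(new Event(EventType.TASK_LAUNCHED, id));
                    d.dispatch(new Event(EventType.TASK_DONE, id));
                });
                producers.add(t);
                t.start();
            }
            for (Thread t : producers) t.join();
            d.stopAndJoin();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        // Reading here is safe: worker.join() happens-before this point.
        return finished;
    }

    public static void main(String[] args) {
        System.out.println(runDemo().size() + " tasks finished"); // prints "3 tasks finished"
    }
}
```

The sketch gets lock-free handler code by serializing every event through one thread. A larger system would presumably split work across several dispatchers or hand slow work to other threads, which this sketch does not attempt to show.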

Sent from my iPhone

On Sep 8, 2011, at 7:33 PM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:


Re: Rationale behind Event based design of Next Gen Mapreduce components.

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

This question needs a long answer and a bit of documentation and wiki pages
explaining the whole thing, which I am working on.

For now, I'll give you a short answer.

If you have worked long enough on MRv1, particularly on the JobTracker and
TaskTracker, you will know the complexity of the code w.r.t. component
interactions, synchronization, method calls in and out, management of state
via ENUMs, etc. That was quite a maintenance nightmare, if you ask me.

The event model, along with the state machines, is an effort to manage that
complexity better.
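As a rough illustration of the state-machine half of this, here is a hypothetical Java sketch. It is not YARN's actual state-machine classes, and the states and events are made up for the example. Every legal transition of a task lives in one table keyed by current state and event. An event that has no entry for the current state is rejected in a single place, instead of being guarded by ENUM comparisons scattered across many methods.

```java
import java.util.EnumMap;
import java.util.Map;

public class TaskStateMachineSketch {

    // Hypothetical states and events for illustration; not YARN's real enums.
    enum TaskState { NEW, SCHEDULED, RUNNING, SUCCEEDED, FAILED }
    enum TaskEventType { SCHEDULE, LAUNCH, COMPLETE, FAIL }

    // All legal transitions live in one table: current state -> (event -> next state).
    private static final Map<TaskState, Map<TaskEventType, TaskState>> TRANSITIONS =
            new EnumMap<>(TaskState.class);

    private static void add(TaskState from, TaskEventType on, TaskState to) {
        TRANSITIONS.computeIfAbsent(from,
                k -> new EnumMap<TaskEventType, TaskState>(TaskEventType.class)).put(on, to);
    }

    static {
        add(TaskState.NEW, TaskEventType.SCHEDULE, TaskState.SCHEDULED);
        add(TaskState.SCHEDULED, TaskEventType.LAUNCH, TaskState.RUNNING);
        add(TaskState.RUNNING, TaskEventType.COMPLETE, TaskState.SUCCEEDED);
        add(TaskState.RUNNING, TaskEventType.FAIL, TaskState.FAILED);
    }

    private TaskState current = TaskState.NEW;

    TaskState getState() { return current; }

    // Apply one event; any event without a table entry for the current state is rejected.
    TaskState handle(TaskEventType event) {
        Map<TaskEventType, TaskState> row = TRANSITIONS.get(current);
        TaskState next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        TaskStateMachineSketch task = new TaskStateMachineSketch();
        task.handle(TaskEventType.SCHEDULE);
        task.handle(TaskEventType.LAUNCH);
        System.out.println(task.handle(TaskEventType.COMPLETE)); // prints "SUCCEEDED"
        try {
            task.handle(TaskEventType.LAUNCH); // SUCCEEDED has no outgoing transitions
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Adding a new state or event then means adding rows to the table, rather than hunting for every method that switches on the state ENUM.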

There is one slide, "Event Model in YARN", in Sharad's presentation attached
to MAPREDUCE-279:
https://issues.apache.org/jira/secure/attachment/12485267/hadoop_contributors_meet_07_01_2011.pdf
It can serve as a starter.

Thanks,
+Vinod
