Posted to yarn-dev@hadoop.apache.org by Jeff Zhang <je...@gopivotal.com> on 2014/02/20 03:32:45 UTC

Is there any alternative solution thinking on the event model of YARN

Hi all,

I have studied YARN for several months and have some thoughts on its event
model.

1.  The event model does help YARN's performance by allowing asynchronous
calls.
2.  But the event model makes the boundary of each component unclear. The
event receiver does not know the sender of an event, which makes it difficult
for a reader to understand the event flow.
      E.g. in the node manager, there are several event senders and handlers,
including the container, application, localization server, log aggregation
service and so on.  One component sends events to another component.
Because the receiver has no record of the event sender, it is not easy to read
the code and understand the event flow.
      The event flow in the resource manager is even more complex; it involves
RMApp, RMAppAttempt, RMContainer, RMNode and the Scheduler.
3.  IMHO, the complexity of the event model makes it hard for new contributors
to understand the code base, and hard to maintain the code base in future. One
small change in a state machine may affect other components, and the cause
can be difficult to find.

Just wondering whether there has already been some thinking on the event model
of YARN. And correct me if my understanding is wrong.

Thanks

Jeff Zhang
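
To make point 2 concrete, here is a minimal sketch of the pattern described above, assuming a dispatcher that routes on the enum class of the event type. All names here (DispatcherSketch, Event, Dispatcher, ContainerEventType) are made up for illustration and are not Hadoop's classes; the sketch is also synchronous, which the real dispatcher is not. It only shows why the send site does not reveal the receiver.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of a type-keyed dispatcher. Illustrative names only;
// these are NOT Hadoop's classes.
class DispatcherSketch {

    enum ContainerEventType { INIT, KILL }

    /** An event carries a type and a payload, but no reference to its sender. */
    static final class Event {
        final Enum<?> type;
        final String payload;

        Event(Enum<?> type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    static final class Dispatcher {
        // Handlers are keyed by the enum class of the event type.
        private final Map<Class<?>, Consumer<Event>> handlers = new HashMap<>();

        void register(Class<? extends Enum<?>> eventTypeClass, Consumer<Event> handler) {
            handlers.put(eventTypeClass, handler);
        }

        void dispatch(Event event) {
            Consumer<Event> handler = handlers.get(event.type.getDeclaringClass());
            if (handler == null) {
                throw new IllegalStateException("No handler registered for " + event.type);
            }
            handler.accept(event);
        }
    }

    /** Wires one handler, sends one event, and returns what the handler saw. */
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        Dispatcher dispatcher = new Dispatcher();

        // Registration: the only place that names the receiver.
        dispatcher.register(ContainerEventType.class,
            e -> seen.add("container handler got " + e.type + ":" + e.payload));

        // Send site: nothing on this line says who will handle the event.
        dispatcher.dispatch(new Event(ContainerEventType.INIT, "container_1"));
        return seen;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this toy version, a reader who starts at the dispatch call has to find the register call to learn which handler runs, and the handler has no way to tell which component sent the event.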

Re: Is there any alternative solution thinking on the event model of YARN

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.

Sure, we made it private so as not to jump the gun before there was enough of an audience. I believe Apache Tez also uses this. I will file a ticket to make it a supported public library.

+Vinod

On Feb 22, 2014, at 8:32 AM, Ted Yu <yu...@gmail.com> wrote:

> VisualizeStateMachine is used to generate the state diagram.
> It is annotated with @Private.
> 
> Can downstream project(s) use it for generation of their own state diagram ?
> 
> Thanks
> 
> 
> On Sat, Feb 22, 2014 at 5:14 AM, haosdent <ha...@gmail.com> wrote:
> 
>>> The event model is so much simpler and mvn -Pvisualize draws out a
>>> beautiful state diagram.
>> 
>> awesome
>> 
>> 
>> On Sat, Feb 22, 2014 at 3:49 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>> 
>>>> The event model is so much simpler and mvn -Pvisualize draws out a
>>>> beautiful state diagram.
>>> 
>>> Oh my goodness.  How have I gone so long without knowing about this?  This
>>> is so awesome!  Thanks for the tip, Ravi!
>>> 
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>> 
>>> 
>>> 
>>> On Thu, Feb 20, 2014 at 7:47 PM, Vinod Kumar Vavilapalli <vinodkv@apache.org> wrote:
>>> 
>>>> I actually think that the component boundaries are much cleaner now
>>>> in YARN. Components (mostly) only interact via events and not via
>>>> synchronous method calls, which Ravi hinted at. Each event is decorated
>>>> with its source and destination. This is arguably only using code
>>>> comments, but if you think it helps, you can pursue
>>>> https://issues.apache.org/jira/browse/YARN-1743.
>>>> 
>>>> The implementation in YARN is in fact loosely modeled around actors. It's
>>>> a custom implementation; we didn't go the full route as we didn't need to.
>>>> 
>>>> Like Ravi said, it takes a little getting used to. I have seen developers
>>>> beyond the initial set take a little while to get used to it, but then do
>>>> lots of things much more easily once they get a grip on it, specifically
>>>> compared to my experience with devs working around Hadoop 1.x code, where
>>>> we didn't have clean component boundaries.
>>>> 
>>>> Let us know if things like YARN-1743 will help. We can do more. Definitely
>>>> look for the state machines as Ravi mentioned; they can simplify your
>>>> understanding of things a lot.
>>>> 
>>>> +Vinod
>>>> 
>>>> On Feb 20, 2014, at 5:54 PM, Jeff Zhang <je...@gopivotal.com> wrote:
>>>> 
>>>>> Hi Ravi,
>>>>> 
>>>>> Thanks for your reply.  The reason I am thinking about an alternative to
>>>>> the event model is that I found the actor model, which is used by Spark,
>>>>> much easier to read and understand.
>>>>> 
>>>>> Here I will compare 2 differences in the usage of these 2 frameworks (I
>>>>> will ignore the performance comparison for now).
>>>>> 
>>>>> 1.  An actor explicitly specifies the event destination (event handler)
>>>>> when sending a message, while the event handler is not clear in the YARN
>>>>> event model.
>>>>>    e.g.
>>>>>    actor:
>>>>>        // it is easy to understand that actorRef is the event
>>>>>        // destination (event handler)
>>>>>        actorRef ! message
>>>>>    yarn:
>>>>>        // it's not clear who the event handler is; we must look for
>>>>>        // the event registration code, which is in other places
>>>>>        dispatcher.dispatch(message)
>>>>> 
>>>>> 2.  An actor has the event source built in, so it is easy to send a
>>>>> message back. There are lots of state machines in YARN, and these state
>>>>> machines often send messages to each other, e.g. ContainerImpl interacts
>>>>> with ApplicationImpl by sending messages.
>>>>>    e.g.
>>>>>    actor:
>>>>>        // sender is the reference to the sending actor, which is built
>>>>>        // in to the actor, so it is easy to send a message back
>>>>>        sender ! message
>>>>> 
>>>>>    yarn:
>>>>>        // the YARN event model does not know the event source; even if
>>>>>        // the receiver knows the source, it still needs to rely on the
>>>>>        // dispatcher to send a message.  It is not easy to see the event
>>>>>        // flow from this piece of code; you still need to look for the
>>>>>        // event registration code to find the event handler.
>>>>>        dispatcher.dispatch(event)
>>>>> 
>>>>> 
>>>>> Let me know if you have any thoughts.  Thanks
>>>>> 
>>>>> 
>>>>> Jeff Zhang
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Feb 21, 2014 at 4:02 AM, Ravi Prakash <ra...@ymail.com> wrote:
>>>>> 
>>>>>> Hi Jeff!
>>>>>> 
>>>>>> The event model does have some issues, but I believe it has made things
>>>>>> a lot simpler. The source could easily be added to the event object if
>>>>>> you needed it. There might be issues with flow control, but I thought
>>>>>> they were fixed where they were cropping up.
>>>>>> 
>>>>>> MRv1 had all these method calls which could affect the state in several
>>>>>> ways, and synchronization and locking were extremely difficult to get
>>>>>> right (perhaps only by the select few who completely understood the
>>>>>> codebase). The event model is so much simpler and mvn -Pvisualize draws
>>>>>> out a beautiful state diagram. It takes a little getting used to, but
>>>>>> you can connect the debugger and trace through the code too with
>>>>>> conditional breakpoints. This is of course just my opinion.
>>>>>> 
>>>>>> Ravi
>> 
>> --
>> Best Regards,
>> Haosdent Huang
>> 


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Is there any alternative solution thinking on the event model of YARN

Posted by Ted Yu <yu...@gmail.com>.

VisualizeStateMachine is used to generate the state diagram.
It is annotated with @Private.

Can downstream project(s) use it for generation of their own state diagram ?

Thanks



Re: Is there any alternative solution thinking on the event model of YARN

Posted by haosdent <ha...@gmail.com>.

> The event model is so much simpler and mvn -Pvisualize draws out a
> beautiful state diagram.

awesome





-- 
Best Regards,
Haosdent Huang

Re: Is there any alternative solution thinking on the event model of YARN

Posted by Chris Nauroth <cn...@hortonworks.com>.

> The event model is so much simpler and mvn -Pvisualize draws out a
> beautiful state diagram.

Oh my goodness.  How have I gone so long without knowing about this?  This
is so awesome!  Thanks for the tip, Ravi!

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Feb 20, 2014 at 7:47 PM, Vinod Kumar Vavilapalli <vinodkv@apache.org
> wrote:

> I actually think that the component boundaries are much more cleaner now
> in YARN. Components (mostly) only interact via events and not via
> synchronous method calls which Ravi hinted to. Each event is decorated with
> its source and destination. This is arguably only using code comments, but
> if you think it helps, you can pursue
> https://issues.apache.org/jira/browse/YARN-1743.
>
> The implementation in YARN is in fact loosely modeled around actors. It's
> a custom implementation, we didn't go the full route as we didn't need to.
>
> Like Ravi said, it takes a little getting used to. I have seen developers
> beyond the initial set taking a little while getting used to but then doing
> lots of things much easily after they get a grip on it; specifically
> compared to my experience with devs working aroun Hadoop 1.x code, where we
> didn't have cleaner component boundaries.
>
> Let us know if things like YARN-1743 will help. We can do more. Definitely
> look for the state machines as Ravi mentioned, that can simplify your
> understanding of things a lot.
>
> +Vinod
>
> On Feb 20, 2014, at 5:54 PM, Jeff Zhang <je...@gopivotal.com> wrote:
>
> > Hi Ravi,
> >
> > Thanks for your reply.  The reason I think another alternative solution
> of
> > event model is that I found that the actor model which is used by spark
> is
> > much easier to read and understand.
> >
> > Here I will compare 2 differences on usage of these 2 framework ( I will
> > ignore the performance comparison currently)
> >
> > 1.  actor explicitly specify the event destination (event handler) when
> > sending message, while it is not clear to know the event handler for yarn
> > event model
> >     e.g
> >     actor:
> >         actorRef ! message           // it is easy to understand that
> > actorRef is the event destination (event handler)
> >     yarn:
> >         dispatcher.dispatch(message)             //         it's not
> clear
> > who is the event handler, we must to look for the event registration code
> > which is in other places.
> >
> > 2. actor has the event source builtin, so it is easy to send the message
> > back. There's lots of state machines in yarn, and these state machines
> > often send message between each other.   e.g,  ContainerImpl interact
> with
> > ApplicationImpl by sending message.
> >    e.g.
> >    actor:
> >        sender ! message   // sender is message sender actor reference
> > which is builtin in actor, so it is easy to send message back
> >
> >    yarn:
> >        dispatcher.dispatch(event)  // yarn event model do not know the
> > event source, even he know the source, he still need to rely on the
> > dispatcher to send message.  It is not easy for user to know the event
> flow
> > from this piece of code.
> >        You still need to look for the event registration code to get know
> > the event handler.
> >
> >
> > Let me know if you have any thinking.  Thanks
> >
> >
> > Jeff Zhang
> >
> >
> >
> >
> > On Fri, Feb 21, 2014 at 4:02 AM, Ravi Prakash <ra...@ymail.com> wrote:
> >
> >> Hi Jeff!
> >>
> >> The event model does have some issues, but I believe it has made things
> a
> >> lot simpler. The source could easily be added to the event object if you
> >> needed it to. There might be issues with flow control, but I thought
> they
> >> were fixed where they were cropping up.
> >>
> >> MRv1 had all these method calls which could affect the state in several
> >> ways, and synchronization and locking was extremely difficult to get
> right
> >> (perhaps only by the select few who completely understood the codebase).
> >> The event model is so much simpler and mvn -Pvisualize draws out a
> >> beautiful state diagram. It takes a little getting used to, but you can
> >> connect the debugger and trace through the code too with conditional
> >> breakpoints. This is of course just my opinion.
> >>
> >> Ravi

Re: Is there any alternative solution thinking on the event model of YARN

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.

I actually think that the component boundaries are much cleaner now in YARN. Components (mostly) interact only via events and not via the synchronous method calls that Ravi hinted at. Each event is decorated with its source and destination. This is arguably done only through code comments, but if you think it helps, you can pursue https://issues.apache.org/jira/browse/YARN-1743.
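
To make "decorated with its source and destination" concrete, here is a minimal sketch in plain Java of what carrying those labels as fields (rather than as comments) could look like. Everything in it is an assumption for illustration: LabeledEvent, EventFlowDemo and the event type strings are hypothetical names, not YARN classes, and this is not necessarily what YARN-1743 proposes. The queue only stands in for the idea that senders enqueue events instead of calling the receiver directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical event type: source and destination are explicit fields
// instead of code comments.
final class LabeledEvent {
    final String source;
    final String destination;
    final String type;

    LabeledEvent(String source, String destination, String type) {
        this.source = source;
        this.destination = destination;
        this.type = type;
    }

    String describe() {
        return source + " -> " + destination + ": " + type;
    }
}

final class EventFlowDemo {
    static List<String> run() {
        Queue<LabeledEvent> queue = new ConcurrentLinkedQueue<>();
        List<String> trace = new ArrayList<>();

        // Senders only enqueue; they never call the receiver directly.
        // The event type strings are made up for this sketch.
        queue.add(new LabeledEvent("ContainerImpl", "ApplicationImpl", "CONTAINER_DONE"));
        queue.add(new LabeledEvent("ApplicationImpl", "LogAggregation", "APP_DONE"));

        // A single consumer drains the queue in order and records the flow.
        LabeledEvent event;
        while ((event = queue.poll()) != null) {
            trace.add(event.describe());
        }
        return trace;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

With the labels on the event itself, the flow can be logged or traced at the point where events are consumed, without reading the registration code.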

The implementation in YARN is in fact loosely modeled around actors. It's a custom implementation; we didn't go the full route because we didn't need to.

Like Ravi said, it takes a little getting used to. I have seen developers beyond the initial set take a little while to get used to it, but then do lots of things much more easily once they get a grip on it, especially compared to my experience with devs working around Hadoop 1.x code, where we didn't have clean component boundaries.

Let us know if things like YARN-1743 will help. We can do more. Definitely look at the state machines, as Ravi mentioned; they can simplify your understanding of things a lot.
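
For readers who have not worked with table-driven state machines, the following is a minimal sketch of the general idea in plain Java. It is only an illustration under stated assumptions: AppState, AppEventType and MiniStateMachine are hypothetical names, and YARN's own state machine classes are richer than this (for example, they attach transition logic to each arc).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical states and event types, for illustration only.
enum AppState { NEW, RUNNING, DONE }

enum AppEventType { START, FINISH }

final class MiniStateMachine {
    // Transition table: current state -> (event type -> next state).
    private final Map<AppState, Map<AppEventType, AppState>> table =
        new EnumMap<>(AppState.class);
    private AppState current = AppState.NEW;

    MiniStateMachine addTransition(AppState from, AppEventType on, AppState to) {
        table.computeIfAbsent(from, s -> new EnumMap<AppEventType, AppState>(AppEventType.class))
             .put(on, to);
        return this;
    }

    // Applies one event; an event with no arc from the current state is rejected.
    AppState handle(AppEventType type) {
        Map<AppEventType, AppState> row = table.get(current);
        AppState next = (row == null) ? null : row.get(type);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + type + " in state " + current);
        }
        current = next;
        return current;
    }
}
```

Because every legal transition is listed in one table, the full set of arcs for a component can be read (or drawn) from a single place instead of being scattered across method calls.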

+Vinod




Re: Is there any alternative solution thinking on the event model of YARN

Posted by Jeff Zhang <je...@gopivotal.com>.

Hi Ravi,

Thanks for your reply.  The reason I am asking about an alternative to the
event model is that I find the actor model, which is used by Spark, much
easier to read and understand.

Here I will compare two differences in how these two frameworks are used (I
will ignore the performance comparison for now).

1.  An actor explicitly specifies the event destination (event handler) when
sending a message, while in the YARN event model it is not clear who the
event handler is.
     e.g.
     actor:
         actorRef ! message           // it is easy to see that actorRef is
                                      // the event destination (event handler)
     yarn:
         dispatcher.dispatch(message) // it's not clear who the event handler
                                      // is; we must look for the event
                                      // registration code, which lives
                                      // elsewhere

2.  An actor has the event source built in, so it is easy to send a message
back. There are lots of state machines in YARN, and these state machines
often send messages to each other, e.g. ContainerImpl interacts with
ApplicationImpl by sending messages.
    e.g.
    actor:
        sender ! message   // sender is the actor reference of the message
                           // sender, which is built into the actor, so it
                           // is easy to send a message back

    yarn:
        dispatcher.dispatch(event)  // the YARN event model does not know the
                                    // event source; even if the handler knew
                                    // the source, it would still need to rely
                                    // on the dispatcher to send a message.
                                    // The event flow is not easy to see from
                                    // this piece of code, and you still need
                                    // to look for the event registration code
                                    // to find the event handler.
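
As a rough illustration of the two points, here is a minimal sketch in plain Java of a dispatcher that picks the handler by event type. All names (MiniDispatcher, ContainerDone, DispatcherDemo) are hypothetical and this is not YARN's actual dispatcher API; it is also synchronous, unlike the real asynchronous one. It shows that the send site never names the receiver, and that the sender is only known to the handler if the event carries it as a field, as Ravi suggested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical names throughout; this is not YARN's dispatcher API.
final class MiniDispatcher {
    // Handlers are keyed by event class, so the receiver is chosen here,
    // not at the call site that sends the event.
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    <T> void register(Class<T> type, Consumer<T> handler) {
        handlers.put(type, event -> handler.accept(type.cast(event)));
    }

    void dispatch(Object event) {
        Consumer<Object> handler = handlers.get(event.getClass());
        if (handler == null) {
            throw new IllegalStateException("No handler for " + event.getClass().getName());
        }
        handler.accept(event);
    }
}

// An event that carries its sender, so the receiver can tell where it came from.
final class ContainerDone {
    final String sender;
    final String containerId;

    ContainerDone(String sender, String containerId) {
        this.sender = sender;
        this.containerId = containerId;
    }
}

final class DispatcherDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        MiniDispatcher dispatcher = new MiniDispatcher();

        // Registration: in a large codebase this is usually far from the send site.
        dispatcher.register(ContainerDone.class,
            e -> log.add("application handled " + e.containerId + " from " + e.sender));

        // Send site: nothing on this line names the receiver.
        dispatcher.dispatch(new ContainerDone("container", "container_01"));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

To find out who handles ContainerDone, a reader has to search for the matching register call, which is the indirection described in point 1.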


Let me know what you think.  Thanks


Jeff Zhang




On Fri, Feb 21, 2014 at 4:02 AM, Ravi Prakash <ra...@ymail.com> wrote:

> Hi Jeff!
>
> The event model does have some issues, but I believe it has made things a
> lot simpler. The source could easily be added to the event object if you
> needed it to. There might be issues with flow control, but I thought they
> were fixed where they were cropping up.
>
> MRv1 had all these method calls which could affect the state in several
> ways, and synchronization and locking was extremely difficult to get right
> (perhaps only by the select few who completely understood the codebase).
> The event model is so much simpler and mvn -Pvisualize draws out a
> beautiful state diagram. It takes a little getting used to, but you can
> connect the debugger and trace through the code too with conditional
> breakpoints. This is of course just my opinion.
>
> Ravi