Posted to dev@twill.apache.org by Yuliya Feldman <yu...@dremio.com> on 2016/12/22 22:20:12 UTC

How/does Twill can survive a restart of TwillClient

Hello there,

I started using Twill recently and came across a couple of issues I wanted
to check on:

1. If I resize the YARN cluster to more capacity than it can handle, I can't
resize down, as it did not satisfy the first request.

2. If my application that spawns the Twill YARN cluster restarts (meaning I
am losing YarnTwillRunnerService), I cannot get hold of the TwillController
afterwards, even if I know the runId and whatnot.

Could anybody advise/confirm/deny on the issues I am seeing?

Thanks in advance

Re: How/does Twill can survive a restart of TwillClient

Posted by Yuliya Feldman <yu...@dremio.com>.
I have created:
https://issues.apache.org/jira/browse/TWILL-202
to limit the time spent waiting for resources from YARN, so that requests
in the queue can be processed.

Thanks,
Yuliya

On Sat, Dec 24, 2016 at 11:24 PM, Yuliya <yu...@dremio.com> wrote:

> Thank you for the replies
>
> Comments inline
>
> > On Dec 24, 2016, at 10:38 PM, Terence Yim <ch...@gmail.com> wrote:
> >
> > Hi,
> >
> > 1. I see what you mean now. The reason why Twill currently waits for all
> > the requested containers to be up and running before changing the number
> > of containers again is mainly to provide a more deterministic state
> > transition for the runnable lifecycle, in case the application logic is
> > sensitive to the number of instances. However, I do agree that Twill can
> > provide a more flexible way to let the application decide whether waiting
> > is needed or not. Would you mind opening a JIRA for the improvement?
>
> I will open a JIRA - thank you
> >
> > 2. The TwillRunner is designed to survive a process restart, with the
> > ability to rediscover all the running Twill applications via ZooKeeper.
> > However, due to the nature of async operations in ZK, you might need to
> > call "lookup" a couple of times before all the necessary information is
> > synced up from ZK after the process has restarted.
> Interesting - let me try - it does not look like this from the code, but I
> may be missing something
> >
> > Terence

Re: How/does Twill can survive a restart of TwillClient

Posted by Yuliya <yu...@dremio.com>.
Thank you for the replies 

Comments inline 

> On Dec 24, 2016, at 10:38 PM, Terence Yim <ch...@gmail.com> wrote:
> 
> Hi,
> 
> 1. I see what you mean now. The reason why Twill currently waits for all the
> requested containers to be up and running before changing the number of
> containers again is mainly to provide a more deterministic state transition
> for the runnable lifecycle, in case the application logic is sensitive to the
> number of instances. However, I do agree that Twill can provide a more
> flexible way to let the application decide whether waiting is needed or not.
> Would you mind opening a JIRA for the improvement?

I will open a JIRA - thank you
> 
> 2. The TwillRunner is designed to survive a process restart, with the ability
> to rediscover all the running Twill applications via ZooKeeper. However,
> due to the nature of async operations in ZK, you might need to call
> "lookup" a couple of times before all the necessary information is synced up
> from ZK after the process has restarted.
Interesting - let me try - it does not look like this from the code, but I may be missing something
> 
> Terence

Re: How/does Twill can survive a restart of TwillClient

Posted by Yuliya Feldman <yu...@dremio.com>.
Yes, it would be great to have a deterministic way rather than an eventually
consistent one :) as it is now. I would imagine the first pull from ZK should
be inline, and updates can be applied as they come in using watchers.

On Fri, Jan 6, 2017 at 5:16 PM, Martin Serrano <ma...@attivio.com> wrote:

> I believe that https://issues.apache.org/jira/browse/TWILL-183 should
> address this issue, correct?
>
>
> On 12/25/2016 01:38 AM, Terence Yim wrote:
>
>> 2. The TwillRunner is designed to survive a process restart, with the ability
>> to rediscover all the running Twill applications via ZooKeeper. However,
>> due to the nature of async operations in ZK, you might need to call
>> "lookup" a couple of times before all the necessary information is synced up
>> from ZK after the process has restarted.
>>
>

Re: How/does Twill can survive a restart of TwillClient

Posted by Martin Serrano <ma...@attivio.com>.
I believe that https://issues.apache.org/jira/browse/TWILL-183 should 
address this issue, correct?

On 12/25/2016 01:38 AM, Terence Yim wrote:
> 2. The TwillRunner is designed to survive a process restart, with the ability
> to rediscover all the running Twill applications via ZooKeeper. However,
> due to the nature of async operations in ZK, you might need to call
> "lookup" a couple of times before all the necessary information is synced up
> from ZK after the process has restarted.

Re: How/does Twill can survive a restart of TwillClient

Posted by Terence Yim <ch...@gmail.com>.
Hi,

1. I see what you mean now. The reason why Twill currently waits for all the
requested containers to be up and running before changing the number of
containers again is mainly to provide a more deterministic state transition
for the runnable lifecycle, in case the application logic is sensitive to the
number of instances. However, I do agree that Twill can provide a more
flexible way to let the application decide whether waiting is needed or not.
Would you mind opening a JIRA for the improvement?

2. The TwillRunner is designed to survive a process restart, with the ability
to rediscover all the running Twill applications via ZooKeeper. However,
due to the nature of async operations in ZK, you might need to call
"lookup" a couple of times before all the necessary information is synced up
from ZK after the process has restarted.

Terence
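
The repeated "lookup" described in point 2 can be sketched as a small polling
helper. This is only an illustration, not Twill code: LookupRetry,
pollUntilPresent, maxAttempts and delayMillis are hypothetical names, and the
Supplier stands in for whatever lookup call the client makes. With Twill it
would presumably wrap something like runner.lookup(appName, runId), assuming
the lookup form that takes an application name and a run id and returns null
while the state has not yet been synced from ZK.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LookupRetry {

    /**
     * Calls the given lookup until it returns a non-null value or the
     * attempts run out, sleeping between attempts.
     *
     * The Supplier is a stand-in (an assumption of this sketch) for a call
     * such as TwillRunner.lookup that may return null right after a restart.
     */
    public static <T> Optional<T> pollUntilPresent(Supplier<T> lookup,
                                                   int maxAttempts,
                                                   long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.get();
            if (result != null) {
                return Optional.of(result);
            }
            // Do not sleep after the final failed attempt.
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated lookup: null for the first two calls (state not yet
        // synced), then a placeholder value instead of a real controller.
        int[] calls = {0};
        Optional<String> controller =
            pollUntilPresent(() -> ++calls[0] < 3 ? null : "controller", 10, 50L);
        System.out.println(controller.isPresent() + " after " + calls[0] + " calls");
    }
}
```

A bounded number of attempts keeps the client from waiting forever when the
application really is gone; an empty result then means "not found within the
time allowed" rather than "does not exist".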

On Fri, Dec 23, 2016 at 11:18 AM, Yuliya Feldman <yu...@dremio.com> wrote:

> Thank you very much for the reply
> Please see inline
>
>
> On Fri, Dec 23, 2016 at 11:10 AM, Terence Yim <ch...@gmail.com> wrote:
>
> > Hi,
> >
> > 1. It really depends on how many resources your application needs.
> > Twill simply acts as a bridge between your app and YARN; however, the YARN
> > cluster itself needs to have enough resources (memory and vcores) to run
> > your application.
> >
> I definitely agree that YARN should have capacity. What I am trying to say
> is that if I change my mind and want to resize a second time before the
> first request was satisfied, I cannot do it. What if I mistyped the number
> of requested containers - put 100 instead of 10 - and YARN will never have
> this capacity? If I change back to 10, it won't change unless the 100 is
> satisfied.
>
> >
> > 2. You should be able to do that through the TwillRunner.lookup method.
> > Do you mean you tried but it doesn't return anything?
> >
> TwillRunner.lookup works ONLY if the application that uses TwillRunner.lookup
> (the YARN/Twill client, in other words) has NEVER restarted - if it restarted,
> all the information is lost, and I am not sure how to make TwillRunner obtain
> it again from the running cluster.
>
> >
> > Terence

Re: How/does Twill can survive a restart of TwillClient

Posted by Yuliya Feldman <yu...@dremio.com>.
Thank you very much for the reply
Please see inline


On Fri, Dec 23, 2016 at 11:10 AM, Terence Yim <ch...@gmail.com> wrote:

> Hi,
>
> 1. It really depends on how many resources your application needs.
> Twill simply acts as a bridge between your app and YARN; however, the YARN
> cluster itself needs to have enough resources (memory and vcores) to run
> your application.
>
I definitely agree that YARN should have capacity. What I am trying to say
is that if I change my mind and want to resize a second time before the
first request was satisfied, I cannot do it. What if I mistyped the number
of requested containers - put 100 instead of 10 - and YARN will never have
this capacity? If I change back to 10, it won't change unless the 100 is
satisfied.

>
> 2. You should be able to do that through the TwillRunner.lookup method. Do
> you mean you tried but it doesn't return anything?
>
TwillRunner.lookup works ONLY if the application that uses TwillRunner.lookup
(the YARN/Twill client, in other words) has NEVER restarted - if it restarted,
all the information is lost, and I am not sure how to make TwillRunner obtain
it again from the running cluster.

>
> Terence

Re: How/does Twill can survive a restart of TwillClient

Posted by Terence Yim <ch...@gmail.com>.
Hi,

1. It really depends on how many resources your application needs.
Twill simply acts as a bridge between your app and YARN; however, the YARN
cluster itself needs to have enough resources (memory and vcores) to run
your application.

2. You should be able to do that through the TwillRunner.lookup method. Do
you mean you tried but it doesn't return anything?

Terence
