Posted to dev@slider.apache.org by 杨浩 <ya...@gmail.com> on 2014/12/12 12:41:05 UTC

what should be value of slider.am.restart.supported

How should this option be configured? When it is set to false, it sometimes
works and sometimes does not.

Re: what should be value of slider.am.restart.supported

Posted by Steve Loughran <st...@hortonworks.com>.

On 16 December 2014 at 12:30, 杨浩 <ya...@gmail.com> wrote:

The "killed" in that state graph is a container being killed through the
YARN API, as with "slider kill --force", not the Unix "kill" command, which
kills the process and is translated into a failure.

> experiment 1
>     $kill SliderAppMaster # all containers will restart
>

that should not have happened


> experiment 2
>    $kill -6 SliderAppMaster  #only the am will restart
>    $kill SliderAppMaster #only the am will restart
>

this is what we expect. The AM loses some of its history, but rebinds to
all the containers.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: what should be value of slider.am.restart.supported

Posted by 杨浩 <ya...@gmail.com>.

I have looked at some of the YARN code.

RMAppAttemptImpl.java, BaseFinalTransition:
        case KILLED:
        {
          // don't leave the tracking URL pointing to a non-existent AM
          appAttempt.setTrackingUrlToRMAppPage();
          appAttempt.invalidateAMHostAndPort();
          appEvent =
              new RMAppFailedAttemptEvent(applicationId,
                  RMAppEventType.ATTEMPT_KILLED,
                  "Application killed by user.", false);
        }
        break;
        case FAILED:
        {
          // don't leave the tracking URL pointing to a non-existent AM
          appAttempt.setTrackingUrlToRMAppPage();
          appAttempt.invalidateAMHostAndPort();
          if (appAttempt.submissionContext
            .getKeepContainersAcrossApplicationAttempts()
              && !appAttempt.isLastAttempt
              && !appAttempt.submissionContext.getUnmanagedAM()) {
            keepContainersAcrossAppAttempts = true;
          }
          appEvent =
              new RMAppFailedAttemptEvent(applicationId,
                RMAppEventType.ATTEMPT_FAILED, appAttempt.getDiagnostics(),
                keepContainersAcrossAppAttempts);

        }

When the AM container fails, it may be restarted and recover its state,
but when it is killed, a different code path is taken.
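
To make the difference between the two branches concrete, here is a minimal
standalone sketch of the decision they encode. It is a simplified model, not
the YARN code: the class KeepContainersSketch, the enum FinalState and the
method keepContainers are hypothetical names, and only the boolean conditions
are taken from the excerpt above.

```java
/** Simplified, hypothetical model of the excerpt above; these are not the YARN classes. */
final class KeepContainersSketch {

    /** The two final attempt states handled in the excerpt. */
    enum FinalState { KILLED, FAILED }

    /**
     * Returns true if the containers of the finished attempt would be kept
     * for the next attempt, following the two branches of the excerpt.
     */
    static boolean keepContainers(FinalState state,
                                  boolean keepAcrossAttemptsRequested,
                                  boolean isLastAttempt,
                                  boolean unmanagedAm) {
        switch (state) {
            case KILLED:
                // The KILLED branch always passes false to the event.
                return false;
            case FAILED:
                // Kept only if requested, retries remain, and the AM is managed.
                return keepAcrossAttemptsRequested && !isLastAttempt && !unmanagedAm;
            default:
                throw new IllegalArgumentException("unexpected state: " + state);
        }
    }

    public static void main(String[] args) {
        System.out.println("FAILED keeps containers: "
            + keepContainers(FinalState.FAILED, true, false, false));
        System.out.println("KILLED keeps containers: "
            + keepContainers(FinalState.KILLED, true, false, false));
    }
}
```

In this model a FAILED attempt keeps its containers only when the feature was
requested, it was not the last attempt, and the AM is managed; a KILLED
attempt never keeps them.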


Later, I ran some experiments in pseudo-distributed mode:

experiment 1
    $kill SliderAppMaster # all containers will restart
experiment 2
   $kill -6 SliderAppMaster  #only the am will restart
   $kill SliderAppMaster #only the am will restart

It's very interesting.

2014-12-16 18:34 GMT+08:00 Steve Loughran <st...@hortonworks.com>:
>
> the suicide operation is only there for testing, for demonstrating that AM
> restart takes place. It lets us
> 1. kill an AM on a remote cluster where we don't have the rights to SSH in
> and kill processes.
> 2. do it as part of a repeatable sequence, such as here
>
> https://github.com/apache/incubator-slider/blob/develop/slider-funtest/src/test/groovy/org/apache/slider/funtest/lifecycle/AMFailuresIT.groovy#L87
>
> so yes, you are right: it's not needed in normal operation. It's only there
> to help test, verify & demo restart behaviour.
>

Re: what should be value of slider.am.restart.supported

Posted by Steve Loughran <st...@hortonworks.com>.

the suicide operation is only there for testing, for demonstrating that AM
restart takes place. It lets us
1. kill an AM on a remote cluster where we don't have the rights to SSH in
and kill processes.
2. do it as part of a repeatable sequence, such as here
https://github.com/apache/incubator-slider/blob/develop/slider-funtest/src/test/groovy/org/apache/slider/funtest/lifecycle/AMFailuresIT.groovy#L87

so yes, you are right: it's not needed in normal operation. It's only there
to help test, verify & demo restart behaviour.

On 15 December 2014 at 07:17, 杨浩 <ya...@gmail.com> wrote:

> I ran an experiment: when I kill the AM process, all the containers
> belonging to that application master restart. So the am-suicide method
> may not be so useful.

Re: what should be value of slider.am.restart.supported

Posted by 杨浩 <ya...@gmail.com>.

I ran an experiment: when I kill the AM process, all the containers
belonging to that application master restart. So the am-suicide method
may not be so useful.

2014-12-14 13:41 GMT+08:00 杨浩 <ya...@gmail.com>:
>
> It's very useful for me to know that the option is gone.
> As you know, if the AM restarts, the components do not restart. But the AM
> process may be killed, for example when the server running the AM shuts
> down. Will the components restart then?

Re: what should be value of slider.am.restart.supported

Posted by 杨浩 <ya...@gmail.com>.

It's very useful for me to know that the option is gone.
As you know, if the AM restarts, the components do not restart. But the AM
process may be killed, for example when the server running the AM shuts
down. Will the components restart then?

2014-12-12 22:08 GMT+08:00 Steve Loughran <st...@hortonworks.com>:
>
> That's something I think we cut out of the Slider code a while back,
> probably before Slider 0.50.
>
> It was added so that, on versions of Hadoop that didn't have working
> support for the YARN AM restart feature, we didn't try to use it.
>
> Prior to Hadoop 2.4, the fields to enable it weren't in the code the client
> used to request the feature, or in the data that came back from YARN when
> the AM started. We used reflection to try to load the methods, in case they
> weren't there. For extra fun, the methods could be in the Hadoop JARs on the
> client, but not on the server, and as we were using the pre-installed
> Hadoop JARs on the server, we could end up setting the option on the
> client, but not have it do anything.
>
> I think the flag was there to tell the tests whether or not the feature was
> present in the destination cluster, and so whether to run tests that kill
> the AM and expect it to come back up *retaining the existing containers*,
> that is, whether the AM could be restarted without the running application
> noticing.
>
> Everything works on Hadoop 2.6, so the option is gone; tests do kill the AM
> and expect it to come back (there's a "slider am-suicide" command for
> testing this).
>
> There's a property "slider.yarn.restart.limit" which sets a limit on how
> many times Slider should ask to restart; if unset you get the YARN limit
> defined by "yarn.resourcemanager.am.max-retries" (plus some windowing
> feature which handles intermittent timeouts over a long-running service).
> Set it to 1 and it should mean "no restarts" (i.e. only one attempt to run
> Slider is made: the first).
>
> It's covered in the
> http://slider.incubator.apache.org/docs/client-configuration.html docs
>
> -steve

Re: what should be value of slider.am.restart.supported

Posted by Steve Loughran <st...@hortonworks.com>.

That's something I think we cut out of the Slider code a while back,
probably before Slider 0.50.

It was added so that, on versions of Hadoop that didn't have working
support for the YARN AM restart feature, we didn't try to use it.

Prior to Hadoop 2.4, the fields to enable it weren't in the code the client
used to request the feature, or in the data that came back from YARN when
the AM started. We used reflection to try to load the methods, in case they
weren't there. For extra fun, the methods could be in the Hadoop JARs on the
client, but not on the server, and as we were using the pre-installed
Hadoop JARs on the server, we could end up setting the option on the
client, but not have it do anything.
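
As an illustration of that kind of reflection probe, here is a minimal
sketch. It is not the Slider code: NewContext and OldContext are hypothetical
stand-ins for a newer and an older Hadoop class, and trySetKeepContainers is
an assumed helper name. The setter name is assumed to be the counterpart of
the getKeepContainersAcrossApplicationAttempts() getter that appears in the
YARN excerpt elsewhere in this thread.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch: call a setter only if the class on the classpath has it. */
final class OptionalMethodSketch {

    /** Stand-in for a newer library class that has the setter (assumption). */
    static class NewContext {
        private boolean keep;
        public void setKeepContainersAcrossApplicationAttempts(boolean keep) {
            this.keep = keep;
        }
        public boolean isKeep() {
            return keep;
        }
    }

    /** Stand-in for an older library class without the setter (assumption). */
    static class OldContext {
    }

    /**
     * Invokes setKeepContainersAcrossApplicationAttempts(value) on the target
     * if its runtime class has that public method. Returns true if the call
     * was made, false if the method does not exist.
     */
    static boolean trySetKeepContainers(Object target, boolean value) {
        try {
            Method setter = target.getClass()
                .getMethod("setKeepContainersAcrossApplicationAttempts", boolean.class);
            setter.invoke(target, value);
            return true;
        } catch (NoSuchMethodException e) {
            // Older library on the classpath: the feature cannot be requested.
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("method present but could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new class: " + trySetKeepContainers(new NewContext(), true));
        System.out.println("old class: " + trySetKeepContainers(new OldContext(), true));
    }
}
```

A probe like this only tells you what is on the local classpath. As described
above, the client could find the method while the server-side JARs lacked it,
so a successful call did not guarantee the option had any effect.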

I think the flag was there to tell the tests whether or not the feature was
present in the destination cluster, and so whether to run tests that kill
the AM and expect it to come back up *retaining the existing containers*,
that is, whether the AM could be restarted without the running application
noticing.

Everything works on Hadoop 2.6, so the option is gone; tests do kill the AM
and expect it to come back (there's a "slider am-suicide" command for
testing this).

There's a property "slider.yarn.restart.limit" which sets a limit on how
many times Slider should ask to restart; if unset you get the YARN limit
defined by "yarn.resourcemanager.am.max-retries" (plus some windowing
feature which handles intermittent timeouts over a long-running service).
Set it to 1 and it should mean "no restarts" (i.e. only one attempt to run
Slider is made: the first).
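
The fallback between the two properties can be sketched as follows. This is
a plain-Java model of the behaviour described above, not the Slider or YARN
implementation: the class RestartLimitSketch, its methods and the yarnDefault
parameter are hypothetical, the property names are used exactly as written
in this message, and the windowing feature is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the limit fallback described above. */
final class RestartLimitSketch {

    // Property names as quoted in this message.
    static final String SLIDER_LIMIT = "slider.yarn.restart.limit";
    static final String YARN_LIMIT = "yarn.resourcemanager.am.max-retries";

    /**
     * Returns the number of attempts to run the AM: the Slider value if set,
     * else the YARN value if set, else the supplied default (an assumption).
     */
    static int effectiveAttempts(Map<String, String> conf, int yarnDefault) {
        String slider = conf.get(SLIDER_LIMIT);
        if (slider != null) {
            return Integer.parseInt(slider.trim());
        }
        String yarn = conf.get(YARN_LIMIT);
        return yarn != null ? Integer.parseInt(yarn.trim()) : yarnDefault;
    }

    /** The first attempt is not a restart, so one attempt means no restarts. */
    static int restartsAllowed(int attempts) {
        return Math.max(0, attempts - 1);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SLIDER_LIMIT, "1");
        int attempts = effectiveAttempts(conf, 2);
        System.out.println("attempts=" + attempts
            + ", restarts=" + restartsAllowed(attempts));
    }
}
```

With the Slider property set to 1, the model reports a single attempt and
therefore no restarts, which matches the "no restarts" reading above.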

It's covered in the
http://slider.incubator.apache.org/docs/client-configuration.html docs

-steve

On 12 December 2014 at 11:41, 杨浩 <ya...@gmail.com> wrote:

> How should this option be configured? When it is set to false, it sometimes
> works and sometimes does not.
>


Re: what should be value of slider.am.restart.supported

Posted by Steve Loughran <st...@hortonworks.com>.

That's something I think we cut out of the Slider code a while back.

It was added so that, on versions of Hadoop that didn't have working
support for the YARN AM restart feature, we didn't try to use it.

Prior to Hadoop 2.4, the fields to enable it weren't in the code the client
used to request the feature, or in the data that came back from YARN when
the AM started. We used reflection to try to load the methods, in case they
weren't there. For extra fun, the methods could be in the Hadoop JARs on the
client, but not on the server, and as we were using the pre-installed
Hadoop JARs on the server, we could end up setting the option on the
client, but not have it do anything.

I think the flag was there to tell the tests whether or not the feature was
present in the destination cluster, and so whether to run tests that kill
the AM and expect it to come back up *retaining the existing containers*,
that is, whether the AM could be restarted without the running application
noticing.

Everything works on Hadoop 2.6, so the option is gone; tests do kill the AM
and expect it to come back (there's a "slider am-suicide" command for
testing this).

On 12 December 2014 at 11:41, 杨浩 <ya...@gmail.com> wrote:

> How should this option be configured? When it is set to false, it sometimes
> works and sometimes does not.
>
