Posted to dev@slider.apache.org by Krishna Kishore Bonagiri <wr...@gmail.com> on 2015/03/14 04:05:53 UTC

Apache Slider stop function not working

Hi,

  We are using Apache Slider 0.60 and have implemented the management operations
(start, status, stop, etc.) in a Python script. Everything else works, but
the stop function is not invoked when the container is stopped. Is
this a known issue, or is there a trick to make it work?


Thanks,
Kishore

Re: Apache Slider stop function not working

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Thank you Sumit.


Re: Apache Slider stop function not working

Posted by Sumit Mohanty <sm...@hortonworks.com>.
This error is usually harmless. It happens when the application is being stopped (slider stop cl1) and the Slider Agents may still be heartbeating with the AppMaster.

<snip>
> impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>         at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at
> org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
</snip>

What is not implemented is an explicit call to the stop function in the Python scripts.

What I was referring to is that the Agent does attempt to call stop in the Python script, but the call is not guaranteed. It is not guaranteed because the call to stop() and the killing of the containers by YARN are not coordinated.

In summary, the ability to call the stop() function in the Python script is not implemented. It's in the plan, though.
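
Because the agent's call to stop() may never happen, a start function can protect itself by clearing leftovers from a previous run before launching anything. The following is a minimal sketch, assuming the component records its PID in a file. The helper names and the PID-file convention are illustrative assumptions, not Slider APIs, and the code targets Python 3 on a POSIX host.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clear_stale_pid_file(pid_file):
    """Remove pid_file unless it names a live process.

    Returns True if a stale or unreadable file was removed.
    """
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return False  # nothing left over from a previous run
    except ValueError:
        os.remove(pid_file)  # empty or corrupt file
        return True
    if pid > 0 and pid_is_alive(pid):
        return False  # an earlier instance is still running
    os.remove(pid_file)
    return True
```

Calling clear_stale_pid_file() at the top of the start operation means a container that was killed without warning does not block the next start.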


Re: Apache Slider stop function not working

Posted by Ted Yu <yu...@gmail.com>.
Kishore:
Looks like logging was at INFO level.
Do you mind turning on DEBUG logging?

Thanks


Re: Apache Slider stop function not working

Posted by Steve Loughran <st...@hortonworks.com>.
> On 14 Mar 2015, at 14:39, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:
> 
>  This is what I see in the AM's log since the STOP command is issued. Even
> though it indicates that STOP command SUCCEEDED, I see that the stop
> function in my python script is not getting executed. Does the exception at
> the end of this log indicate something?

OK. The AM is stopping here, so that part is working: CLI -> AM -> AM shuts down.

The .py stop script is not being invoked, which is what Sumit stated: that part isn't wired up.

Why not? We haven't got round to it yet. To run robustly in a cluster with unreliable hosts, your components need to be designed to be killed without warning. This is particularly the case in a queue with pre-emption, as the container is destroyed without even telling the AM until afterwards. We're currently assuming that the components do handle unannounced container destruction, so the agent isn't forwarding the command.
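
Since the stop script may never run, one way to follow this advice is to have the component process itself react to termination signals. The sketch below is a minimal illustration, assuming the container's processes receive SIGTERM shortly before being killed outright (worth verifying on your own cluster). The names `run`, `work_step`, and `cleanup` are hypothetical and not part of Slider; the code is Python 3.

```python
import signal
import threading

# Set when a termination signal arrives; the main loop polls it.
shutdown_requested = threading.Event()


def handle_term(signum, frame):
    """Only record the request; real work happens outside the handler."""
    shutdown_requested.set()


def install_handlers():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_term)
    signal.signal(signal.SIGINT, handle_term)


def run(work_step, cleanup, poll_seconds=0.1):
    """Call work_step() until a signal arrives, then always call cleanup()."""
    install_handlers()
    try:
        while not shutdown_requested.is_set():
            work_step()
            shutdown_requested.wait(poll_seconds)
    finally:
        cleanup()
```

The grace period before a hard kill can be short, and SIGKILL cannot be caught at all, so cleanup() should be quick and any state that must survive should be persisted as the component runs rather than at shutdown.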

Re: Apache Slider stop function not working

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Steve,

  This is what I see in the AM's log after the STOP command is issued. Even
though it indicates that the STOP command SUCCEEDED, the stop
function in my Python script is not executed. Does the exception at
the end of this log indicate a problem?

2015-03-14 07:24:01,202 [IPC Server handler 2 on 39387] INFO  appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued:  exit code = 0, SUCCEEDED: stop command issued;
2015-03-14 07:24:02,202 [AmExecutor-006] INFO  appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued
2015-03-14 07:24:02,202 [main] INFO  appmaster.SliderAppMaster - Triggering shutdown of the AM: stop command issued:  exit code = 0, SUCCEEDED: stop command issued;
2015-03-14 07:24:02,202 [main] INFO  appmaster.SliderAppMaster - Process has exited with exit code 0 mapped to 0 -ignoring
2015-03-14 07:24:02,202 [main] INFO  workflow.WorkflowCompositeService - Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
2015-03-14 07:24:02,202 [main] INFO  state.AppState - Releasing 2 containers
2015-03-14 07:24:02,203 [main] INFO  state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1395.svl.ibm.com:45454/container_1425452295813_0123_01_000002/ctx/bigsql
2015-03-14 07:24:02,203 [main] INFO  state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1396.svl.ibm.com:45454/container_1425452295813_0123_01_000003/ctx/bigsql
2015-03-14 07:24:02,204 [main] INFO  appmaster.SliderAppMaster - Application completed. Signalling finish to RM
2015-03-14 07:24:02,204 [main] INFO  appmaster.SliderAppMaster - Unregistering AM status=SUCCEEDED message=stop command issued
2015-03-14 07:24:02,209 [main] INFO  impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
2015-03-14 07:24:02,310 [main] INFO  appmaster.SliderAppMaster - Exiting AM; final exit code = 0
2015-03-14 07:24:02,312 [main] INFO  util.ExitUtil - Exiting with status 0
2015-03-14 07:24:02,326 [Shutdown] INFO  mortbay.log - Shutdown hook executing
2015-03-14 07:24:02,343 [Shutdown] INFO  mortbay.log - Stopped SslSelectChannelConnector@0.0.0.0:45840
2015-03-14 07:24:02,354 [Thread-1] INFO  mortbay.log - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:0
2015-03-14 07:24:02,355 [Shutdown] INFO  mortbay.log - Stopped SslSelectChannelConnector@0.0.0.0:48056
2015-03-14 07:24:02,358 [Shutdown] INFO  mortbay.log - Shutdown hook complete
2015-03-14 07:24:02,364 [Thread-1] INFO  ipc.Server - Stopping server on 39387
2015-03-14 07:24:02,365 [IPC Server listener on 39387] INFO  ipc.Server - Stopping IPC Server listener on 39387
2015-03-14 07:24:02,366 [IPC Server Responder] INFO  ipc.Server - Stopping IPC Server Responder
2015-03-14 07:24:02,367 [Thread-1] INFO  impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1395.svl.ibm.com:45454
2015-03-14 07:24:02,383 [Thread-1] INFO  impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1396.svl.ibm.com:45454
2015-03-14 07:24:02,429 [AMRM Callback Handler Thread] INFO  impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
2015-03-14 07:24:02,432 [AmExecutor-005] INFO  actions.QueueService - QueueService processor terminated
2015-03-14 07:24:02,432 [AmExecutor-006] WARN  actions.ActionStopQueue - STOP
2015-03-14 07:24:02,432 [AmExecutor-006] INFO  actions.QueueExecutor - Queue Executor run() stopped


Thanks,

Kishore
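
One quick way to judge the exception at the end of the log is to look at the level of each entry: the InterruptedException is logged at INFO, and the only entry above INFO is the WARN from ActionStopQueue. The sketch below filters a log by level, assuming the layout seen in this log (timestamp, thread name in brackets, level, then logger and message). `entries_above_info` is a hypothetical helper, and the sample lines are copied from the log above with their wrapping removed.

```python
import re

# Timestamp, [thread], LEVEL, then "logger - message".
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(.+?)\] (\w+)\s*(.*)$"
)


def entries_above_info(lines):
    """Return (level, text) pairs for WARN, ERROR and FATAL entries."""
    serious = {"WARN", "ERROR", "FATAL"}
    found = []
    for line in lines:
        match = LOG_LINE.match(line)
        # Stack-trace lines do not match the pattern and are skipped.
        if match and match.group(3) in serious:
            found.append((match.group(3), match.group(4)))
    return found


sample = [
    "2015-03-14 07:24:02,429 [AMRM Callback Handler Thread] INFO  "
    "impl.AMRMClientAsyncImpl - Interrupted while waiting for queue",
    "java.lang.InterruptedException",
    "2015-03-14 07:24:02,432 [AmExecutor-006] WARN  actions.ActionStopQueue - STOP",
]
print(entries_above_info(sample))
```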



On Sat, Mar 14, 2015 at 7:28 PM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> Sorry, I think we've been creating confusion
>
> Sumit was referring to the fact that in the app-specific python scripts
> inside an app package, there's a stop operation which isn't implemented;
> the specific component instances currently get destroyed without warning
> when the slider AM hands back the containers to YARN.
>
> The CLI "stop" operation is very much supported, and it should work.
>
> 1. The basic "slider stop cl1" operation is meant to find the running
> application and ask it to shut down. If this doesn't work, can we see (a)
> any stack trace on the client and (b) the tail end of the AM logs.
>
> 2. "slider stop cl1 --force" skips talking to the slider AM and talks to
> YARN direct. No matter what's going on inside the application, this will
> kill it. If it doesn't, there's something gone wrong on the client side
> about talking to YARN, or something very very wrong with the YARN system
> itself. Again, a client-side log will help us review this
>
> -steve
>
>
> > On 14 Mar 2015, at 07:09, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:
> >
> > Hi Sumit,
> > First of all thanks for the reply.
> >
> > What we have been trying is this kind of command from CLI.
> >  slider stop cl1
> >
> >  So, as you are saying it doesn't yet work. But what is the other way to
> > stop the application? What do you mean by "The only time stop is called,
> > today, is when the application is stopped the Slider Agents call Stop"?
> >
> > Kishore
> >
> > On Sat, Mar 14, 2015 at 10:56 AM, Sumit Mohanty <sumit.mohanty@gmail.com
> >
> > wrote:
> >
> >> Stop is not wired up to the Stop command from the CLI. The only time stop
> >> is called, today, is when the application is stopped the Slider Agents call
> >> Stop and wait for ~10 seconds before killing the processes.
> >>
> >> On Fri, Mar 13, 2015 at 8:05 PM, Krishna Kishore Bonagiri <
> >> write2kishore@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>>  We are using Apache Slider 0.60 and implemented the management operations
> >>> start, status, stop, etc. in python script. Everything else is working but
> >>> the stop function is not getting invoked when the container is stopped. Is
> >>> this a known issue already? or is there any trick to make it work?
> >>>
> >>>
> >>> Thanks,
> >>> Kishore
> >>>
> >>
> >>
> >>
> >> --
> >> thanks
> >> Sumit
> >>
>
>

Re: Apache Slider stop function not working

Posted by Steve Loughran <st...@hortonworks.com>.
Sorry, I think we've been creating confusion.

Sumit was referring to the fact that in the app-specific python scripts inside an app package, there's a stop operation which isn't implemented; the specific component instances currently get destroyed without warning when the slider AM hands back the containers to YARN.

The CLI "stop" operation is very much supported, and it should work.

1. The basic "slider stop cl1" operation is meant to find the running application and ask it to shut down. If this doesn't work, can we see (a) any stack trace on the client and (b) the tail end of the AM logs?

2. "slider stop cl1 --force" skips talking to the slider AM and talks to YARN directly. No matter what's going on inside the application, this will kill it. If it doesn't, something has gone wrong on the client side when talking to YARN, or something is very, very wrong with the YARN system itself. Again, a client-side log will help us review this.
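
The two-step approach above (try the normal stop, then fall back to --force) could be scripted roughly as follows. This is only a sketch: the "slider stop cl1" and "slider stop cl1 --force" commands are the ones from this thread, while the helper names (build_stop_command, stop_cluster), the injectable runner parameter, and the assumption that a non-zero exit code means the stop did not succeed are illustrative and not part of Slider.

```python
import subprocess


def build_stop_command(cluster, force=False):
    """Return the argv list for stopping the named Slider application."""
    cmd = ["slider", "stop", cluster]
    if force:
        cmd.append("--force")
    return cmd


def stop_cluster(cluster, runner=subprocess.call):
    """Try a normal stop first; fall back to --force if it fails.

    ASSUMPTION: a non-zero exit code from the CLI means the stop failed.
    Returns "graceful", "forced", or "failed".
    """
    if runner(build_stop_command(cluster)) == 0:
        return "graceful"
    if runner(build_stop_command(cluster, force=True)) == 0:
        return "forced"
    return "failed"
```

Calling stop_cluster("cl1") would run the real CLI; passing a different runner makes it possible to dry-run the logic without a cluster. If the result is "failed", the client-side log is the thing to capture.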

-steve


> On 14 Mar 2015, at 07:09, Krishna Kishore Bonagiri <wr...@gmail.com> wrote:
> 
> Hi Sumit,
> First of all thanks for the reply.
> 
> What we have been trying is this kind of command from CLI.
>  slider stop cl1
> 
>  So, as you are saying it doesn't yet work. But what is the other way to
> stop the application? What do you mean by "The only time stop is called,
> today, is when the application is stopped the Slider Agents call Stop"?
> 
> Kishore
> 
> On Sat, Mar 14, 2015 at 10:56 AM, Sumit Mohanty <su...@gmail.com>
> wrote:
> 
>> Stop is not wired up to the Stop command from the CLI. The only time stop
>> is called, today, is when the application is stopped the Slider Agents call
>> Stop and wait for ~10 seconds before killing the processes.
>> 
>> On Fri, Mar 13, 2015 at 8:05 PM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>>  We are using Apache Slider 0.60 and implemented the management operations
>>> start, status, stop, etc. in python script. Everything else is working but
>>> the stop function is not getting invoked when the container is stopped. Is
>>> this a known issue already? or is there any trick to make it work?
>>> 
>>> 
>>> Thanks,
>>> Kishore
>>> 
>> 
>> 
>> 
>> --
>> thanks
>> Sumit
>> 


Re: Apache Slider stop function not working

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Sumit,
 First of all thanks for the reply.

 What we have been trying is this kind of command from CLI.
  slider stop cl1

  So, as you are saying it doesn't yet work. But what is the other way to
stop the application? What do you mean by "The only time stop is called,
today, is when the application is stopped the Slider Agents call Stop"?

Kishore

On Sat, Mar 14, 2015 at 10:56 AM, Sumit Mohanty <su...@gmail.com>
wrote:

> Stop is not wired up to the Stop command from the CLI. The only time stop
> is called, today, is when the application is stopped the Slider Agents call
> Stop and wait for ~10 seconds before killing the processes.
>
> On Fri, Mar 13, 2015 at 8:05 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
> > Hi,
> >
> >   We are using Apache Slider 0.60 and implemented the management operations
> > start, status, stop, etc. in python script. Everything else is working but
> > the stop function is not getting invoked when the container is stopped. Is
> > this a known issue already? or is there any trick to make it work?
> >
> >
> > Thanks,
> > Kishore
> >
>
>
>
> --
> thanks
> Sumit
>

Re: Apache Slider stop function not working

Posted by Sumit Mohanty <su...@gmail.com>.
Stop is not wired up to the Stop command from the CLI. The only time stop
is called, today, is when the application is stopped the Slider Agents call
Stop and wait for ~10 seconds before killing the processes.
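
To illustrate the stop-then-kill behaviour described above, here is a minimal Python 3 sketch of the general pattern: ask a process to stop, wait up to a grace period, then kill it. It is not the Slider Agent's code. The function name stop_then_kill and the use of SIGTERM as the "stop" request are assumptions for illustration; only the ~10 second wait comes from the description above.

```python
import subprocess


def stop_then_kill(proc, grace_seconds=10.0):
    """Request a stop, wait up to grace_seconds, then kill if still running.

    Returns True if the process exited within the grace period.
    """
    proc.terminate()  # polite stop request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # forced kill (SIGKILL on POSIX)
        proc.wait()  # reap the process so it does not linger as a zombie
        return False
```

A process that cleans up promptly on the stop request exits inside the window; one that ignores it is killed once the grace period runs out, which is why cleanup in a stop handler should be kept short.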

On Fri, Mar 13, 2015 at 8:05 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> Hi,
>
>   We are using Apache Slider 0.60 and implemented the management operations
> start, status, stop, etc. in python script. Everything else is working but
> the stop function is not getting invoked when the container is stopped. Is
> this a known issue already? or is there any trick to make it work?
>
>
> Thanks,
> Kishore
>



-- 
thanks
Sumit