Posted to dev@slider.apache.org by Rajesh Kartha <ka...@gmail.com> on 2015/06/24 01:17:01 UTC

Slider HBase stop question

Hi,

I am trying to understand the shutdown process in Slider and have been
experimenting with the HBase Slider package.

I am able to deploy the HBase app and run start, stop, flex, etc. without
issues.

One thing I noticed is that when the stop is issued, all the HBase
components go away, and the logs of the respective HBase components
(Region Server, etc.) do not show the graceful shutdown messages that are
typically seen.

While there is a stop command in the hbase_service.py script, I am not
seeing it being called. Does the Slider stop command simply request that
the containers be killed? How can one achieve a graceful shutdown of the
components?

For example, in a normal HBase Region Server shutdown outside of Slider,
one can see messages like these:

2015-06-23 14:21:47,364 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2015-06-23 14:21:47,364 INFO  [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2015-06-23 14:21:47,372 INFO  [regionserver/bdavm001.svl.ibm.com/9.30.252.1:16020] zookeeper.ZooKeeper: Session: 0x24e09011902002b closed
2015-06-23 14:21:47,372 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-06-23 14:21:47,372 INFO  [regionserver/bdavm001.svl.ibm.com/9.30.252.1:16020] regionserver.HRegionServer: stopping server bdavm001.svl.ibm.com,16020,1435013574085; zookeeper connection closed.
2015-06-23 14:21:47,372 INFO  [regionserver/bdavm001.svl.ibm.com/9.30.252.1:16020] regionserver.HRegionServer: regionserver/bdavm001.svl.ibm.com/9.30.252.1:16020 exiting
2015-06-23 14:21:47,372 INFO  [Thread-5] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2015-06-23 14:21:47,373 INFO  [Thread-5] regionserver.ShutdownHook: Shutdown hook finished.


Am I missing any settings? Please share your thoughts and suggestions.

Thanks,
Rajesh

Re: Slider HBase stop question

Posted by Steve Loughran <st...@hortonworks.com>.

That's the one.

> On 25 Jun 2015, at 02:42, Rajesh Kartha <ka...@gmail.com> wrote:
> 
> Thanks Steve, I assume this may be tracked under SLIDER-128 (?)

Re: Slider HBase stop question

Posted by Rajesh Kartha <ka...@gmail.com>.

Thanks Steve, I assume this may be tracked under SLIDER-128 (?)

On Wed, Jun 24, 2015 at 2:14 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> > On 24 Jun 2015, at 00:17, Rajesh Kartha <ka...@gmail.com> wrote:
> >
> > One thing I noticed is that when the stop is issued, all the HBase
> > components go away, and the logs of the respective HBase components
> > (Region Server, etc.) do not show the graceful shutdown messages that
> > are typically seen.
> >
> > While there is a stop command in the hbase_service.py script, I am not
> > seeing it being called. Does the Slider stop command simply request that
> > the containers be killed? How can one achieve a graceful shutdown of the
> > components?
>
> Graceful managed shutdowns are one thing we've actually avoided so far;
> we just kill containers instead.
>
> It's a bit of an over-strict adherence to the "crash only software"
> architecture
>
> https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf
>
> ...but it's based on the assumption that containers will be going away
> without warning anyway: a "yarn kill" will kill the application without
> warning, and when you turn pre-emption on, the rate of container failure
> can be even higher.
>
> That said, we've got the placeholders for having the agents run scripts to
> stop applications; they've just not been wired up yet. I think now, with
> the rolling upgrade work, which includes stopping and starting applications
> in a single container, it's probably time to finish the wiring.
>
> Of course, even with that, containers and apps will still go away without
> any warning...
>

Re: Slider HBase stop question

Posted by Steve Loughran <st...@hortonworks.com>.

> On 24 Jun 2015, at 00:17, Rajesh Kartha <ka...@gmail.com> wrote:
> 
> One thing I noticed is that when the stop is issued, all the HBase
> components go away, and the logs of the respective HBase components
> (Region Server, etc.) do not show the graceful shutdown messages that are
> typically seen.
> 
> While there is a stop command in the hbase_service.py script, I am not
> seeing it being called. Does the Slider stop command simply request that
> the containers be killed? How can one achieve a graceful shutdown of the
> components?

Graceful managed shutdowns are one thing we've actually avoided so far; we just kill containers instead.

It's a bit of an over-strict adherence to the "crash only software" architecture
https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf

...but it's based on the assumption that containers will be going away without warning anyway: a "yarn kill" will kill the application without warning, and when you turn pre-emption on, the rate of container failure can be even higher.

That said, we've got the placeholders for having the agents run scripts to stop applications; they've just not been wired up yet. I think now, with the rolling upgrade work, which includes stopping and starting applications in a single container, it's probably time to finish the wiring.
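
As a rough sketch of what "run a stop script, then fall back to killing" could look like in general, assuming nothing about how the Slider agent is actually implemented: the function name stop_component and its parameters stop_cmd and grace_seconds are hypothetical and are not part of any Slider or HBase API.

```python
import subprocess
import sys


def stop_component(proc, stop_cmd, grace_seconds=30.0):
    """Run a stop command, then wait for `proc` to exit on its own.

    Returns "graceful" if the process exits within the grace period,
    otherwise kills it and returns "killed".
    """
    try:
        # Ask the component to stop, e.g. via its own stop script.
        subprocess.run(stop_cmd, timeout=grace_seconds, check=False)
        # Give the component time to finish shutting down.
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        # The stop command hung or the component did not exit: kill it.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Demo: a sleeping child and a stop command that does nothing,
    # so the fallback path is taken after a short grace period.
    sleeper = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    no_op = [sys.executable, "-c", "pass"]
    print(stop_component(sleeper, no_op, grace_seconds=1.0))
```

The fallback matters because the stop script is only a best-effort courtesy: the caller still has to be prepared for the process to be killed outright.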

Of course, even with that, containers and apps will still go away without any warning...