Posted to user@accumulo.apache.org by "Ligade, Shailesh [USA]" <Li...@bah.com> on 2021/11/01 13:11:13 UTC

accumulo 1.10 stop-all.sh script

Hello,

I noticed that the stop-all.sh script first calls accumulo admin stopAll and then, if the servers are still up, stops the individual servers by going through the masters, gc, slaves, etc. files.

Since we are using systemd unit files to start the services, and our unit files have Restart=always, we can’t cleanly stop the services ☹. I understand that the unit files didn’t come with the Accumulo distribution. So the question is: is the use of unit files supported, and if so, what is the correct way to issue stopAll? Can anyone share a good unit file that can be used, or do I need to write my own stopAll script? What would be the main logic of such a script (since it calls admin stopAll, and that does several different things underneath)?

-S

Re: [External] Re: accumulo 1.10 stop-all.sh script

Posted by Christopher <ct...@apache.org>.
The "accumulo admin stopAll" command is for a graceful shutdown. There
are advantages to shutting down that way. It can reduce the start-up
time and avoid unnecessary write-ahead log recovery. But, it shouldn't
strictly be necessary. You can shut down Accumulo by just killing it,
and it should recover just fine. It's up to you, based on your risk
tolerance.

But, if you want to keep the graceful shutdown, you need to alter your
systemd unit files in some way to not restart automatically in that
situation.

The failures in systemd could be because SIGTERM, which probably
happens from the stop-all.sh script as well as from `systemctl stop`,
is causing the Java processes to return exit code 143. If you don't
want that to happen, you have to configure your unit file to
explicitly treat that exit code as a successful termination.
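
As a rough sketch of that last point (an illustration, not a tested
Accumulo unit file): systemd's SuccessExitStatus= directive in the
[Service] section lists extra exit codes to count as success. 143 is
128 + 15, the conventional status of a process ended by SIGTERM.

```ini
[Service]
# A JVM terminated by SIGTERM exits with 128 + 15 = 143.
# Count that as a clean stop instead of a failure.
SuccessExitStatus=143
```

This only changes how the exit is reported. Whether the service is
restarted afterwards still depends on the Restart= setting.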

If you're using a script that daemonizes/backgrounds the Java process,
then you will need to use Type=forking, but I prefer to use
Type=simple (default for ExecStart) and just execute a script that
does not daemonize Java. It's just simpler, I think, with fewer moving
parts and complex interactions. Systemd is more than capable of
handling the backgrounding and monitoring of the process on its own,
so it's best to use its features directly, rather than use it merely
to wrap a more traditional SysVinit-style daemon script with
backgrounding processes, PID file tracking, STDOUT/STDERR log output
capturing and everything else. Using Systemd directly with
Type=simple, you get all that for free with a lot less effort.
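
A minimal sketch of such a Type=simple unit might look like the
following. The install path, the "accumulo" user, and the choice of
the tserver service are assumptions made for illustration; adjust
them to your own layout. The important part is that ExecStart runs
the Java process in the foreground rather than through a
daemonizing wrapper.

```ini
[Unit]
Description=Apache Accumulo tablet server (illustrative example)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
# Assumed service account; not part of the Accumulo distribution.
User=accumulo
# Assumed install path. The command must stay in the foreground.
ExecStart=/opt/accumulo/bin/accumulo tserver

[Install]
WantedBy=multi-user.target
```

With this layout, `systemctl stop` sends SIGTERM to the Java process
directly, and its stdout/stderr goes to the journal without any extra
log redirection.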


RE: [External] Re: accumulo 1.10 stop-all.sh script

Posted by "Ligade, Shailesh [USA]" <Li...@bah.com>.
Thanks for the quick reply.

In reality I have a very similar unit file; the only difference is that I have added
Type=forking
Restart=always

I am also using ExecStart "bin/start-daemon.sh <host> <service>" (as opposed to accumulo master in your unit file).

Yes, I updated the stop-server.sh script (which is called from stop-all.sh) to use systemctl and not the kill command. However, the first part of stop-all calls "accumulo admin stopAll"; do I still need that functionality? If so, can I replicate it? I can see in the code that it does a flush, etc.

Also, when I run start-all (after stop-all), my systemctl status shows failed; however, Accumulo is working great (can scan tables, the monitor is up, etc.).

I also updated start-all to use systemctl, but that did not help.

-S

Re: accumulo 1.10 stop-all.sh script

Posted by Christopher <ct...@apache.org>.
The start-all.sh / stop-all.sh scripts that come with Accumulo 1.10
are just one possible set of out-of-the-box scripts that you could
use. If you have written or acquired systemd unit files to manage your
services, you may be better off using those instead, and avoiding the
built-in scripts entirely.

For me, with unit files, I would probably just do something like `pssh
-h <myhostsfile> systemctl stop accumulo-<service>` or similar, rather
than use the stop-all.sh script.

If you want to try to shut it down "cleanly" first, then you'll
definitely have to remove the "Restart=always" line from your systemd
unit files. In fact, I'm not sure automatic restarts are ever a good
idea, since you won't necessarily have triaged the problem that caused
a crash before it tries to restart, and could be perpetuating a
failure or making it worse.

You could also modify your launch scripts or unit files to guard on
some precondition that must be met before it can be restarted (like
the existence of a specific file or some other systemd unit being
loaded). Systemd supports lots of conditions to check:
https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Conditions%20and%20Asserts
When you want to do a graceful shutdown, you can change the state that
is checked by the precondition, so the service doesn't restart.
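
One way to sketch such a guard is with ConditionPathExists= in the
[Unit] section. The marker file path below is purely hypothetical,
and how a failed condition interacts with an automatic restart loop
is worth testing on your own systemd version before relying on it.

```ini
[Unit]
# Hypothetical marker file. The unit is only started while it exists.
# If the file is missing, the start is skipped rather than failed.
ConditionPathExists=/var/lib/accumulo/allow-start
```

Under this assumption, removing the marker file before a graceful
shutdown and recreating it before the next start would be the
"change the state" step described above.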

One example set of very simple unit files was written by me a couple
of years ago for 1.x in Fedora. It did not, however, have automatic
restarts. These were accompanied by a custom accumulo launch script
generated by the %jpackage_script macro. See
https://src.fedoraproject.org/rpms/accumulo/blob/f31/f/accumulo.spec#_369
and https://src.fedoraproject.org/rpms/accumulo/tree/f31 . These may
not be better than the unit files you're currently using, though.
