Posted to dev@fluo.apache.org by Aish <ai...@gmail.com> on 2020/03/20 20:42:34 UTC

Re: Run Accumulo and Hadoop services under systemd

Hello Christopher and everyone else in this thread,

First of all, I hope everyone is doing well and staying safe with everything that is going on in the world. I am sorry for the long delay in responding to this email, and thank you all for your valuable input. I have now submitted a PR to fluo-muchos that incorporates your suggestions. Please take a look when you have time.

https://github.com/apache/fluo-muchos/pull/334

Thanks,
Aishwarya

On 2019/12/20 05:43:38, Christopher <ct...@apache.org> wrote: 
> On Thu, Dec 19, 2019 at 9:57 PM Aishwarya Thangappa
> <ai...@gmail.com> wrote:
> >
> > Thanks, Christopher. I see your point. The changes to the accumulo-cluster scripts aside,
> >
> > 1. Is there value in landing the systemd changes in the muchos repo? If it is deemed valuable, we can put up a PR with the systemd units as template files and Ansible tasks to copy them to the cluster nodes and enable/start them. This will be easy for us to upstream, as we already have the work done.
> 
> There is probably some value in that, assuming the use cases Keith
> mentioned aren't made more difficult. But, the details of the changes
> might matter.
> 
> >
> > 2. Alternatively, would you find value in us reworking a set of shell scripts that do the equivalent of the above changes and opening a PR against the Accumulo repo?
> 
> That would very much depend on the details, but I am wary of adding
> downstream integration tooling directly into Accumulo's main
> repository, even if it had significant added value, rather than have
> such tooling live alongside it separately in its own repo (possibly
> as another repo maintained by the Accumulo PMC, or by a community
> member). This is because the Accumulo PMC cannot possibly maintain
> everything of value that is marginally related to Accumulo under its
> own umbrella. I've seen projects try to do that, and it doesn't go
> well.
> 
> >   2.1. In this case, would reference scripts that do the start/stop operations using systemd, similar to the accumulo-cluster scripts, be of value?
> 
> Perhaps yes, but probably not maintained in Accumulo's main repo.
> However, I think it would make a good blog post on Accumulo's website,
> either way.
> 
> >   2.2. We found that it was necessary to make minor changes to the accumulo-service script to support the multiple tserver case. Are there any concerns about modifying it?
> 
> There's a lot to say about accumulo-service, so I'll try to be brief.
> In short, I don't think accumulo-service (and accumulo-cluster) should
> be used for systemd integration. Work was done in bin/accumulo in
> 2.0 to more easily support downstream integration by dramatically
> simplifying its implementation. This allowed
> accumulo-cluster/accumulo-service to be easily created as one such set
> of "downstream" tools that built off of the simplicity of the new
> bin/accumulo, and which was provided within the main repo as
> convenient out-of-the-box cluster management / service management
> tools for when we build the binary tarball. However, they were not
> intended as integration points for downstream tools... bin/accumulo
> was.
> 
> As for accumulo-service:
> 
> 1. accumulo-service uses old SysV init patterns for managing services,
> none of which are needed under systemd
> 2. it does PIDfile stuff that is unnecessary to do at all with systemd
> (assuming Type=simple, which is what you should probably use, since
> you don't need to background it, not Type=forking; and even if you did
> use forking, systemd has its own way of managing PIDfiles)
> 3. it does custom, manual log file rotation stuff, which we probably
> should never have had in there at all, but definitely isn't needed
> with systemd/journald
> 4. supporting multiple tservers is so much simpler with unit files
> using systemd instances (parameter injection in unit file templates)
> 5. accumulo-service should really only be used by accumulo-cluster, or
> perhaps as part of a suite of legacy SysV init scripts
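
Points 2 and 4 above can be sketched as follows. This is a minimal illustration, not the unit files from the PR or from Fedora: it renders a hypothetical systemd template unit (installed as accumulo-tserver@.service) that runs bin/accumulo directly with Type=simple and no PIDfile. The install path, the user name, the unit name, and the TSERVER_INSTANCE variable are all assumptions made for the example; %i is systemd's instance specifier.

```python
from string import Template

# Hypothetical template unit, e.g. /etc/systemd/system/accumulo-tserver@.service.
# Assumptions: install path, service user, and the TSERVER_INSTANCE variable name.
# "%i" is expanded by systemd to the instance name (the part after "@").
UNIT_TEMPLATE = Template("""\
[Unit]
Description=Accumulo tablet server (instance %i)
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=${user}
Environment=TSERVER_INSTANCE=%i
ExecStart=${accumulo_home}/bin/accumulo tserver
Restart=on-failure

[Install]
WantedBy=multi-user.target
""")


def render_unit(accumulo_home="/opt/accumulo", user="accumulo"):
    """Fill in the site-specific values; systemd specifiers are left untouched."""
    return UNIT_TEMPLATE.substitute(accumulo_home=accumulo_home, user=user)


def instance_names(num_tservers, base="accumulo-tserver"):
    """Unit names for N tservers on one node, all backed by the same template."""
    return [f"{base}@{i}.service" for i in range(1, num_tservers + 1)]


if __name__ == "__main__":
    print(render_unit())
    print(instance_names(2))
```

With a template like this, each tserver on a node is just another instance of the same file, so no per-instance copies of the unit are needed.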
> 
> accumulo-cluster and accumulo-service go together, and were written
> with a specific use case in mind. Systemd integration is an altogether
> different use case, and I think a much simpler set of tooling could be
> built using systemd and bin/accumulo than it could by trying to use
> accumulo-service in a way it wasn't intended to be used (but
> bin/accumulo was).
> 
> >
> > And, not sure why you are getting a 404 on the gist files. I am able to access them from a private browser window without issues.
> 
> Sorry, I figured this out. The href got mangled in the HTML version of
> the email.
> 
> >
> > On 2019/12/18 01:54:00, Christopher <ct...@apache.org> wrote:
> > > On Tue, Dec 17, 2019 at 8:07 PM Aishwarya Thangappa
> > > <ai...@gmail.com> wrote:
> > > >
> > > > Sorry, I wasn't aware that attachments are not allowed on ASF mailing lists. I have now created them as gists. Please have a look.
> > > >
> > > > master systemd unit:  https://gist.github.com/ata18/e8f7577c99cd08ba46544aacef26969f
> > > > accumulo-service: https://gist.github.com/ata18/48014ea78b09e4febb88480ea48ed62c
> > >
> > > These first two links don't work for me. I get a 404 error message.
> > >
> > > For reference, here's the basic unit files I wrote for Accumulo from
> > > Fedora 29: https://src.fedoraproject.org/rpms/accumulo/tree/f29
> > > They used a /usr/bin/accumulo script generated using the
> > > %jpackage_script macro (see accumulo.spec file for that) which worked
> > > a lot like Accumulo 2.0's bin/accumulo file works (not a coincidence,
> > > since the 2.0 script was written with insight gained from the attempt
> > > to package in Fedora).
> > >
> > > > accumulo-cluster: https://gist.github.com/ata18/234c2e63d2718aec65bd2037ec3125cd
> > >
> > > This appears to be based on an older version of our accumulo-cluster
> > > script (from 2.0?) rather than the current one in the master branch,
> > > but I think I got the sense of what was changed by glancing at the
> > > diff. Once you have systemd, I'm not convinced it's beneficial to use
> > > something like accumulo-cluster anymore, as it doesn't really serve
> > > any added value beyond what you would get with using systemctl via
> > > pssh or pdsh and a hostsfile. The accumulo-cluster script's purpose is
> > > for when you don't have an existing service management tool for the
> > > cluster, and its intent is to be very basic, to support the "deploy
> > > out of tarball" use case, with no other vendor or downstream
> > > packaging. Modifying it to wrap systemd seems a bit unnecessarily
> > > complex to me, since I don't think you need it when using systemd.
> > >
> > > It might be better to create a simpler script that makes it easy to
> > > run specific tasks using pdsh or pssh and a hostsfile, to be used
> > > with systemd, rather than trying to put those features into the
> > > accumulo-cluster script.
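
A rough sketch of such a helper, assuming pdsh or pssh is installed on the control host and that unit names like accumulo-tserver@1.service exist on the nodes (both are assumptions for illustration). The function name is hypothetical. It only builds the command line; running it, and handling privileges for systemctl, is left to the caller.

```python
import shlex

ALLOWED_ACTIONS = {"start", "stop", "restart", "status"}


def systemctl_fanout(action, unit, hostsfile, tool="pdsh"):
    """Build an argv list that runs `systemctl <action> <unit>` on every host
    listed in hostsfile, using either pdsh or pssh."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    remote = ["systemctl", action, unit]
    if tool == "pdsh":
        # pdsh reads the host list from a file when the -w argument starts with "^".
        return ["pdsh", "-w", f"^{hostsfile}", *remote]
    if tool == "pssh":
        # pssh takes the hosts file via -h; -i prints each host's output inline.
        return ["pssh", "-h", hostsfile, "-i", shlex.join(remote)]
    raise ValueError(f"unsupported tool: {tool}")


if __name__ == "__main__":
    cmd = systemctl_fanout("start", "accumulo-tserver@1.service", "conf/tservers")
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as-is. Some distributions install pssh under the name parallel-ssh, so the executable name may need adjusting.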
> > >
> > > >
> > > > Thanks,
> > > > Aishwarya
> > > >
> > > > On 2019/12/15 16:16:56, Michael Wall <mj...@gmail.com> wrote:
> > > > > Hi Aishwarya,
> > > > >
> > > > > I didn't get any attachments on this.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Mike
> > > > >
> > > > > On Fri, Dec 13, 2019 at 5:46 PM Aishwarya Thangappa
> > > > > <Ai...@microsoft.com.invalid> wrote:
> > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > > > I had not subscribed to the dev mailing list earlier and missed some
> > > > > > > of your questions. I will address them here.
> > > > > >
> > > > > > @Christopher
> > > > > > Most of the changes except the actual installation of the systemd units
> > > > > > could possibly go into Accumulo. These would be the systemd units for
> > > > > > various accumulo services, modification to cluster-wide scripts in accumulo
> > > > > > to use systemd instead of directly starting/stopping the processes. We
> > > > > > would be happy to accommodate/answer any suggestions or follow-up questions
> > > > > > you may have.
> > > > > >
> > > > > > Attached the accumulo_cluster and accumulo_service scripts with systemd
> > > > > > changes.
> > > > > >
> > > > > >
> > > > > > @Keith Turner
> > > > > > Once we determine where the different pieces should land, I can post PRs
> > > > > > accordingly. In our current setup, in muchos.properties file I have added a
> > > > > > `use_systemd` flag which when set to true, will overwrite the accumulo
> > > > > > cluster-wide scripts in the nodes with the attached ones. These files
> > > > > > currently reside at ansible/roles/accumulo/files. If we determine that
> > > > > > these scripts and the systemd unit files will instead go to Accumulo
> > > > > > project, I will have to make changes accordingly.
> > > > > >
> > > > > > @Michael Wall
> > > > > > Systemd units internally call the same scripts that accumulo_cluster
> > > > > > commands currently use. The change is that accumulo_cluster commands would
> > > > > > > call systemd start/stop, which in turn would call accumulo_service commands.
> > > > > > Attached a sample systemd_unit template. Can you please elaborate if this
> > > > > > is still an issue?
> > > > > >
> > > > > > ------------------------------
> > > > > > *From:* Aishwarya Thangappa
> > > > > > *Sent:* Thursday, December 12, 2019 11:25 AM
> > > > > > *To:* dev@fluo.apache.org <de...@fluo.apache.org>
> > > > > > *Cc:* Arvind Shyamsundar <ar...@microsoft.com>; Billie Rinaldi <
> > > > > > Billie.Rinaldi@microsoft.com>
> > > > > > *Subject:* Run Accumulo and Hadoop services under systemd
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > While using fluo-muchos to deploy an Accumulo cluster, we recognized the
> > > > > > need for various Accumulo and Hadoop services to be run under a service
> > > > > > manager like systemd which will ensure that all these services are brought
> > > > > > up correctly in the event of VM / OS reboots / cold starts. We have made
> > > > > > the required changes for this and would like to contribute it back to the
> > > > > > community if there is any interest around it.
> > > > > >
> > > > > > Summarizing what we have done:
> > > > > >
> > > > > >    - Crafted separate systemd unit files for Accumulo (master, monitor,
> > > > > > >    gc, tracer, tserver), Hadoop (journalnode, namenode, datanode,
> > > > > >    resourcemanager, nodemanager, zkfc) and Zookeeper services.
> > > > > >    - All of these unit files will be copied to the respective nodes'
> > > > > >    /etc/systemd/system directory; the services will then be started and
> > > > > >    enabled by ansible systemd module.
> > > > > >    - In case of num_tservers > 1, multiple tserver systemd units will be
> > > > > >    copied to the node and each will be independently managed.
> > > > > >    - Also made necessary changes to the existing cluster-wide scripts
> > > > > >    including accumulo_cluster, accumulo_service, start_dfs, start_yarn etc.,
> > > > > > >    to have them work seamlessly with systemd.
> > > > > >
> > > > > > > Is there an appetite to look at the details? If so, we can post a PR. If
> > > > > > > there is any feedback or other considerations, please let us know and we
> > > > > > > can discuss them.
> > > > > >
> > > > > > Thanks,
> > > > > > Aishwarya
> > > > > >
> > > > > >
> > > > >
> > >
>