Posted to dev@cassandra.apache.org by Tomas Repik <tr...@redhat.com> on 2017/07/11 14:08:49 UTC

Three scripts needed to run the server, Why?

Greetings,

I've been working with Cassandra for more than a year but I still wonder about one thing:

To run the server there is a bash script (cassandra) which uses another script (cassandra.in.sh) which uses yet another bash script (cassandra-env.sh).
What is the reason behind this?
Why isn't there just a single file that sets up the environment and runs the server?

Thanks for your answers

Tomas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: Three scripts needed to run the server, Why?

Posted by Eric Evans <jo...@gmail.com>.

On Wed, Jul 12, 2017 at 5:49 AM, Tomas Repik <tr...@redhat.com> wrote:
> Thanks guys for joining the discussion, I hope you don't mind if I continue to argue a bit more.
>
> The core intelligence and functionality of the Cassandra server lies in the Java classes, which reside in jar archives. This is where the main functionality updates take place. To ease the use of the classes there is a, let's call it, "wrapper" script (bin/cassandra), which sets up the environment for the classes to provide the functionality. This wrapper uses two other scripts: one sits in bin (the include) and the other in etc (the env file). I agree that the files in bin should not be edited by users, but the following quotes from the wrapper script state the opposite:
> "Any serious use-case though will likely require customization of the include."
> "Developers and enthusiasts can put a customized include file at ~/.cassandra.in.sh."
> According to these the include file is no different from the environment file. But why would you have two separate files meant for the same purpose?

cassandra-env.sh is meant to be user configuration, whereas
cassandra.in.sh is system configuration.

cassandra.in.sh can be used to customize the behavior of the startup
script for the system you are deploying to; it is used for integration.
Packages can make customizations here, or you could template it for
use with Puppet, Chef, etc.  Once deployed, you would not edit this
file again.

cassandra-env.sh is configuration for Cassandra that lives above what
is reasonable to configure in the application.  Heap size is a good
example of the sort of setting handled here: it has to be passed as an
argument to the JVM, so it is not something you could use
cassandra.yaml for.
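
To make the split concrete, here is a minimal sketch of a system-side include and a user-side env file being combined by a wrapper. The file contents are invented for illustration and are not copied from the shipped scripts; treat the variable names CASSANDRA_CONF, MAX_HEAP_SIZE, and JVM_OPTS as assumptions about what such files would set.

```shell
#!/bin/sh
# Sketch only: file contents are illustrative, not the shipped Cassandra scripts.
workdir=$(mktemp -d)

# "System" side: the include, written once by a package or a Puppet/Chef template.
cat > "$workdir/cassandra.in.sh" <<EOF
CASSANDRA_CONF="$workdir/conf"     # where the user-editable files live
EOF

# "User" side: the env file, edited by whoever tunes this node.
mkdir -p "$workdir/conf"
cat > "$workdir/conf/cassandra-env.sh" <<'EOF'
MAX_HEAP_SIZE="4G"                                   # assumed tuning knob
JVM_OPTS="$JVM_OPTS -Xms$MAX_HEAP_SIZE -Xmx$MAX_HEAP_SIZE"
EOF

# What a startup wrapper would do: source the include, then the env file.
JVM_OPTS=""
. "$workdir/cassandra.in.sh"
. "$CASSANDRA_CONF/cassandra-env.sh"

echo "JVM options:$JVM_OPTS"
```

The heap size ends up as a JVM argument assembled before the JVM starts, which is why it cannot be read from a file that only the running application parses.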

-- 
Eric Evans
john.eric.evans@gmail.com


Re: Three scripts needed to run the server, Why?

Posted by Tomas Repik <tr...@redhat.com>.

Thanks, guys, for joining the discussion; I hope you don't mind if I continue to argue a bit more.

The core intelligence and functionality of the Cassandra server lies in the Java classes, which reside in jar archives. This is where the main functionality updates take place. To ease the use of the classes there is a, let's call it, "wrapper" script (bin/cassandra), which sets up the environment for the classes to provide the functionality. This wrapper uses two other scripts: one sits in bin (the include) and the other in etc (the env file). I agree that the files in bin should not be edited by users, but the following quotes from the wrapper script state the opposite:
"Any serious use-case though will likely require customization of the include."
"Developers and enthusiasts can put a customized include file at ~/.cassandra.in.sh."
According to these, the include file is no different from the environment file. But why would you have two separate files meant for the same purpose?

More importantly, to "configure" the options in either script the user has to be somewhat familiar with bash. The "bashy" stuff could be hidden from the user in the wrapper script, and the configuration options could sit in the cassandra.yaml file as key-value pairs like the other ones. When troubleshooting issues that users run into, they would provide just a single configuration file and the maintainer could reproduce the issue by plugging in that file.

Regarding updates, only the wrapper script would be updated, of course, and the user-modified config file would stay untouched in the etc directory. As for flexibility and the use case where there is an upstream default, an admin-specific, and a user-specific configuration, that is not a problem at all: making the config file modular would do the job, and there would not be any duplication. If the user does not care about the configuration and just wants to run the server out of the box, there are always default options embedded in the Java classes.
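
As a rough illustration of the proposal above, the sketch below keeps plain key/value pairs in a user-facing file and lets the wrapper turn them into JVM flags. Everything here is hypothetical: the file name startup.conf, the keys max_heap_size and heap_newsize, and the helper conf_get are made up for the example and do not exist in Cassandra.

```shell
#!/bin/sh
# Sketch of the proposal: users edit key/value pairs, the wrapper does the shell work.
# File name, keys, and helper are hypothetical; none of them exist in Cassandra.
workdir=$(mktemp -d)
cat > "$workdir/startup.conf" <<'EOF'
# user-facing settings, no shell syntax required
max_heap_size: 4G
EOF

# Hypothetical helper: print the value for a key, or a default if the key is absent.
conf_get() {
    value=$(sed -n "s/^$1:[[:space:]]*//p" "$workdir/startup.conf" | tail -n 1)
    printf '%s\n' "${value:-$2}"
}

heap=$(conf_get max_heap_size 1G)        # present in the file
new_gen=$(conf_get heap_newsize 256M)    # absent, so the built-in default applies
JVM_OPTS="-Xms$heap -Xmx$heap -Xmn$new_gen"
echo "$JVM_OPTS"
```

Note that in such a design the wrapper, not the Java code, would have to parse these keys, because the values are needed before the JVM is launched.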

What do you think? I don't think my solution is ideal and I'd be glad to hear where my assumptions are wrong.

Tomas


Re: Three scripts needed to run the server, Why?

Posted by Michael Shuler <mi...@pbandjelly.org>.

Standard unix/linux systems policy is that editable configuration files
go under /etc. It is not proper to edit files under /{s}bin or
/usr/{s}bin. $PATH contains /{s}bin and /usr/{s}bin, whose files are
executables that can be run by a user; hence the basic separation
between runnable files and tunable configuration files that are
intended to be edited.

There may be multiple executables in /{s}bin and /usr/{s}bin that use
the common configuration under /etc - they may not be just single
purpose. If all configs were contained in each executable script, we
would be repeating ourselves, as well as possibly creating unexpected
results if the user did not keep them all aligned.

Additionally, package managers like apt and rpm should not overwrite
configuration files that have been edited, so hopefully upgrades
won't hose a user-edited change under /etc. (Back them up regardless.)
If there is a fundamental change to the executables in /usr/{s}bin, they
will be overwritten by package managers, since users are expected not
to edit those.

This is all really basic system administration and common policy for
most software packages. Group common configs where they are meant to
be edited, and split out various configs when it makes sense or when
they may be utilized by various executables.

The user may deviate from these common practices as they see fit, but
may also introduce self-inflicted problems. :)

-- 
Kind regards,
Michael


Re: Three scripts needed to run the server, Why?

Posted by Murukesh Mohanan <mu...@gmail.com>.

Also, you can use bash to debug bin/cassandra:

PS4=' $BASH_SOURCE:$LINENO:   ' bash -x bin/cassandra

This should print the filename of the file being executed/sourced and the line number being currently executed, so it should be easier to find out what happened, where and when. Of course, /bin/sh need not be bash, but I'm not sure what the equivalent method would be for dash or other shells.
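
To try the trick on something smaller than bin/cassandra, the sketch below traces a throwaway wrapper that sources a second file. The file names wrapper.sh and settings.sh are made up for the example, and bash is assumed to be on the PATH.

```shell
#!/bin/sh
# Sketch: trace a tiny wrapper that sources a second file, using the PS4 trick above.
# Requires bash on PATH; the file names are throwaway examples.
workdir=$(mktemp -d)

cat > "$workdir/settings.sh" <<'EOF'
GREETING="hello"
EOF

cat > "$workdir/wrapper.sh" <<EOF
. "$workdir/settings.sh"
echo "\$GREETING"
EOF

# bash -x writes the trace to stderr; capture it so it can be inspected.
trace=$(PS4=' $BASH_SOURCE:$LINENO:   ' bash -x "$workdir/wrapper.sh" 2>&1 >/dev/null)
printf '%s\n' "$trace"
```

Each trace line should be prefixed with the file that the command came from, so the assignment inside settings.sh can be told apart from the commands in wrapper.sh. Recent bash releases reportedly do not import PS4 from the environment when running as root, so the custom prefix may only appear when this is run as a regular user.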


Re: Three scripts needed to run the server, Why?

Posted by Murukesh Mohanan <mu...@gmail.com>.

What you complain about may be useful to someone else who might appreciate the added flexibility. I'd personally be opposed to a single script, as I'd rather not edit something that might cause conflicts or be overwritten on upgrades (the locations of the include and environment files being configurable means that they can be in an entirely different corner of the filesystem).

I can also think of cases where having two configurable files is useful. For example, as an administrator, I'd keep everything in the cassandra install directory read-only except for upgrades, then keep a common include file for my users with some common configuration for my server, and let the users use `$CASSANDRA_CONF` (the directory where the environment file is) to configure everything else they wish for running their own instances of Cassandra, taking advantage of the common install and base setup. Admittedly this isn't a common use case.

If you're modifying bin/cassandra, then you're doing it wrong, IMHO. Only two files need to be examined: the (an?) include file and the environment file. And if you simply need to override a setting, you can just use the environment file as the ultimate override, since it is sourced after the include (not by it).
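
The override order described above can be sketched as follows. This is an illustration under assumptions, not the real scripts: LOG_DIR is a hypothetical setting, the user name in the paths is invented, and the include is assumed to leave an already-set `$CASSANDRA_CONF` alone.

```shell
#!/bin/sh
# Sketch: the file sourced last wins, so the env file can override the include.
# LOG_DIR is a hypothetical setting used only for illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/shared" "$workdir/alice-conf"

# Admin-owned include shared by all users (kept read-only in practice).
cat > "$workdir/shared/cassandra.in.sh" <<'EOF'
LOG_DIR="/var/log/cassandra"
# Respect a per-user config directory if one was set already.
CASSANDRA_CONF="${CASSANDRA_CONF:-/etc/cassandra}"
EOF

# Per-user env file living in that user's own $CASSANDRA_CONF.
cat > "$workdir/alice-conf/cassandra-env.sh" <<EOF
LOG_DIR="$workdir/alice-logs"
EOF

# Wrapper behaviour as described above: include first, env file second.
CASSANDRA_CONF="$workdir/alice-conf"
. "$workdir/shared/cassandra.in.sh"
. "$CASSANDRA_CONF/cassandra-env.sh"

echo "LOG_DIR=$LOG_DIR"
```

Because the env file is sourced second, its LOG_DIR replaces the value from the shared include without anyone touching the admin-owned file.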


Re: Three scripts needed to run the server, Why?

Posted by Tomas Repik <tr...@redhat.com>.

Thanks for the answer, but it did not help much. I have read this several times and already know it. It still does not answer the question of why three files are needed instead of a single one, not to mention the multiple different config files.
All these files are more or less configuration files that set up the environment and properties of the server. Why can't there be a single file that one would modify in order to tweak the server to his or her needs? In the current situation you have to search many different files to find the place where an option is configured.


Re: Three scripts needed to run the server, Why?

Posted by Murukesh Mohanan <mu...@gmail.com>.

The bin/cassandra script has an explanation (https://github.com/apache/cassandra/blob/trunk/bin/cassandra#L24):

# As a convenience, a fragment of shell is sourced in order to set one or
# more of these variables. This so-called `include' can be placed in a 
# number of locations and will be searched for in order. The lowest 
# priority search path is the same directory as the startup script, and
# since this is the location of the sample in the project tree, it should
# almost work Out Of The Box.
#
# Any serious use-case though will likely require customization of the
# include. For production installations, it is recommended that you copy
# the sample to one of /usr/share/cassandra/cassandra.in.sh,
# /usr/local/share/cassandra/cassandra.in.sh, or 
# /opt/cassandra/cassandra.in.sh and make your modifications there.
#
#[...]
# 
# If you would rather configure startup entirely from the environment, you
# can disable the include by exporting an empty CASSANDRA_INCLUDE, or by 
# ensuring that no include files exist in the aforementioned search list.
# Be aware that you will be entirely responsible for populating the needed
# environment variables.

You can use just a single environment file, if you so wish.
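
The lookup that the comment describes can be sketched roughly as below. This follows the wording of the comment rather than the actual code in bin/cassandra, which may differ; pick_include and load_include are hypothetical helper names, and the candidate files are throwaway stand-ins for the real search locations.

```shell
#!/bin/sh
# Sketch of the lookup the comment describes; not the real bin/cassandra code.
# pick_include is a hypothetical helper: it prints the first readable candidate.
pick_include() {
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: honour CASSANDRA_INCLUDE, else search the given list.
load_include() {
    if [ "${CASSANDRA_INCLUDE+set}" = "set" ]; then
        # Variable is present: empty means "no include", otherwise use that file.
        if [ -n "$CASSANDRA_INCLUDE" ]; then
            . "$CASSANDRA_INCLUDE"
        fi
        return
    fi
    # Variable is absent: fall back to the search list, highest priority first.
    if found=$(pick_include "$@"); then . "$found"; fi
}

workdir=$(mktemp -d)
echo 'SOURCE_USED=home' > "$workdir/home.in.sh"
echo 'SOURCE_USED=bindir' > "$workdir/bin.in.sh"

# Case 1: no CASSANDRA_INCLUDE, so the first readable candidate is sourced.
unset CASSANDRA_INCLUDE
load_include "$workdir/missing.in.sh" "$workdir/home.in.sh" "$workdir/bin.in.sh"
first_result=$SOURCE_USED

# Case 2: an empty CASSANDRA_INCLUDE disables the include entirely.
SOURCE_USED=none
CASSANDRA_INCLUDE=""
load_include "$workdir/home.in.sh"
second_result=$SOURCE_USED

echo "searched: $first_result, disabled: $second_result"
```

In the second case nothing is sourced, so every variable the server needs would have to come from the caller's environment, as the comment warns.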
