Posted to issues@metron.apache.org by "Matt Foley (JIRA)" <ji...@apache.org> on 2017/01/01 02:20:58 UTC

[jira] [Updated] (METRON-634) Mpack bug fixes and improvements (not related to singlenode install)

     [ https://issues.apache.org/jira/browse/METRON-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated METRON-634:
------------------------------
    Description: 
Multiple bug fixes and recommended improvements were found in the course of implementing METRON-608 that are unrelated to METRON-609 (singlenode install).  This jira provides the work items for merging those changes into the mainline Mpack.  Almost all items relate to Elasticsearch.

Some changes impact the Ambari database, and should therefore be done with an Mpack version bump and database upgrade script.  These are in the second block below.

h2. Not impacting Ambari database:
* ES pid_dir specification and usage:
** Currently pid_dir is specified in both elastic-env.xml and params.py.  The config parameter should not be overridden in params.py.
** PID_DIR is not included in /etc/sysconfig/elasticsearch.  It needs to be added to the template in elastic-sysconfig, as it must be provided to ES at launch-time (else the default directory will be used).
** pid_file is specified in params.py, but is not used anywhere.  (The ES internal launcher synthesizes it from PID_DIR, and this is appropriate.)
** status_params.py, which redundantly defines pid_dir as a python variable, is unnecessary and unused by the ES portion of the Mpack.  It can be removed.
* JAVA_HOME needs to be provided in /etc/sysconfig/elasticsearch (templated in elastic-sysconfig.xml).  Its absence causes CentOS 7 systemctl to fail the ES launch, unless /bin/java is defined (which is not guaranteed).
* Also in the /etc/sysconfig/elasticsearch template in elastic-sysconfig.xml, the value of ES_JAVA_OPTS incorrectly spans 3 lines.  The lines must be terminated with backslashes to effectively become a single line.  The current inclusion of newlines in the long string value is acceptable (although unusual) in a shell script, but not in a systemd EnvironmentFile.  /etc/sysconfig/elasticsearch must function as both.
* Also in ES_JAVA_OPTS, the two instances of {{log_dir}} need to be followed by a slash '/'.
* Recommend adding QuickLinks for Elasticsearch.  This is a low-impact change that is very helpful to validate ES health.
** ES health status
** ES indexes list
* In elastic.py, when directories are being pre-created and permissions set, the directory $CONF_DIR/scripts should also be pre-created.  I intermittently hit permissions issues with this directory being created later by root, and not properly assigned to elastic_user.
* In several places in elastic.py, "params.elastic_user" is incorrectly used when "params.user_group" should be used.
* elastic.py calls an undefined "format()" function, and the call is unnecessary in any case, in File(format("/etc/sysconfig/elasticsearch")...
* The same undefined "format()" function is called unnecessarily in several places in elastic_master.py.
* Several improvements are suggested for the comments and descriptions in elastic-site.xml.
* The comment description in common-services/METRON/0.3.0/configuration/metron-env.xml for "es_hosts" should clarify that this should be a list of ES Master hosts, not any ES hosts.
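
To illustrate the ES_JAVA_OPTS items above, here is a minimal Python sketch of how a long value could be rendered so that every physical line except the last ends in a backslash, which both a shell and a systemd EnvironmentFile read as one logical line.  The helper name render_env_line, the log_dir value, and the JVM options are hypothetical and chosen only for illustration; they are not taken from the Mpack.

```python
import posixpath


def render_env_line(name, options):
    """Render NAME="opt1 opt2 ..." with one option per physical line.

    Every physical line except the last ends in a backslash, so the value
    is read as a single logical line.
    """
    return '{0}="{1}"'.format(name, " \\\n".join(options))


# Hypothetical values, for illustration only.
log_dir = "/var/log/elasticsearch"
options = [
    "-verbose:gc",
    # posixpath.join supplies the '/' between log_dir and the file name.
    "-Xloggc:" + posixpath.join(log_dir, "gc.log"),
    "-XX:HeapDumpPath=" + posixpath.join(log_dir, "heapdump.hprof"),
]

rendered = render_env_line("ES_JAVA_OPTS", options)
print(rendered)
```

Joining the directory and file name with posixpath.join rather than plain string concatenation is one way to avoid the missing-slash problem noted for log_dir.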


h2. Affects Ambari database, needs db upgrade script:
* pid_dir SHOULD be specified in elastic-sysconfig.xml, rather than elastic-env.xml, as it is a parameter that must be provided to ES at launch-time, but is not something the admin normally has any reason to change.
* conf_dir SHOULD be specified in elastic-env.xml or elastic-site.xml, not in elastic-sysconfig.xml.  While it too is a parameter that must be provided to ES at launch-time, it is typically left to the installing admin where to put the config files.
* The configuration parameter names in elastic-site.xml should be improved in several instances to make the semantics more obvious to the human reader (who may not be very familiar with Elasticsearch configuration).  Mouse-over documentation will continue to provide the ES config parameter equivalents.  In particular, I suggest:
{code}
cluster_name -> es_cluster_name  (to distinguish ES cluster from Stack cluster)
zen_discovery_ping_unicast_hosts -> es_cluster_hosts
network_host -> network_bindings  (these are in fact interface names, not host names)
{code}
* There are at least two places in elasticsearch.master.yaml.j2 (zen_discovery_ping_unicast_hosts and network_host) where needed square brackets are either missing or included in the configuration string.  To be consistent with other usages, and less prone to human error, the square brackets should not be in the configuration string but rather should be provided in the template text.
* "data_dir" apparently should be eliminated (from elastic-sysconfig) in preference for "path_data" (in elastic-site.xml).  The latter value ends up overriding the former anyway, but the existence of the former is confusing and unnecessary.
* All four configuration parameters in elastic-env.xml should be moved to elastic-site.xml, because they are all reasonable to set in a "site.xml" file and do end up in the .yml file that ES uses instead of "site.xml", and do NOT end up in environment variables.  The only parameters that end up in env vars are set in elastic-sysconfig, and the ES launch process in fact ignores the elastic-env.sh file that is templated in elastic-env.xml (which consists only of JAVA_HOME and PATH).  Therefore we could also eliminate elastic-env.sh and hence entirely remove elastic-env.xml, or we could choose to keep the small elastic-env.sh file and its template, just to remind people that it is necessary to have JAVA_HOME defined.
* In METRON/0.3.0/configuration/metron-env.xml and METRON/0.3.0/package/scripts/params/params_linux.py, the value "metron_apps_indexed_hdfs_dir" does not need to be settable by admin; it is appropriate to require it to be subordinate to "metron_apps_hdfs_dir".  Thus it can be removed from metron-env.xml and set to "\{metron_apps_hdfs_dir\}/indexing/indexed" in params_linux.py.  This also eliminates a really unacceptable use of "double format".
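
As a rough sketch of the square-bracket item above: the configuration string would hold only a comma-separated list, and the template text would supply the brackets.  The Python below uses plain string formatting as a stand-in for the Jinja2 template; the helper to_yaml_flow_list, the TEMPLATE text, and the host and interface values are hypothetical, and stripping brackets an admin may have typed is my own assumption about how a transition could be handled.

```python
def to_yaml_flow_list(raw):
    """Turn 'a, b,c' (or '[a, b]') into the YAML flow sequence '[ a, b, c ]'.

    Brackets already present in the configured string are stripped, so the
    rendered output never ends up with doubled brackets.
    """
    items = [item.strip() for item in raw.strip().strip("[]").split(",")]
    return "[ " + ", ".join(item for item in items if item) + " ]"


# Stand-in for the template text; the real template is a Jinja2 file.
TEMPLATE = (
    "discovery.zen.ping.unicast.hosts: {hosts}\n"
    "network.host: {bindings}\n"
)

# Hypothetical configuration strings, as an admin might enter them.
rendered_yaml = TEMPLATE.format(
    hosts=to_yaml_flow_list("node1, node2"),
    bindings=to_yaml_flow_list("[_local_,_eth0_]"),
)
print(rendered_yaml)
```

With this approach the admin never has to remember whether a given field wants brackets, which is the human-error risk the item describes.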



> Mpack bug fixes and improvements (not related to singlenode install)
> --------------------------------------------------------------------
>
>                 Key: METRON-634
>                 URL: https://issues.apache.org/jira/browse/METRON-634
>             Project: Metron
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>         Environment: Centos7
>            Reporter: Matt Foley
>            Assignee: Matt Foley


