Posted to issues@metron.apache.org by "Justin Leet (JIRA)" <ji...@apache.org> on 2018/05/22 19:35:08 UTC

[jira] [Updated] (METRON-1326) Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop

     [ https://issues.apache.org/jira/browse/METRON-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Leet updated METRON-1326:
--------------------------------
    Fix Version/s: 0.5.0

> Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop
> ----------------------------------------------------------------------
>
>                 Key: METRON-1326
>                 URL: https://issues.apache.org/jira/browse/METRON-1326
>             Project: Metron
>          Issue Type: Bug
>         Environment: 12 node VM cluster running CentOS 7
>            Reporter: Anand Subramanian
>            Assignee: Michael Miklavcic
>            Priority: Major
>             Fix For: 0.5.0
>
>
> Metron deployment fails when enabling Kerberos on a 12-node VM cluster managed by Ambari 2.5.2.
> The error occurs during the "Stop Services" step of kerberization, for the Elasticsearch Master and Elasticsearch Data Node services.
> I confirmed that the same deployment succeeds with Ambari 2.4.2, where I am able to set up the Kerberized cluster without problems.
> With Ambari 2.4, the "Elasticsearch Data Node Stop" step stops the slave and does not check the status of the service after the 'service stop' command is issued. With Ambari 2.5, the status is checked after the service stop command is issued.
> *In Ambari 2.4*
> {code}
>  stdout:
> Stop the Slave
> 2017-11-07 10:21:27,755 - Execute['service elasticsearch stop'] {}
> Command completed successfully!
> {code}
> *In Ambari 2.5*
> {code}
> Stop the Slave
> 2017-11-07 10:12:48,481 - Execute['service elasticsearch stop'] {}
> 2017-11-07 10:12:48,599 - Waiting for actual component stop
> Status of the Slave
> 2017-11-07 10:12:48,600 - Execute['service elasticsearch status'] {}
> Command failed after 1 tries
> {code}
> The status command returns exit code 3, which the Ambari agent treats as an error, so it marks the step as failed. The output below shows the service as "inactive (dead)", which is the expected state after a stop.
> I am not entirely sure whether this should be handled by Metron or by Ambari. Please feel free to close this defect if it is deemed out of scope for Metron.
> Here is the full error log from the UI:
> {code}
> stderr:
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 71, in <module>
>     Elasticsearch().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 332, in execute
>     self.execute_prefix_function(self.command_name, 'after', env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 350, in execute_prefix_function
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 398, in after_stop
>     status_method(env)
>   File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 59, in status
>     Execute(status_cmd)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'service elasticsearch status' returned 3. ● elasticsearch.service - Elasticsearch
>    Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
>    Active: inactive (dead)
>      Docs: http://www.elastic.co
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,340][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false},}, reason: zen-disco-node_left({metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false})
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,466][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false},}, reason: zen-disco-node_left({metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false})
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,548][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false},}, reason: zen-disco-node_left({metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false})
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,713][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false},}, reason: zen-disco-node_left({metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false})
> Nov 07 10:12:48 metron-12 systemd[1]: Stopping Elasticsearch...
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,417][INFO ][node                     ] [metron-12.openstacklocal] stopping ...
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node                     ] [metron-12.openstacklocal] stopped
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node                     ] [metron-12.openstacklocal] closing ...
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,491][INFO ][node                     ] [metron-12.openstacklocal] closed
> Nov 07 10:12:48 metron-12 systemd[1]: Stopped Elasticsearch.
>  stdout:
> Stop the Slave
> 2017-11-07 10:12:49,025 - Execute['service elasticsearch stop'] {}
> 2017-11-07 10:12:49,089 - Waiting for actual component stop
> Status of the Slave
> 2017-11-07 10:12:49,090 - Execute['service elasticsearch status'] {}
> Command failed after 1 tries
> {code}
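
The failure described above comes down to how the exit code of 'service elasticsearch status' is interpreted. Exit code 3 is what LSB init scripts and `systemctl status` return for a service that is not running, so after a stop it indicates success rather than an error. The following is a minimal sketch of a status check that makes that distinction. It is not the actual Metron or Ambari code: the names `interpret_status`, `check_status`, and `ComponentNotRunning` are hypothetical, and the sketch uses only the Python standard library rather than Ambari's `resource_management` package.

```python
import subprocess

# Exit code used by LSB init scripts and `systemctl status` for "not running".
STATUS_NOT_RUNNING = 3


class ComponentNotRunning(Exception):
    """Hypothetical marker exception: the service is cleanly stopped."""


def interpret_status(exit_code, output=""):
    """Map a status command's exit code to a state, or raise on a real error."""
    if exit_code == 0:
        return "running"
    if exit_code == STATUS_NOT_RUNNING:
        return "stopped"
    raise RuntimeError(
        "status check failed with code %d: %s" % (exit_code, output)
    )


def check_status(status_cmd=("service", "elasticsearch", "status")):
    """Run the status command; raise ComponentNotRunning if the service is stopped."""
    proc = subprocess.run(
        status_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    output = proc.stdout.decode(errors="replace")
    state = interpret_status(proc.returncode, output)
    if state == "stopped":
        raise ComponentNotRunning("elasticsearch is not running")
    return state
```

With this shape, a caller that has just issued a stop can treat `ComponentNotRunning` as confirmation that the stop succeeded, while any other non-zero exit code still surfaces as a failure.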


