Posted to issues@metron.apache.org by "Anand Subramanian (JIRA)" <ji...@apache.org> on 2017/11/22 05:48:00 UTC

[jira] [Created] (METRON-1326) Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop

Anand Subramanian created METRON-1326:
-----------------------------------------

             Summary: Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop
                 Key: METRON-1326
                 URL: https://issues.apache.org/jira/browse/METRON-1326
             Project: Metron
          Issue Type: Bug
         Environment: 12 node VM cluster running CentOS 7
            Reporter: Anand Subramanian


I am seeing the Metron deployment fail when enabling Kerberos on a 12-node VM cluster managed by Ambari 2.5.2.

The error occurs during the "Stop Services" step of kerberization, for the Elasticsearch Master and Elasticsearch Data Node services.

I confirmed that the same deployment succeeds with Ambari 2.4.2, where I am able to set up the Kerberized cluster without problems.

With Ambari 2.4, the "Elasticsearch Data Node Stop" step stops the slave and does not check the status of the service after the 'service stop' command is issued. With Ambari 2.5, the status of the service is checked after the 'service stop' command is issued.

*In Ambari 2.4*
{code}
 stdout:
Stop the Slave
2017-11-07 10:21:27,755 - Execute['service elasticsearch stop'] {}

Command completed successfully!
{code}

*In Ambari 2.5*
{code}
Stop the Slave
2017-11-07 10:12:48,481 - Execute['service elasticsearch stop'] {}
2017-11-07 10:12:48,599 - Waiting for actual component stop
Status of the Slave
2017-11-07 10:12:48,600 - Execute['service elasticsearch status'] {}

Command failed after 1 tries
{code}

It appears that the status command returns exit code 3, which the Ambari agent treats as an error, so it marks the step as failed.
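
For context, exit code 3 from a service "status" action conventionally means "program is not running" (the LSB init-script convention, which systemctl also follows for inactive units), so after a stop it is the expected result rather than a failure. Below is a minimal sketch of interpreting the code that way. The names describe_status, is_stopped and check_after_stop are hypothetical and are not part of Metron or Ambari.

```python
import subprocess

# Conventional (LSB) meanings of the exit code of a "status" action.
LSB_STATUS = {
    0: "running",
    1: "dead, pid file exists",
    2: "dead, lock file exists",
    3: "not running",
    4: "status unknown",
}


def describe_status(exit_code):
    """Return a human-readable meaning for a status exit code."""
    return LSB_STATUS.get(exit_code, "reserved or unrecognised")


def is_stopped(exit_code):
    """Codes 1 to 3 all indicate that the process is not running."""
    return exit_code in (1, 2, 3)


def check_after_stop(command=("service", "elasticsearch", "status"),
                     runner=subprocess.call):
    """Hypothetical post-stop check: succeed only if the service is down.

    `runner` takes the command as a list and returns its exit code; it is
    injectable so the logic can be exercised without a real service.
    """
    code = runner(list(command))
    if not is_stopped(code):
        raise RuntimeError("service still reports: %s" % describe_status(code))
    return code
```

With this interpretation, the "returned 3" result in the log would count as a successful stop, and only a status of 0 (still running) or 4 (unknown) would be reported as a problem.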

I am not entirely sure whether this should be handled by Metron or by Ambari. Please feel free to close this issue if it is deemed out of scope for Metron.

Here is the full error log from the UI:
{code}
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 71, in <module>
    Elasticsearch().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 332, in execute
    self.execute_prefix_function(self.command_name, 'after', env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 350, in execute_prefix_function
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 398, in after_stop
    status_method(env)
  File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 59, in status
    Execute(status_cmd)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'service elasticsearch status' returned 3. ● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: http://www.elastic.co

Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,340][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false},}, reason: zen-disco-node_left({metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,466][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false},}, reason: zen-disco-node_left({metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,548][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false},}, reason: zen-disco-node_left({metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,713][INFO ][cluster.service          ] [metron-12.openstacklocal] removed {{metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false},}, reason: zen-disco-node_left({metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false})
Nov 07 10:12:48 metron-12 systemd[1]: Stopping Elasticsearch...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,417][INFO ][node                     ] [metron-12.openstacklocal] stopping ...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node                     ] [metron-12.openstacklocal] stopped
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node                     ] [metron-12.openstacklocal] closing ...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,491][INFO ][node                     ] [metron-12.openstacklocal] closed
Nov 07 10:12:48 metron-12 systemd[1]: Stopped Elasticsearch.
 stdout:
Stop the Slave
2017-11-07 10:12:49,025 - Execute['service elasticsearch stop'] {}
2017-11-07 10:12:49,089 - Waiting for actual component stop
Status of the Slave
2017-11-07 10:12:49,090 - Execute['service elasticsearch status'] {}

Command failed after 1 tries
{code}
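
The traceback shows the post-stop hook calling the component's status method, which in turn runs 'service elasticsearch status' and lets the non-zero exit propagate as a generic execution failure. One possible direction, sketched below, is for the status method to translate exit code 3 into a dedicated "not running" signal that the waiting loop accepts as success. This is only an illustration under that assumption: NotRunning, status and wait_for_stop are hypothetical names, not the actual Ambari or Metron API.

```python
class NotRunning(Exception):
    """Hypothetical signal meaning 'the component is stopped'."""


def status(run_status):
    """Hypothetical status check.

    `run_status` is a callable returning the exit code of the
    service's status command.
    """
    code = run_status()
    if code == 3:
        # Conventionally "program is not running": report it as a
        # clean stopped state instead of a generic failure.
        raise NotRunning("service reports not running")
    if code != 0:
        raise RuntimeError("status check failed with exit code %d" % code)


def wait_for_stop(run_status, max_checks=10):
    """Poll status until the component reports that it has stopped.

    Returns True once NotRunning is raised, or False if the component
    still reports running after max_checks polls.
    """
    for _ in range(max_checks):
        try:
            status(run_status)
        except NotRunning:
            return True
    return False
```

In this sketch, a service that is already down when the first status check runs (as in the log above) makes wait_for_stop return True on the first poll.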





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)