Posted to issues@ambari.apache.org by "Andrew Onischuk (JIRA)" <ji...@apache.org> on 2018/06/27 12:15:00 UTC
[jira] [Created] (AMBARI-24201) Command reschedule does not work
causing blueprint deployments to timeout
Andrew Onischuk created AMBARI-24201:
----------------------------------------
Summary: Command reschedule does not work causing blueprint deployments to timeout
Key: AMBARI-24201
URL: https://issues.apache.org/jira/browse/AMBARI-24201
Project: Ambari
Issue Type: Bug
Reporter: Andrew Onischuk
Assignee: Andrew Onischuk
Fix For: 2.7.0
Attachments: AMBARI-24201.patch, AMBARI-24201.patch
When a stage times out or command delivery fails during a blueprint install, the
server usually reschedules the running command by sending a cancel command along
with a repeated execution command.
The bug is that the agent cancels the very command that needs to be newly scheduled.
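The sequence described above can be sketched as follows. This is a minimal illustration, not the actual ambari-agent code: the class `SketchQueue`, the function `handle_reschedule`, and the `executionCommands` key are hypothetical names, and the idea that the agent remembers canceled task ids in a set is an assumption made only to show how a cancel for task 145 could also hit the repeated command carrying the same task id.

```python
# Hypothetical sketch; names and data layout are assumptions, not ambari-agent's API.
class SketchQueue:
    def __init__(self):
        self.pending = []          # commands waiting to be executed
        self.canceled_ids = set()  # task ids that were marked as canceled

    def cancel(self, task_id):
        # Drop any queued copy of the task and remember its id as canceled.
        self.pending = [c for c in self.pending if c["taskId"] != task_id]
        self.canceled_ids.add(task_id)

    def put(self, command):
        self.pending.append(command)

    def run_next(self):
        command = self.pending.pop(0)
        if command["taskId"] in self.canceled_ids:
            # The rescheduled command reuses the canceled id, so it is skipped.
            return "CANCELED"
        return "EXECUTED"


def handle_reschedule(queue, message):
    # One server message carries both the cancel and the repeated command.
    for cancel in message["cancelCommands"]:
        queue.cancel(cancel["target_task_id"])
    for command in message["executionCommands"]:
        queue.put(command)


queue = SketchQueue()
message = {
    "cancelCommands": [{"target_task_id": 145, "reason": "Stage timeout"}],
    "executionCommands": [{"taskId": 145, "role": "ZOOKEEPER_CLIENT"}],
}
handle_reschedule(queue, message)
result = queue.run_next()
print(result)
```

Under these assumptions the repeated command never runs, because the cancel and the reschedule refer to the same task id.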
2018-06-27 01:34:58,105 WARN [agent-message-retry-0] MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 19
..., u'cancelCommands': [{u'commandType': u'CANCEL_COMMAND', u'target_task_id': 145, u'reason': u'Stage timeout'}]}}, u'requiredConfigTimestamp': 1530060845474}
INFO 2018-06-27 01:34:58,121 ActionQueue.py:115 - Canceling command with taskId = 145
INFO 2018-06-27 01:34:58,121 ActionQueue.py:134 - Canceling EXECUTION_COMMAND for service ZOOKEEPER and role ZOOKEEPER_CLIENT with taskId 145
WARNING 2018-06-27 01:34:58,121 CustomServiceOrchestrator.py:129 - Unable to find process associated with taskId = 145
INFO 2018-06-27 01:34:58,122 ActionQueue.py:103 - Adding EXECUTION_COMMAND for role ZOOKEEPER_CLIENT for service ZOOKEEPER of cluster_id 2 to the queue.
INFO 2018-06-27 01:34:58,122 security.py:135 - Event to server at /reports/responses (correlation_id=870): {'status': 'OK', 'messageId': '19'}
INFO 2018-06-27 01:34:58,142 __init__.py:57 - Event from server at /user/ (correlation_id=870): {u'status': u'OK'}
INFO 2018-06-27 01:34:59,293 ActionQueue.py:238 - Executing command with id = 10-0, taskId = 145 for role = ZOOKEEPER_CLIENT of cluster_id 2.
INFO 2018-06-27 01:34:59,294 security.py:135 - Event to server at /reports/commands_status (correlation_id=871): {'clusters': {u'2': [{'status': 'IN_PROGRESS', 'taskId': 145, 'tmpout': '/var/lib/ambari-agent/data/output-145.txt', 'roleCommand': u'INSTALL', 'structuredOut': '/var/lib/ambari-agent/data/structured-out-145.json', 'clusterId': u'2', 'serviceName': u'ZOOKEEPER', 'role': u'ZOOKEEPER_CLIENT', 'actionId': u'10-0', 'tmperr': '/var/lib/ambari-agent/data/errors-145.txt'}]}}
INFO 2018-06-27 01:34:59,295 ActionQueue.py:279 - Command execution metadata - taskId = 145, retry enabled = True, max retry duration (sec) = 1200, log_output = True
INFO 2018-06-27 01:34:59,296 ActionQueue.py:285 - Command with taskId = 145 canceled
ERROR 2018-06-27 01:34:59,296 ActionQueue.py:221 - Exception while processing EXECUTION_COMMAND command
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 214, in process_command
    self.execute_command(command)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 354, in execute_command
    commandresult['stdout'] += '\n\nCommand completed successfully!\n' if status == self.COMPLETED_STATUS else '\n\nCommand failed after ' + str(numAttempts) + ' tries\n'
UnboundLocalError: local variable 'commandresult' referenced before assignment
UnboundLocalError: local variable 'commandresult' referenced before assignment
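The traceback above is consistent with a retry loop that exits on cancellation before the result variable is ever assigned. The sketch below reproduces that pattern in isolation. It is not the code from ActionQueue.py: `execute_with_retries`, `run_once`, `is_canceled`, and `max_attempts` are hypothetical names, and the loop structure is an assumption inferred from the log lines.

```python
# Hypothetical sketch of the failure pattern; not the real execute_command.
def execute_with_retries(run_once, is_canceled, max_attempts=3):
    num_attempts = 0
    status = "FAILED"
    while num_attempts < max_attempts:
        num_attempts += 1
        if is_canceled():
            # Leaves the loop before command_result has been assigned.
            break
        command_result = run_once()
        if command_result["exitcode"] == 0:
            status = "COMPLETED"
            break
    # Raises UnboundLocalError if the very first iteration was canceled.
    command_result["stdout"] += (
        "\n\nCommand completed successfully!\n"
        if status == "COMPLETED"
        else "\n\nCommand failed after " + str(num_attempts) + " tries\n"
    )
    return command_result


try:
    # The command is already marked canceled when execution starts.
    execute_with_retries(lambda: {"exitcode": 0, "stdout": ""}, lambda: True)
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__
print(error_name)
```

When the cancel flag is not set, the same function returns a result normally, so the error only appears on the canceled path.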
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)