Posted to issues@ambari.apache.org by "Andrew Onischuk (JIRA)" <ji...@apache.org> on 2018/06/27 12:15:00 UTC

[jira] [Updated] (AMBARI-24201) Command reschedule does not work causing blueprint deployments to timeout

     [ https://issues.apache.org/jira/browse/AMBARI-24201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-24201:
-------------------------------------
    Attachment: AMBARI-24201.patch

> Command reschedule does not work causing blueprint deployments to timeout  
> ---------------------------------------------------------------------------
>
>                 Key: AMBARI-24201
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24201
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.0
>
>         Attachments: AMBARI-24201.patch, AMBARI-24201.patch
>
>
> During a stage timeout or a delivery failure in a blueprint install, the server
> usually reschedules the running command by sending a cancel command along with
> the repeated execution command.
> The bug is that the agent cancels the command that is supposed to be newly scheduled.
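The ordering problem described above can be modelled in a few lines. This is a minimal sketch, not the agent's actual code: it assumes the agent remembers canceled task IDs and checks them when a command starts. `ToyActionQueue`, `handle_message`, `canceled_task_ids`, `reschedule_message`, and the `executionCommands` key are hypothetical names chosen for illustration; only the shape of the `cancelCommands` entry is taken from the log in this report.

```python
from collections import deque


class ToyActionQueue:
    """Toy model of an agent command queue (hypothetical, for illustration only)."""

    def __init__(self):
        self.queue = deque()
        # Assumption: cancels are remembered by task ID.
        self.canceled_task_ids = set()

    def handle_message(self, message):
        # Cancels are applied first, then the repeated execution command is queued.
        for cancel in message.get("cancelCommands", []):
            self.canceled_task_ids.add(cancel["target_task_id"])
        for command in message.get("executionCommands", []):
            self.queue.append(command)

    def run_next(self):
        command = self.queue.popleft()
        # The rescheduled command reuses the task ID that was just canceled.
        if command["taskId"] in self.canceled_task_ids:
            return "CANCELED"
        return "EXECUTED"


def reschedule_message(task_id):
    """Build a reschedule message: a cancel plus a repeated execution for one task."""
    return {
        "cancelCommands": [{"commandType": "CANCEL_COMMAND",
                            "target_task_id": task_id,
                            "reason": "Stage timeout"}],
        "executionCommands": [{"taskId": task_id, "role": "ZOOKEEPER_CLIENT"}],
    }


agent = ToyActionQueue()
agent.handle_message(reschedule_message(145))
outcome = agent.run_next()
print(outcome)
```

In this toy model the rescheduled task 145 is treated as canceled instead of being executed, because the cancel and the new command share one task ID.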
>     
>     
>     2018-06-27 01:34:58,105  WARN [agent-message-retry-0] MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 19
>     
>     
>     
>     ..., u'cancelCommands': [{u'commandType': u'CANCEL_COMMAND', u'target_task_id': 145, u'reason': u'Stage timeout'}]}}, u'requiredConfigTimestamp': 1530060845474}
>     INFO 2018-06-27 01:34:58,121 ActionQueue.py:115 - Canceling command with taskId = 145
>     INFO 2018-06-27 01:34:58,121 ActionQueue.py:134 - Canceling EXECUTION_COMMAND for service ZOOKEEPER and role ZOOKEEPER_CLIENT with taskId 145
>     WARNING 2018-06-27 01:34:58,121 CustomServiceOrchestrator.py:129 - Unable to find process associated with taskId = 145
>     INFO 2018-06-27 01:34:58,122 ActionQueue.py:103 - Adding EXECUTION_COMMAND for role ZOOKEEPER_CLIENT for service ZOOKEEPER of cluster_id 2 to the queue.
>     INFO 2018-06-27 01:34:58,122 security.py:135 - Event to server at /reports/responses (correlation_id=870): {'status': 'OK', 'messageId': '19'}
>     INFO 2018-06-27 01:34:58,142 __init__.py:57 - Event from server at /user/ (correlation_id=870): {u'status': u'OK'}
>     INFO 2018-06-27 01:34:59,293 ActionQueue.py:238 - Executing command with id = 10-0, taskId = 145 for role = ZOOKEEPER_CLIENT of cluster_id 2.
>     INFO 2018-06-27 01:34:59,294 security.py:135 - Event to server at /reports/commands_status (correlation_id=871): {'clusters': {u'2': [{'status': 'IN_PROGRESS', 'taskId': 145, 'tmpout': '/var/lib/ambari-agent/data/output-145.txt', 'roleCommand': u'INSTALL', 'structuredOut': '/var/lib/ambari-agent/data/structured-out-145.json', 'clusterId': u'2', 'serviceName': u'ZOOKEEPER', 'role': u'ZOOKEEPER_CLIENT', 'actionId': u'10-0', 'tmperr': '/var/lib/ambari-agent/data/errors-145.txt'}]}}
>     INFO 2018-06-27 01:34:59,295 ActionQueue.py:279 - Command execution metadata - taskId = 145, retry enabled = True, max retry duration (sec) = 1200, log_output = True
>     INFO 2018-06-27 01:34:59,296 ActionQueue.py:285 - Command with taskId = 145 canceled
>     ERROR 2018-06-27 01:34:59,296 ActionQueue.py:221 - Exception while processing EXECUTION_COMMAND command
>     Traceback (most recent call last):
>       File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 214, in process_command
>         self.execute_command(command)
>       File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 354, in execute_command
>         commandresult['stdout'] += '\n\nCommand completed successfully!\n' if status == self.COMPLETED_STATUS else '\n\nCommand failed after ' + str(numAttempts) + ' tries\n'
>     UnboundLocalError: local variable 'commandresult' referenced before assignment
>     
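The traceback ends in an `UnboundLocalError`, which Python raises when a function reads a local variable on a path where it was never assigned. The sketch below reproduces that pattern in isolation and shows one defensive variant. `run_attempts` and `run_attempts_guarded` are hypothetical helpers, not the agent's `execute_command`, and the guarded version is only an assumption about how such code could be hardened, not a description of the attached patch.

```python
def run_attempts(canceled, max_attempts=3):
    """Unsafe pattern: `result` is only assigned inside the retry loop."""
    attempts = 0
    while attempts < max_attempts:
        if canceled:
            break  # leaves the loop before `result` is ever assigned
        attempts += 1
        result = {"stdout": "attempt %d output" % attempts}
        break
    # Raises UnboundLocalError when the loop exited through the cancel branch.
    result["stdout"] += "\n\nCommand completed successfully!\n"
    return result


def run_attempts_guarded(canceled, max_attempts=3):
    """Defensive variant: give `result` a value before the loop."""
    attempts = 0
    result = {"stdout": "", "status": "CANCELED" if canceled else "PENDING"}
    while attempts < max_attempts:
        if canceled:
            break
        attempts += 1
        result = {"stdout": "attempt %d output" % attempts,
                  "status": "COMPLETED"}
        break
    if result["status"] == "COMPLETED":
        result["stdout"] += "\n\nCommand completed successfully!\n"
    return result


try:
    run_attempts(canceled=True)
    raised = False
except UnboundLocalError:
    raised = True

guarded = run_attempts_guarded(canceled=True)
print(raised, guarded["status"])
```

Initializing the result before the loop means a canceled command still produces a well-formed status report rather than an unhandled exception.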



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)