Posted to issues@ambari.apache.org by "Andrew Onischuk (JIRA)" <ji...@apache.org> on 2018/06/27 12:15:00 UTC

[jira] [Created] (AMBARI-24201) Command reschedule does not work causing blueprint deployments to timeout

Andrew Onischuk created AMBARI-24201:
----------------------------------------

             Summary: Command reschedule does not work causing blueprint deployments to timeout  
                 Key: AMBARI-24201
                 URL: https://issues.apache.org/jira/browse/AMBARI-24201
             Project: Ambari
          Issue Type: Bug
            Reporter: Andrew Onischuk
            Assignee: Andrew Onischuk
             Fix For: 2.7.0
         Attachments: AMBARI-24201.patch, AMBARI-24201.patch

During a blueprint install, when a stage times out or command delivery
fails, the server usually reschedules the running command by sending a
cancel command along with the repeated execution command.

The bug is that the agent cancels the command that is supposed to be newly
scheduled.
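
The interaction described above can be sketched in a few lines. This is a simplified, hypothetical model and not the agent's actual ActionQueue code: the class SketchActionQueue, the reschedule helper, and the clear_cancel flag are all names invented for illustration, and clearing the cancel mark is only one possible remedy, not necessarily what the attached patch does.

```python
# Hypothetical, simplified model of the agent's queue; not Ambari's actual code.
class SketchActionQueue:
    def __init__(self):
        self.pending = []      # commands waiting to run
        self.canceled = set()  # task ids that have been marked as canceled

    def cancel(self, task_id):
        # Drop any queued command with this id and remember the cancellation.
        self.pending = [c for c in self.pending if c["taskId"] != task_id]
        self.canceled.add(task_id)

    def put(self, command, clear_cancel=False):
        # clear_cancel=True models one possible fix (an assumption): a freshly
        # scheduled command is not affected by an earlier cancel of the same id.
        if clear_cancel:
            self.canceled.discard(command["taskId"])
        self.pending.append(command)

    def run_next(self):
        command = self.pending.pop(0)
        if command["taskId"] in self.canceled:
            return "CANCELED"
        return "COMPLETED"


def reschedule(queue, task_id, clear_cancel):
    # The server sends the cancel together with the repeated execution command.
    queue.cancel(task_id)
    queue.put({"taskId": task_id, "role": "ZOOKEEPER_CLIENT"}, clear_cancel)
    return queue.run_next()


buggy_result = reschedule(SketchActionQueue(), 145, clear_cancel=False)
fixed_result = reschedule(SketchActionQueue(), 145, clear_cancel=True)
print(buggy_result, fixed_result)
```

In the first call the stale cancel mark for taskId 145 also hits the re-added command, which matches the order of events in the agent log for this issue.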

    
    
    2018-06-27 01:34:58,105  WARN [agent-message-retry-0] MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 19
    
    
    
    ..., u'cancelCommands': [{u'commandType': u'CANCEL_COMMAND', u'target_task_id': 145, u'reason': u'Stage timeout'}]}}, u'requiredConfigTimestamp': 1530060845474}
    INFO 2018-06-27 01:34:58,121 ActionQueue.py:115 - Canceling command with taskId = 145
    INFO 2018-06-27 01:34:58,121 ActionQueue.py:134 - Canceling EXECUTION_COMMAND for service ZOOKEEPER and role ZOOKEEPER_CLIENT with taskId 145
    WARNING 2018-06-27 01:34:58,121 CustomServiceOrchestrator.py:129 - Unable to find process associated with taskId = 145
    INFO 2018-06-27 01:34:58,122 ActionQueue.py:103 - Adding EXECUTION_COMMAND for role ZOOKEEPER_CLIENT for service ZOOKEEPER of cluster_id 2 to the queue.
    INFO 2018-06-27 01:34:58,122 security.py:135 - Event to server at /reports/responses (correlation_id=870): {'status': 'OK', 'messageId': '19'}
    INFO 2018-06-27 01:34:58,142 __init__.py:57 - Event from server at /user/ (correlation_id=870): {u'status': u'OK'}
    INFO 2018-06-27 01:34:59,293 ActionQueue.py:238 - Executing command with id = 10-0, taskId = 145 for role = ZOOKEEPER_CLIENT of cluster_id 2.
    INFO 2018-06-27 01:34:59,294 security.py:135 - Event to server at /reports/commands_status (correlation_id=871): {'clusters': {u'2': [{'status': 'IN_PROGRESS', 'taskId': 145, 'tmpout': '/var/lib/ambari-agent/data/output-145.txt', 'roleCommand': u'INSTALL', 'structuredOut': '/var/lib/ambari-agent/data/structured-out-145.json', 'clusterId': u'2', 'serviceName': u'ZOOKEEPER', 'role': u'ZOOKEEPER_CLIENT', 'actionId': u'10-0', 'tmperr': '/var/lib/ambari-agent/data/errors-145.txt'}]}}
    INFO 2018-06-27 01:34:59,295 ActionQueue.py:279 - Command execution metadata - taskId = 145, retry enabled = True, max retry duration (sec) = 1200, log_output = True
    INFO 2018-06-27 01:34:59,296 ActionQueue.py:285 - Command with taskId = 145 canceled
    ERROR 2018-06-27 01:34:59,296 ActionQueue.py:221 - Exception while processing EXECUTION_COMMAND command
    Traceback (most recent call last):
      File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 214, in process_command
        self.execute_command(command)
      File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 354, in execute_command
        commandresult['stdout'] += '\n\nCommand completed successfully!\n' if status == self.COMPLETED_STATUS else '\n\nCommand failed after ' + str(numAttempts) + ' tries\n'
    UnboundLocalError: local variable 'commandresult' referenced before assignment
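
The UnboundLocalError at the end of the traceback is consistent with a result variable that is only bound inside the retry loop, while the cancel check leaves the loop first. The sketch below reproduces that pattern under this assumption; execute_buggy, execute_safe, and is_canceled are hypothetical names, and the real execute_command in ActionQueue.py may be structured differently.

```python
COMPLETED_STATUS = "COMPLETED"


def execute_buggy(is_canceled, max_attempts=3):
    status = "FAILED"
    num_attempts = 0
    while num_attempts < max_attempts:
        num_attempts += 1
        if is_canceled():
            break  # leaves the loop before command_result is ever bound
        command_result = {"stdout": "ok", "exitcode": 0}
        status = COMPLETED_STATUS
        break
    # Raises UnboundLocalError when the loop exited through the cancel branch.
    command_result["stdout"] += (
        "\n\nCommand completed successfully!\n"
        if status == COMPLETED_STATUS
        else "\n\nCommand failed after " + str(num_attempts) + " tries\n"
    )
    return command_result


def execute_safe(is_canceled, max_attempts=3):
    command_result = None  # bound up front so later code can test for it
    num_attempts = 0
    while num_attempts < max_attempts:
        num_attempts += 1
        if is_canceled():
            break
        command_result = {"stdout": "ok", "exitcode": 0}
        break
    if command_result is None:
        # Canceled before any attempt produced a result.
        return {"status": "CANCELED", "stdout": "", "attempts": num_attempts}
    command_result["stdout"] += "\n\nCommand completed successfully!\n"
    command_result["status"] = COMPLETED_STATUS
    return command_result


try:
    execute_buggy(lambda: True)
    buggy_raised = False
except UnboundLocalError:
    buggy_raised = True
print(buggy_raised, execute_safe(lambda: True)["status"])
```

Guarding the post-loop code this way only avoids the secondary exception; the rescheduled command would still be canceled unless the cancel handling itself is corrected.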
    





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)