Posted to dev@airflow.apache.org by jo...@nielsen.com, jo...@nielsen.com on 2018/04/17 20:33:02 UTC

Possible bug in ECS Operator

I think I found a bug in ECSOperator, but I'm not sure.

Here is the relevant part of the response I got back from the ECS task run:
..
'tasks': [{
        'lastStatus': 'STOPPED',
        'desiredStatus': 'STOPPED',
        'stoppedReason': 'Host EC2 (instance i-abc) terminated.',
        'containers': [{
            'containerArn': 'arn:abc',
            'taskArn': 'arn:abc',
            'lastStatus': 'RUNNING',
            'name': 'abc',
            'networkBindings': []
        }],
    }],
..

So this task was stopped in ECS before it finished, but Airflow considered it a success because none of the branches in this if/elif block raises an exception:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/ecs_operator.py#L101

for container in containers:
    if container.get('lastStatus') == 'STOPPED' and container['exitCode'] != 0:
        raise AirflowException('This task is not in success state {}'.format(task))
    elif container.get('lastStatus') == 'PENDING':
        raise AirflowException('This task is still pending {}'.format(task))
    elif 'error' in container.get('reason', '').lower():
        raise AirflowException('This containers encounter an error during launching : {}'.
                               format(container.get('reason', '').lower()))
							   
Because the container's lastStatus was still 'RUNNING' when the instance was terminated, none of the conditions matched and Airflow treated the task as a success.
I think that's probably not right, but maybe someone can correct me.
If it isn't right, I will open a bug in Jira and, with any luck, submit a pull request to fix it.
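
For discussion, here is a rough sketch of what a stricter check might look like. It is only an idea, not the operator's actual code: check_task_success is a hypothetical helper, AirflowException is a local stand-in so the snippet runs without Airflow installed, and it assumes that a task-level lastStatus of 'STOPPED' combined with a missing container exitCode means the task did not finish.

```python
class AirflowException(Exception):
    """Local stand-in for airflow.exceptions.AirflowException (assumption)."""


def check_task_success(task):
    """Raise if one entry of the ECS 'tasks' list did not finish cleanly."""
    task_stopped = task.get('lastStatus') == 'STOPPED'
    for container in task.get('containers', []):
        status = container.get('lastStatus')
        exit_code = container.get('exitCode')
        reason = container.get('reason', '').lower()
        if task_stopped and exit_code is None:
            # Assumption: a stopped task whose container never reported an
            # exit code (e.g. the host was terminated) counts as a failure.
            raise AirflowException(
                'Task stopped without a container exit code: {}'.format(
                    task.get('stoppedReason')))
        if status == 'STOPPED' and exit_code != 0:
            raise AirflowException(
                'This task is not in success state {}'.format(task))
        elif status == 'PENDING':
            raise AirflowException(
                'This task is still pending {}'.format(task))
        elif 'error' in reason:
            raise AirflowException(
                'Container error during launch: {}'.format(reason))
```

Called with the task from the response above, this raises instead of passing, because the task is 'STOPPED' and the container has no 'exitCode'.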

In a way I feel like this is an AWS bug, because the container part of the JSON came back with no exit code and no error...

Any ideas?