Posted to dev@airflow.apache.org by jo...@nielsen.com on 2018/04/17 20:33:02 UTC
Possible bug in ECS Operator
I think I found a bug in ECSOperator, but I'm not sure.
Here is the relevant part of the response I got back from the ECS task run:
..
    'tasks': [{
        'lastStatus': 'STOPPED',
        'desiredStatus': 'STOPPED',
        'stoppedReason': 'Host EC2 (instance i-abc) terminated.',
        'containers': [{
            'containerArn': 'arn:abc',
            'taskArn': 'arn:abc',
            'lastStatus': 'RUNNING',
            'name': 'abc',
            'networkBindings': []
        }],
    }],
..
So this task was stopped in ECS before it finished, but Airflow considered it a success because none of the conditions in this if/elif block match:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/ecs_operator.py#L101
    for container in containers:
        if container.get('lastStatus') == 'STOPPED' and container['exitCode'] != 0:
            raise AirflowException('This task is not in success state {}'.format(task))
        elif container.get('lastStatus') == 'PENDING':
            raise AirflowException('This task is still pending {}'.format(task))
        elif 'error' in container.get('reason', '').lower():
            raise AirflowException('This containers encounter an error during launching : {}'.
                                   format(container.get('reason', '').lower()))
So because the container's lastStatus was still 'RUNNING' when the instance was terminated, and it came back with no exit code and no reason, none of the branches raised and the task was considered a success within Airflow.
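To make the fall-through concrete, here is a minimal sketch that runs the same three conditions against a trimmed copy of the response above. The AirflowException class and the check_containers helper are stand-ins invented for illustration so the snippet runs without Airflow installed; they are not the operator's actual code layout.

```python
class AirflowException(Exception):
    """Stand-in for Airflow's exception class (assumption, so this runs standalone)."""


def check_containers(task):
    """Hypothetical helper wrapping the same three conditions quoted above."""
    for container in task['containers']:
        if container.get('lastStatus') == 'STOPPED' and container['exitCode'] != 0:
            raise AirflowException('This task is not in success state {}'.format(task))
        elif container.get('lastStatus') == 'PENDING':
            raise AirflowException('This task is still pending {}'.format(task))
        elif 'error' in container.get('reason', '').lower():
            raise AirflowException('This containers encounter an error during launching : {}'.
                                   format(container.get('reason', '').lower()))
    # No branch matched, so the caller treats the task as successful.
    return True


# Trimmed copy of the task from the response: the task is STOPPED,
# but its only container is still reported as RUNNING.
task = {
    'lastStatus': 'STOPPED',
    'desiredStatus': 'STOPPED',
    'stoppedReason': 'Host EC2 (instance i-abc) terminated.',
    'containers': [{
        'lastStatus': 'RUNNING',
        'name': 'abc',
        'networkBindings': [],
    }],
}

passed = check_containers(task)
print(passed)
```

The first condition short-circuits on lastStatus, so the missing 'exitCode' key is never read, and the empty default for 'reason' cannot contain 'error'.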
I think that's probably not right, but maybe someone can correct me.
If it isn't right, I will open a bug in Jira and, with any luck, put together a pull request to fix it.
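One possible shape for such a fix, purely as a sketch: also compare the task-level lastStatus with each container's status, and fail when the task has stopped but a container never reached 'STOPPED'. The function name check_task_strict, the stand-in AirflowException class, and the rule itself are assumptions for illustration, not what the operator does today, and whether this rule is right for every ECS stop reason is untested.

```python
class AirflowException(Exception):
    """Stand-in for Airflow's exception class (assumption, so this runs standalone)."""


def check_task_strict(task):
    """Hypothetical stricter check that also looks at the task-level status."""
    task_stopped = task.get('lastStatus') == 'STOPPED'
    for container in task.get('containers', []):
        status = container.get('lastStatus')
        exit_code = container.get('exitCode')  # None when ECS reported no exit code
        if status == 'PENDING':
            raise AirflowException('This task is still pending {}'.format(task))
        if 'error' in container.get('reason', '').lower():
            raise AirflowException('Container error: {}'.format(container.get('reason')))
        if status == 'STOPPED' and exit_code != 0:
            # A missing exit code (None) is treated as a failure here too.
            raise AirflowException('This task is not in success state {}'.format(task))
        if task_stopped and status != 'STOPPED':
            # The task ended while the container never reported a clean stop.
            raise AirflowException('Task stopped while container {} was still {}: {}'.format(
                container.get('name'), status, task.get('stoppedReason')))
    return True


# The terminated-instance case from the response above.
terminated_task = {
    'lastStatus': 'STOPPED',
    'stoppedReason': 'Host EC2 (instance i-abc) terminated.',
    'containers': [{'lastStatus': 'RUNNING', 'name': 'abc'}],
}

# A made-up task that finished normally, for comparison.
clean_task = {
    'lastStatus': 'STOPPED',
    'containers': [{'lastStatus': 'STOPPED', 'exitCode': 0, 'name': 'abc'}],
}

error_message = None
try:
    check_task_strict(terminated_task)
except AirflowException as exc:
    error_message = str(exc)

print(error_message)
print(check_task_strict(clean_task))
```

With this rule the terminated-instance response raises and surfaces the stoppedReason, while a container that stopped with exit code 0 still passes.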
In a way I feel like this is an AWS bug, because they gave back no exit code and no error in the container part of the JSON.
Any ideas?