Posted to commits@airflow.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/09/02 17:59:03 UTC

[jira] [Commented] (AIRFLOW-2706) AWS Batch Operator doesn't detect failure if there were no job attempts

    [ https://issues.apache.org/jira/browse/AIRFLOW-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601344#comment-16601344 ] 

Apache Spark commented on AIRFLOW-2706:
---------------------------------------

User 'craigforster' has created a pull request for this issue:
https://github.com/apache/incubator-airflow/pull/3567

> AWS Batch Operator doesn't detect failure if there were no job attempts
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-2706
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2706
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: aws
>            Reporter: Craig Forster
>            Assignee: Craig Forster
>            Priority: Major
>             Fix For: 2.0.0
>
>
> During initial deployment testing of our AWS Batch environment, with Airflow coordinating the jobs, we had a few false starts while we fixed IAM roles. However, Airflow did not detect these failed jobs as failed.
> I believe the issue lies in _check_success_task: the failure check loops over the job's attempts array, but in this case the array is empty ("attempts": []), so there are no attempts to check and the check never runs.
> Logs:
> {noformat}
> {awsbatch_operator.py:150} INFO - AWS Batch stopped, check status: 
> {
>   "ResponseMetadata": {
>     "RequestId": "51084897-7d90-11e8-be75-7b511f9b010d",
>     "HTTPStatusCode": 200,
>     "HTTPHeaders": {
>       "date": "Mon, 02 Jul 2018 00:39:02 GMT",
>       "content-type": "application/json",
>       "content-length": "1142",
>       "connection": "keep-alive",
>       "x-amzn-requestid": "51084897-7d90-11e8-be75-7b511f9b010d",
>       "x-amz-apigw-id": "JX8V_HOyPHcF5KA=",
>       "x-amzn-trace-id": "Root=1-5b397426-058a6d1ce4d7569273c05bd4"
>     },
>     "RetryAttempts": 0
>   },
>   "jobs": [
>     {
>       "jobName": "snip-20180317",
>       "jobId": "2ea0def8-1e7f-4a5c-bd1e-3f0a3acc035c",
>       "jobQueue":
>         "arn:aws:batch:us-west-2:snip:job-queue/snip-829f351459741d3",
>       "status": "FAILED",
>       "attempts": [],
>       "statusReason": "Role is not valid",
>       "createdAt": 1530491934164,
>       "retryStrategy": { "attempts": 1 },
>       "dependsOn": [],
>       "jobDefinition":
>         "arn:aws:batch:us-west-2:snip:job-definition/snip-job-definition:4",
>       "parameters": {},
>       "container": {
>         "image":
>           "snip.dkr.ecr.eu-central-1.amazonaws.com/snip:latest",
>         "vcpus": 1,
>         "memory": 2048,
>         "command": [],
>         "jobRoleArn":
>           "arn:aws:iam::snip:instance-profile/common-instance-profile-us2-sandbox",
>         "volumes": [],
>         "environment": [
>           { SNIP }
>         ],
>         "mountPoints": [],
>         "ulimits": [],
>         "privileged": True
>       }
>     }
>   ]
> }
> {awsbatch_operator.py:110} INFO - AWS Batch Job has been successfully executed: 
> {
>   "ResponseMetadata": {
>     "RequestId": "4c255dd7-7d90-11e8-988b-c9ea0b25c469",
>     "HTTPStatusCode": 200,
>     "HTTPHeaders": {
>       "date": "Mon, 02 Jul 2018 00:38:54 GMT",
>       "content-type": "application/json",
>       "content-length": "111",
>       "connection": "keep-alive",
>       "x-amzn-requestid": "4c255dd7-7d90-11e8-988b-c9ea0b25c469",
>       "x-amz-apigw-id": "JX8UtH6VvHcFcVg=",
>       "x-amzn-trace-id": "Root=1-5b39741e-577ea13c82751664daac335e"
>     },
>     "RetryAttempts": 0
>   },
>   "jobName": "snip-20180317",
>   "jobId": "2ea0def8-1e7f-4a5c-bd1e-3f0a3acc035c"
> }
> {noformat}
>  
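To illustrate the failure mode described above, here is a minimal sketch of a check that looks at the job-level status before iterating over attempts, so a job with "status": "FAILED" and "attempts": [] is still reported as failed. This is not the code from the operator or from the pull request; the names check_batch_jobs and BatchJobFailed are hypothetical, and the response is assumed to have the shape shown in the log above. Treating an attempt without an exitCode as not failed is also an assumption.

```python
class BatchJobFailed(RuntimeError):
    """Hypothetical exception type; an operator would likely raise its own."""


def check_batch_jobs(response):
    """Raise BatchJobFailed if any job in a describe_jobs-style response failed.

    `response` is assumed to be a dict with a "jobs" list, as in the log above.
    """
    jobs = response.get("jobs", [])
    if not jobs:
        raise BatchJobFailed("No job found in response")

    for job in jobs:
        # Check the job-level status first. This does not depend on the
        # attempts list, so a job that never started is still caught.
        if job.get("status") == "FAILED":
            reason = job.get("statusReason", "no reason given")
            raise BatchJobFailed(
                "Job {} failed: {}".format(job.get("jobId"), reason)
            )

        # Then check each attempt's container exit code, if there are any.
        for attempt in job.get("attempts", []):
            exit_code = attempt.get("container", {}).get("exitCode")
            # Assumption: a missing exit code is not treated as a failure.
            if exit_code is not None and exit_code != 0:
                raise BatchJobFailed(
                    "Job {} exited with code {}".format(job.get("jobId"), exit_code)
                )


# Usage with a trimmed copy of the response from the log above.
failed_without_attempts = {
    "jobs": [
        {
            "jobId": "2ea0def8-1e7f-4a5c-bd1e-3f0a3acc035c",
            "status": "FAILED",
            "attempts": [],
            "statusReason": "Role is not valid",
        }
    ]
}

try:
    check_batch_jobs(failed_without_attempts)
except BatchJobFailed as err:
    print(err)
```

The design point is only the ordering: the status check sits outside the loop over attempts, so an empty attempts list cannot hide a FAILED job.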



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)