Posted to commits@airflow.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/09/02 17:59:03 UTC
[jira] [Commented] (AIRFLOW-2706) AWS Batch Operator doesn't detect failure if there were no job attempts
[ https://issues.apache.org/jira/browse/AIRFLOW-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601344#comment-16601344 ]
Apache Spark commented on AIRFLOW-2706:
---------------------------------------
User 'craigforster' has created a pull request for this issue:
https://github.com/apache/incubator-airflow/pull/3567
> AWS Batch Operator doesn't detect failure if there were no job attempts
> -----------------------------------------------------------------------
>
> Key: AIRFLOW-2706
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2706
> Project: Apache Airflow
> Issue Type: Bug
> Components: aws
> Reporter: Craig Forster
> Assignee: Craig Forster
> Priority: Major
> Fix For: 2.0.0
>
>
> During initial deployment testing of our AWS Batch environment, with Airflow co-ordinating the jobs, we had a few false starts while we fixed IAM roles. However, Airflow did not detect these failed jobs as failed.
> I believe the issue lies in _check_success_task; the failure check loops over the attempts array, but in this case there are no attempts to check.
> Logs:
> {noformat}
> {awsbatch_operator.py:150} INFO - AWS Batch stopped, check status:
> {
>   "ResponseMetadata": {
>     "RequestId": "51084897-7d90-11e8-be75-7b511f9b010d",
>     "HTTPStatusCode": 200,
>     "HTTPHeaders": {
>       "date": "Mon, 02 Jul 2018 00:39:02 GMT",
>       "content-type": "application/json",
>       "content-length": "1142",
>       "connection": "keep-alive",
>       "x-amzn-requestid": "51084897-7d90-11e8-be75-7b511f9b010d",
>       "x-amz-apigw-id": "JX8V_HOyPHcF5KA=",
>       "x-amzn-trace-id": "Root=1-5b397426-058a6d1ce4d7569273c05bd4"
>     },
>     "RetryAttempts": 0
>   },
>   "jobs": [
>     {
>       "jobName": "snip-20180317",
>       "jobId": "2ea0def8-1e7f-4a5c-bd1e-3f0a3acc035c",
>       "jobQueue": "arn:aws:batch:us-west-2:snip:job-queue/snip-829f351459741d3",
>       "status": "FAILED",
>       "attempts": [],
>       "statusReason": "Role is not valid",
>       "createdAt": 1530491934164,
>       "retryStrategy": { "attempts": 1 },
>       "dependsOn": [],
>       "jobDefinition": "arn:aws:batch:us-west-2:snip:job-definition/snip-job-definition:4",
>       "parameters": {},
>       "container": {
>         "image": "snip.dkr.ecr.eu-central-1.amazonaws.com/snip:latest",
>         "vcpus": 1,
>         "memory": 2048,
>         "command": [],
>         "jobRoleArn": "arn:aws:iam::snip:instance-profile/common-instance-profile-us2-sandbox",
>         "volumes": [],
>         "environment": [
>           { SNIP }
>         ],
>         "mountPoints": [],
>         "ulimits": [],
>         "privileged": True
>       }
>     }
>   ]
> }
> {awsbatch_operator.py:110} INFO - AWS Batch Job has been successfully executed:
> {
>   "ResponseMetadata": {
>     "RequestId": "4c255dd7-7d90-11e8-988b-c9ea0b25c469",
>     "HTTPStatusCode": 200,
>     "HTTPHeaders": {
>       "date": "Mon, 02 Jul 2018 00:38:54 GMT",
>       "content-type": "application/json",
>       "content-length": "111",
>       "connection": "keep-alive",
>       "x-amzn-requestid": "4c255dd7-7d90-11e8-988b-c9ea0b25c469",
>       "x-amz-apigw-id": "JX8UtH6VvHcFcVg=",
>       "x-amzn-trace-id": "Root=1-5b39741e-577ea13c82751664daac335e"
>     },
>     "RetryAttempts": 0
>   },
>   "jobName": "snip-20180317",
>   "jobId": "2ea0def8-1e7f-4a5c-bd1e-3f0a3acc035c"
> }
> {noformat}
>
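The behaviour described in the issue can be reproduced without AWS. The sketch below is an illustration only: it is neither the operator's actual code nor the change made in the pull request. The function names job_failed_naive and job_failed are hypothetical, and the job dict is a trimmed copy of the entry under "jobs" in the log above. It assumes, as the reporter suggests, that the failure check only runs inside a loop over "attempts", and contrasts that with a check that looks at the job-level "status" first.

```python
def job_failed_naive(job):
    """Failure check that only runs inside the loop over attempts (illustrative)."""
    for attempt in job.get("attempts", []):
        exit_code = attempt.get("container", {}).get("exitCode")
        if job.get("status") == "FAILED" or exit_code != 0:
            return True
    # With an empty "attempts" list the loop body never runs,
    # so a FAILED job falls through to here.
    return False


def job_failed(job):
    """Check the job-level status first, then the individual attempts."""
    if job.get("status") == "FAILED":
        return True
    for attempt in job.get("attempts", []):
        exit_code = attempt.get("container", {}).get("exitCode")
        if exit_code is not None and exit_code != 0:
            return True
    return False


# Trimmed copy of the job entry from the log above.
job = {
    "jobName": "snip-20180317",
    "status": "FAILED",
    "attempts": [],
    "statusReason": "Role is not valid",
}

print(job_failed_naive(job))  # False: the failure goes unnoticed
print(job_failed(job))        # True
```

Checking "status" before looking at "attempts" means a job that fails before any attempt is made, as with the invalid role here, is still reported as failed.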
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)