Posted to commits@airflow.apache.org by "Dan Sedov (JIRA)" <ji...@apache.org> on 2018/05/31 17:37:00 UTC

[jira] [Comment Edited] (AIRFLOW-2549) GCP DataProc Workflow Template operators report success when jobs fail

    [ https://issues.apache.org/jira/browse/AIRFLOW-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496893#comment-16496893 ] 

Dan Sedov edited comment on AIRFLOW-2549 at 5/31/18 5:36 PM:
-------------------------------------------------------------

[~mchalek] thanks for the report!

I do not believe this is a bug in the Airflow _operator_; instead, it is a bug in _Dataproc_ itself. Your observation about checking the Done and Error fields is correct: those fields are part of the Operation contract and are what signals that a Workflow has failed. In this case, the final Operation should have looked like this:

{ done: true, error: {message="something bad happened"}, ...}
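
For illustration, here is a minimal Python sketch of a client-side check that relies only on that contract. It assumes the Operation has already been fetched and parsed into a plain dict; the function name {{operation_error}} is hypothetical and is not part of Airflow or of the Dataproc client library.

```python
def operation_error(operation):
    """Return the error message of a finished, failed Operation, else None.

    `operation` is assumed to be the Operation resource parsed into a dict.
    """
    if not operation.get("done"):
        # Still running: no final outcome to report yet.
        return None
    error = operation.get("error")
    if error:
        # Fall back to the raw error object if it carries no message.
        return error.get("message", str(error))
    return None
```

With the Operation shown above, this would return "something bad happened"; for an Operation that is done and has no error field, it returns None.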

I'll file a bug and fix it on our end.

If you'd like to track resolution, could you file a bug here: [https://issuetracker.google.com/issues/new?component=187133&template=0]


was (Author: dansedov):
[~mchalek] thanks for the report!

I do not believe this is a bug in the Airflow _operator_; instead, it is a bug in _Dataproc_ itself. Clients should not have to walk through each Job and Operation status and check for errors. Instead, the final Operation should look like this: { done: true, error: {message="something bad happened"}, ...}. I am not seeing that here.

I'll file a bug and fix it on our end.

If you'd like to track resolution, could you file a bug here: https://issuetracker.google.com/issues/new?component=187133&template=0

> GCP DataProc Workflow Template operators report success when jobs fail
> ----------------------------------------------------------------------
>
>                 Key: AIRFLOW-2549
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2549
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Kevin McHale
>            Assignee: Kevin McHale
>            Priority: Major
>
> cc: [~DanSedov] [~fenglu]
>  
> The Google DataProc workflow template operators use the [_DataProcOperator|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataproc_hook.py#L149] class for analyzing the outcome of the workflow template instance, but that class does not properly detect errors.
>  
> Specifically, when any one of the jobs in the template fails, the operator should report an error, but it always reports success because it does not properly analyze the API responses.
>  
> The outcomes of individual jobs are indicated under the {{metadata.graph.nodes}} path in the API response, and this field needs to be checked for errors.  However, the existing {{_DataProcOperator}} class only checks for the existence of the {{done}} and {{error}} fields.
>  
> Below is an example of the API response object for a failed DataProc workflow template operation, to illustrate this.  I pulled this directly from the DataProc API and anonymized it:
> {code:java}
> {
>   "response": {
>     "@type": "type.googleapis.com/google.protobuf.Empty"
>   },
>   "done": true,
>   "name": "projects/my-project/regions/us-central1/operations/dddddddd-dddd-dddd-dddd-dddddddddddd",
>   "metadata": {
>     "createCluster": {
>       "done": true,
>       "operationId": "projects/my-project/regions/us-central1/operations/1111111-0000-aaaa-bbbb-ffffffffffff"
>     },
>     "clusterName": "fake-dataproc-cluster",
>     "graph": {
>       "nodes": [
>         {
>           "state": "FAILED",
>           "jobId": "my-job-abcdefghijklm",
>           "stepId": "my-job",
>           "error": "Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found in 'gs://dataproc-00000000-0000-0000-0000-000000000000-us-central1/google-cloud-dataproc-metainfo/cccccccc-cccc-cccc-cccc-cccccccccccc/jobs/my-job-abcdefghijklm/driveroutput'."
>         }
>       ]
>     },
>     "state": "DONE",
>     "deleteCluster": {
>       "done": true,
>       "operationId": "projects/my-project/regions/us-central1/operations/1111111-1111-aaaa-bbbb-ffffffffffff"
>     },
>     "@type": "type.googleapis.com/google.cloud.dataproc.v1beta2.WorkflowMetadata"
>   }
> }
> {code}
>  
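
Until the top-level {{error}} field is populated for failed workflows, a client could inspect the per-job nodes in a response shaped like the one quoted above. The following is a rough Python sketch of that workaround, not the code in the Airflow hook. It assumes the response has been parsed into a dict, and the function name {{failed_workflow_nodes}} is hypothetical; the field names ({{metadata.graph.nodes}}, {{state}}, {{stepId}}, {{error}}) are taken from the quoted response.

```python
def failed_workflow_nodes(operation):
    """Return (stepId, error) pairs for every node whose state is FAILED.

    `operation` is assumed to be the workflow Operation parsed into a dict,
    shaped like the anonymized response quoted in the issue.
    """
    nodes = operation.get("metadata", {}).get("graph", {}).get("nodes", [])
    return [
        (node.get("stepId"), node.get("error"))
        for node in nodes
        if node.get("state") == "FAILED"
    ]


# Trimmed-down version of the quoted response, for illustration only.
example = {
    "done": True,
    "metadata": {
        "graph": {
            "nodes": [
                {
                    "state": "FAILED",
                    "jobId": "my-job-abcdefghijklm",
                    "stepId": "my-job",
                    "error": "Google Cloud Dataproc Agent reports job failure.",
                }
            ]
        },
        "state": "DONE",
    },
}

failures = failed_workflow_nodes(example)
if example.get("done") and failures:
    for step_id, error in failures:
        print("step %s failed: %s" % (step_id, error))
```

An operator using a check like this could raise an exception when the list is non-empty, so that the task is marked as failed even though the Operation itself reports done with no error.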


