Posted to commits@airflow.apache.org by "jack (JIRA)" <ji...@apache.org> on 2019/05/02 08:16:00 UTC
[jira] [Commented] (AIRFLOW-2549) GCP DataProc Workflow Template
operators report success when jobs fail
[ https://issues.apache.org/jira/browse/AIRFLOW-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831473#comment-16831473 ]
jack commented on AIRFLOW-2549:
-------------------------------
[~kaxilnaik] There have been many releases and updates to Dataproc since this issue was filed. Did any of them resolve the problem?
> GCP DataProc Workflow Template operators report success when jobs fail
> ----------------------------------------------------------------------
>
> Key: AIRFLOW-2549
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2549
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Kevin McHale
> Assignee: Kevin McHale
> Priority: Major
>
> cc: [~DanSedov] [~fenglu]
>
> The Google DataProc workflow template operators use the [_DataProcOperator|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataproc_hook.py#L149] class for analyzing the outcome of the workflow template instance, but that class does not properly detect errors.
>
> Specifically, when any one of the jobs in the template fails, the operator should report an error, but it always reports success because it does not properly analyze the API responses.
>
> The outcome of each individual job is reported under the {{metadata.graph.nodes}} path in the API response, and this field needs to be checked for errors. However, the existing {{_DataProcOperator}} class only checks for the existence of the {{done}} and {{error}} fields.
>
> Below is an example of the API response object for a failed DataProc workflow template operation, to illustrate this. I pulled this directly from the DataProc API and anonymized it:
> {code:java}
> {
>   "response": {
>     "@type": "type.googleapis.com/google.protobuf.Empty"
>   },
>   "done": true,
>   "name": "projects/my-project/regions/us-central1/operations/dddddddd-dddd-dddd-dddd-dddddddddddd",
>   "metadata": {
>     "createCluster": {
>       "done": true,
>       "operationId": "projects/my-project/regions/us-central1/operations/1111111-0000-aaaa-bbbb-ffffffffffff"
>     },
>     "clusterName": "fake-dataproc-cluster",
>     "graph": {
>       "nodes": [
>         {
>           "state": "FAILED",
>           "jobId": "my-job-abcdefghijklm",
>           "stepId": "my-job",
>           "error": "Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found in 'gs://dataproc-00000000-0000-0000-0000-000000000000-us-central1/google-cloud-dataproc-metainfo/cccccccc-cccc-cccc-cccc-cccccccccccc/jobs/my-job-abcdefghijklm/driveroutput'."
>         }
>       ]
>     },
>     "state": "DONE",
>     "deleteCluster": {
>       "done": true,
>       "operationId": "projects/my-project/regions/us-central1/operations/1111111-1111-aaaa-bbbb-ffffffffffff"
>     },
>     "@type": "type.googleapis.com/google.cloud.dataproc.v1beta2.WorkflowMetadata"
>   }
> }
> {code}
>
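For reference, the check described in the quoted report could look roughly like the sketch below. It is only an illustration and is not the Airflow implementation: the function names find_failed_nodes and raise_on_workflow_failure are hypothetical, the operation is assumed to be the API response already parsed into a Python dict, and "FAILED" is treated as the only failing node state because it is the only one shown in the example response.

```python
# Hypothetical helpers; a sketch of the check described in the issue,
# not code taken from Airflow.

def find_failed_nodes(operation):
    """Return the graph nodes whose state is FAILED.

    `operation` is assumed to be the workflow template operation
    response parsed into a dict, shaped like the example above.
    """
    nodes = operation.get("metadata", {}).get("graph", {}).get("nodes", [])
    # Assumption: "FAILED" is the only failure state we look for.
    return [node for node in nodes if node.get("state") == "FAILED"]


def raise_on_workflow_failure(operation):
    """Raise if the finished operation reports an error or a failed job."""
    if not operation.get("done"):
        raise ValueError("operation has not finished yet")
    # Top-level error field, which the existing check already covers.
    if "error" in operation:
        raise RuntimeError("Workflow operation failed: %s" % operation["error"])
    # Per-job outcomes, which the issue says are currently ignored.
    failed = find_failed_nodes(operation)
    if failed:
        details = "; ".join(
            "%s: %s" % (node.get("stepId"), node.get("error")) for node in failed
        )
        raise RuntimeError("Workflow template jobs failed: " + details)
```

With the example response above, raise_on_workflow_failure would raise a RuntimeError naming the my-job step, even though the top-level done field is true and no top-level error field is present.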
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)