Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/01 18:18:50 UTC

[GitHub] [airflow] jaketf opened a new issue #8672: Data Fusion Hook Start pipeline will succeed before pipeline is in RUNNING state

jaketf opened a new issue #8672:
URL: https://github.com/apache/airflow/issues/8672


   [Here](https://github.com/apache/airflow/blob/4421f011eeec2d1022a39933e27f530fb9f9c1b1/airflow/providers/google/cloud/hooks/datafusion.py#L392) the `start_pipeline` method of the Data Fusion hook succeeds as soon as it gets a 200 from the CDAP API.
   This is a misleading success signal: at best it indicates that the pipeline has entered the PENDING state. However, `start_pipeline` should not succeed until the pipeline has reached the RUNNING state.
   Note that the happy path is PENDING > STARTING > RUNNING ([ProgramRunStatus](https://github.com/cdapio/cdap/blob/1d62163faaecb5b888f4bccd0fcf4a8d27bbd549/cdap-proto/src/main/java/io/cdap/cdap/proto/ProgramRunStatus.java)). Many CDAP pipelines using the Dataproc provisioner spend a significant amount of time in the STARTING state because they also have to tick through the various [ProgramRunClusterStatus](https://github.com/cdapio/cdap/blob/1d62163faaecb5b888f4bccd0fcf4a8d27bbd549/cdap-proto/src/main/java/io/cdap/cdap/proto/ProgramRunClusterStatus.java) values while provisioning the Dataproc cluster.
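
A minimal sketch of what "wait until RUNNING" could look like, assuming the hook already has some way to read the status of a specific run. `get_status` is a hypothetical callable supplied by the caller, not an existing hook method, and the failure state names are taken from the linked ProgramRunStatus enum and should be checked against the CDAP version in use.

```python
import time
from typing import Callable

# Assumed from the linked ProgramRunStatus enum; verify for your CDAP version.
FAILURE_STATES = {"FAILED", "KILLED", "REJECTED"}
SUCCESS_STATES = {"RUNNING", "COMPLETED"}


def wait_until_running(
    get_status: Callable[[], str],
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until the run is RUNNING (or already COMPLETED).

    Raises RuntimeError if the run ends up in a failure state and
    TimeoutError if it is still PENDING/STARTING after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"Pipeline run ended in state {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Pipeline run still in state {status} after {timeout}s")
        sleep(poll_interval)
```

COMPLETED is treated as success here because a short pipeline may finish between two polls. `sleep` and `clock` are injectable only so the loop can be exercised without real waiting.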
   
   Unfortunately, the start call to CDAP does not return a run_id.
   
   This hook could work around this by adding a special runtime arg called `__faux_airflow_id__`, which can then be used to look up the real run id. The value of this runtime arg could be the dag_run_id or something similar. With this workaround, or if the CDAP API could return the run id, a more useful operator than start pipeline would be one that actually waits until the job reaches a success state (much like the existing Dataflow and Dataproc operators).
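
A rough sketch of the lookup side of this workaround, operating on run records that have already been fetched from CDAP. It assumes each record is a dict with a `runid` field and a `properties["runtimeArgs"]` field holding the runtime args as a JSON-encoded string; that shape is an assumption modeled on the Terraform provider code linked below, and `find_run_id` is a hypothetical helper, not part of the hook.

```python
import json
from typing import Any, Dict, Iterable, Optional

FAUX_ID_KEY = "__faux_airflow_id__"


def find_run_id(run_records: Iterable[Dict[str, Any]], faux_id: str) -> Optional[str]:
    """Return the run id of the first record whose runtime args carry faux_id.

    Assumes record["properties"]["runtimeArgs"] is a JSON-encoded string
    (an assumption about the CDAP run record format). Returns None if no
    record matches.
    """
    for record in run_records:
        raw_args = record.get("properties", {}).get("runtimeArgs", "{}")
        try:
            runtime_args = json.loads(raw_args)
        except (TypeError, ValueError):
            # Skip records whose runtime args are missing or not valid JSON.
            continue
        if isinstance(runtime_args, dict) and runtime_args.get(FAUX_ID_KEY) == faux_id:
            return record.get("runid")
    return None
```

The caller would start the pipeline with `{FAUX_ID_KEY: <some unique value>}` merged into the runtime args, list the program's runs, and pass them to `find_run_id`. The value has to be unique per start call (for example, derived from the DAG run and task instance), otherwise a retry could match an older run.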
   
   Example in Go from the Terraform provider resource that manages a streaming pipeline:
   [Creating with faux id](https://github.com/GoogleCloudPlatform/terraform-provider-cdap/blob/master/cdap/resource_streaming_program_run.go#L108)
   
   And [looking up the real CDAP run id by faux id](https://github.com/GoogleCloudPlatform/terraform-provider-cdap/blob/master/cdap/resource_streaming_program_run.go#L216)
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jaketf commented on issue #8672: Data Fusion Hook Start pipeline will succeed before pipeline is in RUNNING state

Posted by GitBox <gi...@apache.org>.
jaketf commented on issue #8672:
URL: https://github.com/apache/airflow/issues/8672#issuecomment-622503529


   CC: @turbaszek 





[GitHub] [airflow] jaketf commented on issue #8672: Data Fusion Hook Start pipeline will succeed before pipeline is in RUNNING state

Posted by GitBox <gi...@apache.org>.
jaketf commented on issue #8672:
URL: https://github.com/apache/airflow/issues/8672#issuecomment-622505290


   Apologies, closing to follow the issue template.





[GitHub] [airflow] boring-cyborg[bot] commented on issue #8672: Data Fusion Hook Start pipeline will succeed before pipeline is in RUNNING state

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #8672:
URL: https://github.com/apache/airflow/issues/8672#issuecomment-622502093


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

