Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/02/14 22:27:28 UTC

[GitHub] [airflow] digger edited a comment on issue #6371: [AIRFLOW-5691] Rewrite Dataproc operators to use python library

digger edited a comment on issue #6371: [AIRFLOW-5691] Rewrite Dataproc operators to use python library
URL: https://github.com/apache/airflow/pull/6371#issuecomment-586502594
 
 
   @dossett, the functionality added in AIRFLOW-3211 actually broke the behavior of the Dataproc hook and made a few 1.10.x releases unusable for Dataproc users. The problem is that the hook uses only the task ID part of the Dataproc job ID when looking for previous invocations of the job, so if the Dataproc job history still contains jobs from any previous DAG run, the hook doesn't execute the job. A proper way to implement this would be to associate Dataproc jobs with particular DAG runs, e.g. by embedding a hash of the DAG run ID in the Dataproc job ID. In any case, the functionality added in AIRFLOW-3211 has to be optional. In our experience, users expect Dataproc jobs to be re-executed when they re-execute the task, and this new behavior creates a lot of confusion.
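
   To make the suggestion concrete, here is a minimal sketch of what I mean by tying a job to a DAG run. This is not the hook's actual code: `build_job_id` and `find_previous_job` are hypothetical helpers, the choice of SHA-1 and an 8-character digest is arbitrary, and the 100-character cap and allowed character set are my assumptions about Dataproc job ID constraints.

```python
import hashlib
import re


def build_job_id(task_id, dag_id, run_id, hash_len=8, max_len=100):
    """Hypothetical helper: derive a job ID that is unique per DAG run.

    max_len and the allowed character set are assumptions about
    Dataproc job ID constraints, not values taken from the hook.
    """
    # Hash the DAG ID together with the run ID, so the same task
    # in two different DAG runs gets two different job IDs.
    key = "{}:{}".format(dag_id, run_id).encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()[:hash_len]
    # Replace anything other than letters, digits, '_' and '-'.
    safe_task_id = re.sub(r"[^A-Za-z0-9_-]", "_", task_id)
    # Leave room for the separator and the digest.
    safe_task_id = safe_task_id[: max_len - hash_len - 1]
    return "{}_{}".format(safe_task_id, digest)


def find_previous_job(existing_job_ids, task_id, dag_id, run_id):
    """Return the job ID already submitted for this exact DAG run, or None."""
    wanted = build_job_id(task_id, dag_id, run_id)
    return wanted if wanted in set(existing_job_ids) else None
```

   With IDs built this way, a lookup matches only a job submitted by the same DAG run, so jobs left in the history by earlier runs no longer prevent execution. Whether to reattach at all should still be controlled by an opt-in flag.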

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services