Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/09 16:27:02 UTC

[GitHub] [airflow] edejong commented on issue #8903: BigQueryHook refactor + deterministic BQ Job ID

edejong commented on issue #8903:
URL: https://github.com/apache/airflow/issues/8903#issuecomment-641387021


   I should really check GitHub more often, I only saw the notification now.
   
   Let me know if I understand the question correctly: should the BigQueryHook's interface rely on classes from the Google API client library, or should all data be passed in as dictionaries?
   
   I think it's one thing to have the Airflow hooks/operators coupled to the BigQuery REST interface, which I guess is what you get by passing in the config as a Python dict. This lets you translate any online example very easily into a DAG.
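
A minimal sketch of what "passing the config as a dict" means in practice, assuming the dict mirrors the REST `Job` resource (`jobReference` plus `configuration`) from the BigQuery API reference. The helper `build_job_resource` and the values `my-project` and `af_example_job` are hypothetical, not part of BigQuery or Airflow.

```python
import json


def build_job_resource(configuration: dict, project_id: str, job_id: str) -> dict:
    """Wrap a REST-style JobConfiguration dict in a Job resource body (hypothetical helper)."""
    return {
        "jobReference": {"projectId": project_id, "jobId": job_id},
        "configuration": configuration,
    }


# Same shape as the "configuration" object in the REST docs, so an example
# from the docs can be pasted into a DAG without translating it to classes.
query_config = {
    "query": {
        "query": "SELECT 1",
        "useLegacySql": False,
    }
}

body = build_job_resource(query_config, "my-project", "af_example_job")
print(json.dumps(body, indent=2))
```

The hook never needs to understand the contents of `configuration`; it only forwards it, which is the library-agnostic property discussed next.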
   
   But it's a much bigger step to rely on the Google client library in the API, because that introduces a tight coupling to this specific library. It would only look good if the Airflow code could stay 100% agnostic about what is passed to the library. Can we guarantee that, even in the future? And does that align with other GCP products?
   
   So my personal opinion is stick with the dict :)
   
   As for generating a job ID for all job types, I agree that would be a very good move. Without it you have to wait for a response before you even have something to check up on afterwards. That works most of the time, but when something goes wrong it makes troubleshooting harder.
   
   I love the suggested job ID string. One small thing I would change is adding a prefix such as `airflow_` or even just `af_`, to make these jobs even easier to spot in Stackdriver, for example. I would generate a well-defined job ID string every time the user doesn't provide one.
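
A minimal sketch of "generate a prefixed job ID whenever the user didn't provide one". The ID layout (prefix, dag_id, task_id, execution_date, config hash), the helpers `generate_job_id`/`resolve_job_id` and the sample values are all hypothetical; this is not the format proposed in the issue or anything Airflow ships. The only constraints taken from BigQuery are the allowed characters (letters, digits, underscores, dashes) and the 1,024-character limit.

```python
import hashlib
import json
import re
from typing import Optional

# BigQuery job IDs may contain only letters, digits, underscores and dashes,
# with a maximum length of 1,024 characters.
_INVALID_CHARS = re.compile(r"[^0-9a-zA-Z_-]")
_MAX_JOB_ID_LENGTH = 1024


def generate_job_id(dag_id: str, task_id: str, execution_date: str,
                    configuration: dict, prefix: str = "airflow_") -> str:
    """Build a deterministic, prefixed job ID (hypothetical format)."""
    # Fingerprint the config (not for security): a changed query gives a new ID,
    # rerunning the exact same task with the same config gives the same ID.
    config_hash = hashlib.md5(
        json.dumps(configuration, sort_keys=True).encode("utf-8")
    ).hexdigest()
    head = _INVALID_CHARS.sub("_", f"{prefix}{dag_id}_{task_id}_{execution_date}_")
    # Truncate the readable part, never the hash, to respect the length limit.
    return head[:_MAX_JOB_ID_LENGTH - len(config_hash)] + config_hash


def resolve_job_id(user_job_id: Optional[str], dag_id: str, task_id: str,
                   execution_date: str, configuration: dict) -> str:
    """Use the caller's job ID if given, otherwise generate one up front."""
    if user_job_id:
        return user_job_id
    return generate_job_id(dag_id, task_id, execution_date, configuration)


config = {"query": {"query": "SELECT 1", "useLegacySql": False}}
job_id = resolve_job_id(None, "my_dag", "my_task",
                        "2020-06-09T00:00:00+00:00", config)
print(job_id)
```

Because the ID is known before the job is submitted, it can be logged immediately and searched for later, even if the insert call itself never returns a response.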
   
   See https://cloud.google.com/bigquery/docs/running-jobs#generate-jobid for recommendations.

