Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 17:26:07 UTC

[GitHub] [airflow] turbaszek opened a new pull request #11287: Improve idempotency of BigQuery operators

turbaszek opened a new pull request #11287:
URL: https://github.com/apache/airflow/pull/11287


   Previously the job_id was always generated automatically, but this
   behavior was lost in a refactor.
   
   Closes: #11282
   Closes: #11280
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] TobKed commented on a change in pull request #11287: Improve handling of job_id in BigQuery operators

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11287:
URL: https://github.com/apache/airflow/pull/11287#discussion_r500037926



##########
File path: airflow/providers/google/cloud/hooks/bigquery.py
##########
@@ -1443,6 +1445,12 @@ def get_job(
         job = client.get_job(job_id=job_id, project=project_id, location=location)
         return job
 
+    @staticmethod
+    def _custom_job_id(configuration: Dict[str, Any]) -> str:
+        hash_base = json.dumps(configuration, sort_keys=True)
+        uniqueness_suffix = hashlib.md5(hash_base.encode()).hexdigest()
+        return f"airflow_{int(time.time())}_{uniqueness_suffix}"

Review comment:
       Wouldn't it be safer to use microseconds since the epoch instead of seconds?
   
   ```suggestion
           microseconds_from_epoch = int((datetime.now() - datetime.fromtimestamp(0)) / timedelta(microseconds=1))
           return f"airflow_{microseconds_from_epoch}_{uniqueness_suffix}"
   ```
   







[GitHub] [airflow] TobKed commented on a change in pull request #11287: Improve handling of job_id in BigQuery operators

Posted by GitBox <gi...@apache.org>.
TobKed commented on a change in pull request #11287:
URL: https://github.com/apache/airflow/pull/11287#discussion_r500109244



##########
File path: airflow/providers/google/cloud/hooks/bigquery.py
##########
@@ -1443,6 +1445,12 @@ def get_job(
         job = client.get_job(job_id=job_id, project=project_id, location=location)
         return job
 
+    @staticmethod
+    def _custom_job_id(configuration: Dict[str, Any]) -> str:
+        hash_base = json.dumps(configuration, sort_keys=True)
+        uniqueness_suffix = hashlib.md5(hash_base.encode()).hexdigest()
+        return f"airflow_{int(time.time())}_{uniqueness_suffix}"

Review comment:
       The suggestion above requires these imports: `from datetime import datetime, timedelta`







[GitHub] [airflow] turbaszek merged pull request #11287: Improve handling of job_id in BigQuery operators

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #11287:
URL: https://github.com/apache/airflow/pull/11287


   

