Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/07 12:55:06 UTC

[GitHub] [airflow] davidpr91 opened a new pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

davidpr91 opened a new pull request #21391:
URL: https://github.com/apache/airflow/pull/21391


   Currently, when GCSToGCSOperator is used to copy a file from one bucket to another and the source file does not exist, nothing happens and the task is considered successful. This is reasonable for some use cases, for example when you want to copy all the files from a directory or all the files that match a specific pattern.
   For other cases, such as copying one specific blob, it is useful to raise an exception if the source file can't be found. Otherwise, the task fails silently.
   
   This PR adds the flag "source_object_required" to GCSToGCSOperator to enable this behavior. For backward compatibility, the flag defaults to False. If the flag is set to True and the source_objects are blobs (not folders or patterns) that don't exist, the task will fail.
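
The behavior described above can be sketched in plain Python. This is a minimal illustration, not the operator's implementation: `FakeHook`, `SourceObjectMissing`, and `copy_blob` are hypothetical names invented for this example, and the in-memory hook only stands in for a real GCS client.

```python
class SourceObjectMissing(Exception):
    """Hypothetical exception used only in this sketch."""


class FakeHook:
    """Hypothetical in-memory stand-in for a GCS hook (not the Airflow hook)."""

    def __init__(self, blobs):
        self.blobs = set(blobs)  # set of (bucket, object_name) pairs
        self.copied = []         # record of copy calls, for inspection

    def exists(self, bucket, name):
        return (bucket, name) in self.blobs

    def copy(self, bucket, name, dest_bucket, dest_name):
        self.copied.append((bucket, name, dest_bucket, dest_name))


def copy_blob(hook, source_bucket, source_object, destination_bucket,
              source_object_required=False):
    """Copy one specific blob; return True if a copy happened.

    With source_object_required=False (the default) a missing source is
    skipped silently. With True, a missing source raises an exception.
    """
    if hook.exists(source_bucket, source_object):
        hook.copy(source_bucket, source_object, destination_bucket, source_object)
        return True
    if source_object_required:
        raise SourceObjectMissing(
            f"{source_object} does not exist in bucket {source_bucket}"
        )
    return False
```

Keeping the default at False means existing callers see no change; only callers that opt in get a failure for a missing blob.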
   
   closes: https://github.com/apache/airflow/issues/21388
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1031438108


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally; it’s a heavy Docker image but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   





[GitHub] [airflow] potiuk closed pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #21391:
URL: https://github.com/apache/airflow/pull/21391


   





[GitHub] [airflow] raphaelauv commented on a change in pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
raphaelauv commented on a change in pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#discussion_r801042926



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_gcs.py
##########
@@ -313,6 +318,14 @@ def _copy_source_without_wildcard(self, hook, prefix):
                 self._copy_single_object(
                     hook=hook, source_object=prefix, destination_object=self.destination_object
                 )
+            else:
+                msg = (
+                    f'{prefix} does not exist in bucket {self.source_bucket}'
+                )
+                self.log.warning(msg)

Review comment:
       maybe move the log inside the if, so current users do not see logs with WARNING level
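
The suggestion above can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `handle_missing_source` and the logger name are hypothetical, and the built-in `FileNotFoundError` stands in for whatever exception the operator raises. The point is only that the warning is emitted inside the branch guarded by the flag, so users who leave the flag off see no new WARNING lines.

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("gcs_copy_sketch")


def handle_missing_source(prefix, source_bucket, source_object_required):
    """Handle a source object that was not found.

    The warning and the exception both live inside the flag check, so
    the default (flag off) path stays completely silent.
    """
    if source_object_required:
        msg = f"{prefix} does not exist in bucket {source_bucket}"
        log.warning(msg)
        raise FileNotFoundError(msg)
```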







[GitHub] [airflow] davidpr91 commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
davidpr91 commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1035184801


   Hello. I see there is a failing check, but I'm not sure how this is related to my change. Could you help me? Thank you!





[GitHub] [airflow] potiuk commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1035391963


   It worked indeed :)





[GitHub] [airflow] potiuk merged pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21391:
URL: https://github.com/apache/airflow/pull/21391


   





[GitHub] [airflow] github-actions[bot] commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1032939056


   The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] eladkal commented on a change in pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#discussion_r801997888



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_gcs.py
##########
@@ -90,6 +90,9 @@ class GCSToGCSOperator(BaseOperator):
         If set as a sequence, the identities from the list must grant
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
+    :param source_object_required: When source_object_required is True, if you want to copy / move a specific blob
+        and it doesn't exist, an exception is raised and the task is marked as failed.
+        This parameter doesn't have any effect when the source_object that you pass is a folder or pattern.

Review comment:
       Can we simplify it? `When ___ is true, if ....` is a sentence that is hard to follow.
   You need to fix the sentence anyway, as it fails the static checks (line too long), so please try to simplify it.







[GitHub] [airflow] raphaelauv commented on a change in pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
raphaelauv commented on a change in pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#discussion_r801042926



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_gcs.py
##########
@@ -313,6 +318,14 @@ def _copy_source_without_wildcard(self, hook, prefix):
                 self._copy_single_object(
                     hook=hook, source_object=prefix, destination_object=self.destination_object
                 )
+            else:
+                msg = (
+                    f'{prefix} does not exist in bucket {self.source_bucket}'
+                )
+                self.log.warning(msg)

Review comment:
       maybe move the log inside the if, so current users do not log with WARNING level







[GitHub] [airflow] davidpr91 commented on a change in pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
davidpr91 commented on a change in pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#discussion_r801802051



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_gcs.py
##########
@@ -313,6 +318,14 @@ def _copy_source_without_wildcard(self, hook, prefix):
                 self._copy_single_object(
                     hook=hook, source_object=prefix, destination_object=self.destination_object
                 )
+            else:
+                msg = (
+                    f'{prefix} does not exist in bucket {self.source_bucket}'
+                )
+                self.log.warning(msg)

Review comment:
       you're right! I've just modified it







[GitHub] [airflow] davidpr91 commented on a change in pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
davidpr91 commented on a change in pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#discussion_r802131217



##########
File path: airflow/providers/google/cloud/transfers/gcs_to_gcs.py
##########
@@ -90,6 +90,9 @@ class GCSToGCSOperator(BaseOperator):
         If set as a sequence, the identities from the list must grant
         Service Account Token Creator IAM role to the directly preceding identity, with first
         account from the list granting this role to the originating account (templated).
+    :param source_object_required: When source_object_required is True, if you want to copy / move a specific blob
+        and it doesn't exist, an exception is raised and the task is marked as failed.
+        This parameter doesn't have any effect when the source_object that you pass is a folder or pattern.

Review comment:
       I've simplified the description. Thanks for the feedback!







[GitHub] [airflow] davidpr91 commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
davidpr91 commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1035210888


   @potiuk It worked. Thank you!





[GitHub] [airflow] davidpr91 commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
davidpr91 commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1032785990


   @potiuk I've added a unit test





[GitHub] [airflow] potiuk commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1035338818


   Not entirely, it seems :). Let me close/reopen to rebuild. I think we need to add a trigger for another event in our CI :)





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1035392189


   Awesome work, congrats on your first merged pull request!
   





[GitHub] [airflow] potiuk commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1031445763


   Could you please add a unit test for it? You can use the existing tests as a base.





[GitHub] [airflow] potiuk commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1032963743


   Any chance to fix the static check @davidpr91 (shortly as I am preparing RC2 release tonight)?





[GitHub] [airflow] davidpr91 commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
davidpr91 commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1033147323


   @potiuk static checks should be fixed now





[GitHub] [airflow] potiuk commented on pull request #21391: Optionally raise an error if source file does not exist in GCSToGCSOperator

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21391:
URL: https://github.com/apache/airflow/pull/21391#issuecomment-1035202074


   > Hello. I see there is a failing check, but I'm not sure how this is related to my change. Could you help me? Thank you!
   
   Just rebase. There is this new feature of GitHub that you can try:
   
   ![image](https://user-images.githubusercontent.com/595491/153462366-499e2d77-fea0-4eb9-aa75-3ba608bef7ac.png)
   
   

