Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/01 16:25:48 UTC

[GitHub] [airflow] Yao-ATG opened a new issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Yao-ATG opened a new issue #22675:
URL: https://github.com/apache/airflow/issues/22675


   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Apache Airflow version
   
   2.2.4 (latest released)
   
   ### Operating System
   
   MacOS 12.2.1
   
   ### Deployment
   
   Composer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   I have the files "hourse.jpeg" and "hourse.jpeg.copy" and a folder "hourse.jpeg.folder" in the source bucket.
   I use the following code to try to copy only "hourse.jpeg" to another bucket.
       gcs_to_gcs_op = GCSToGCSOperator(
           task_id="gcs_to_gcs",
           source_bucket=my_source_bucket,
           source_object="hourse.jpeg",
           destination_bucket=my_destination_bucket,
       )
   
   The result is that both files and the folder mentioned above are copied.
   From the source code, there seems to be no way to do what I want.
   
   ### What you think should happen instead
   
   Only the specified file should be copied; that is, source_object should be treated as an exact match rather than a prefix.
   To get the current prefix behavior, the user can (and should) use a wildcard:
       source_object="hourse.jpeg*"
   
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1085984856


   See the documentation (docstring), which has examples.
   
   As unintuitive as it is, source_object is a wildcard specification by default. If you want to copy a single object, you need to specify it like this:
   
   ```
   source_objects = [ 'your_object' ] 
   ```
   
   See examples here: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/transfers/gcs_to_gcs/index.html
   





[GitHub] [airflow] Yao-ATG commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
Yao-ATG commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086027371


   > See the documentation (docstring), which has examples.
   > 
   > As unintuitive as it is, source_object is a wildcard specification by default. If you want to copy a single object, you need to specify it like this:
   > 
   > ```
   > source_objects = [ 'your_object' ] 
   > ```
   > 
   > See examples here: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/transfers/gcs_to_gcs/index.html
   
   Unfortunately, using source_objects instead of source_object doesn't help.
   I verified this by running the DAG, both before opening the issue and after your reply.
   The source code also shows that there is no difference between specifying a file in source_object and in source_objects.
   In https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/gcs_to_gcs.py, at line 246, we have
           if self.source_object:
               self.source_objects = [self.source_object]
   and only source_objects is used afterwards, with each object treated as a prefix.
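
To make the point above concrete, here is a minimal sketch in plain Python. It does not call Airflow or GCS; `resolve_sources` and `objects_to_copy` are hypothetical helpers that only imitate the behavior described in this comment, and the object `a.txt` inside the folder is a made-up example.

```python
# Object names from the report; "a.txt" inside the folder is a made-up example.
BUCKET = ["hourse.jpeg", "hourse.jpeg.copy", "hourse.jpeg.folder/a.txt"]


def resolve_sources(source_object=None, source_objects=None):
    """Fold the single-object argument into the list form (hypothetical helper)."""
    if source_object:
        return [source_object]
    return list(source_objects or [])


def objects_to_copy(names, source_object=None, source_objects=None):
    """Select every name that starts with one of the resolved sources."""
    sources = resolve_sources(source_object, source_objects)
    return [n for n in names if any(n.startswith(s) for s in sources)]


# Both call styles end up with the same list and the same prefix matching.
single = objects_to_copy(BUCKET, source_object="hourse.jpeg")
listed = objects_to_copy(BUCKET, source_objects=["hourse.jpeg"])
```

Under these assumptions, `single` and `listed` both contain all three names, which matches what the DAG run showed.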
   





[GitHub] [airflow] potiuk commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086135446


   I think a flag such as "exact_match_when_no_wildcard" might be a solution.





[GitHub] [airflow] potiuk closed issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #22675:
URL: https://github.com/apache/airflow/issues/22675


   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1085483843


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086113871


   I see. I looked at the code, and indeed that is the case. I marked it as a good first issue, and maybe someone would like to work on it.
   
   Note that the fastest and surest way to get it implemented is to make a PR yourself and lead it to completion. Would you like to contribute such a change? Happy to review the code. If not, then it will have to wait for someone to pick it up.





[GitHub] [airflow] Yao-ATG commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
Yao-ATG commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086127397


   Before anybody tries to fix it, can we clarify the expected behavior?
   
   Let's limit the discussion to objects without a wildcard.
   The current behavior is to treat the object as a prefix, whether it is specified in source_object or source_objects.
   My expected behavior is to treat the object as an exact match, again whether it is specified in source_object or source_objects, and to let users add a wildcard if they want the current result.
   Or maybe the intended behavior is to treat source_object as a prefix and source_objects as an exact match?
   





[GitHub] [airflow] potiuk commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086134980


   For sure, it should ideally be backwards compatible though.





[GitHub] [airflow] potiuk edited a comment on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086135446


   I think a flag such as "exact_match_when_no_wildcard" (default False) might be a good solution.
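
As a rough illustration of how such a flag could keep the change backwards compatible, here is a minimal sketch in plain Python. It is not the operator's code: `select_objects` is a hypothetical helper, the flag is only the proposal from this comment, and the single-`*` wildcard handling (split into a prefix and a suffix) is an assumption made for the example.

```python
# Object names from the report; "a.txt" inside the folder is a made-up example.
BUCKET = ["hourse.jpeg", "hourse.jpeg.copy", "hourse.jpeg.folder/a.txt"]


def select_objects(names, source_object, exact_match_when_no_wildcard=False):
    """Pick the names to copy (hypothetical helper, not an Airflow API)."""
    if "*" in source_object:
        # Assumed wildcard rule: one "*" splits the name into prefix and suffix.
        prefix, suffix = source_object.split("*", 1)
        return [
            n for n in names
            if len(n) >= len(prefix) + len(suffix)
            and n.startswith(prefix)
            and n.endswith(suffix)
        ]
    if exact_match_when_no_wildcard:
        # Proposed opt-in behavior: only the named object is selected.
        return [n for n in names if n == source_object]
    # Default (flag is False): keep the existing prefix behavior.
    return [n for n in names if n.startswith(source_object)]


default_result = select_objects(BUCKET, "hourse.jpeg")
exact_result = select_objects(BUCKET, "hourse.jpeg", exact_match_when_no_wildcard=True)
wildcard_result = select_objects(BUCKET, "hourse.jpeg*", exact_match_when_no_wildcard=True)
```

With the default of False, existing DAGs would keep selecting everything under the prefix; only callers who opt in would get the single object, and an explicit wildcard would still select all three names.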





[GitHub] [airflow] Yao-ATG edited a comment on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
Yao-ATG edited a comment on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086127397


   Before anybody tries to fix it, can we clarify the expected behavior?
   
   Let's limit the discussion to objects without a wildcard.
   The current behavior is to treat the object as a prefix, whether it is specified in source_object or source_objects.
   For the correct behavior we have two options:
   (1) treat the object as an exact match, again whether it is specified in source_object or source_objects, and let users add a wildcard if they want the current result.
   (2) treat source_object as a prefix and source_objects as an exact match.
   
   Which option should we take?
   





[GitHub] [airflow] potiuk commented on issue #22675: GCSToGCSOperator cannot copy a single file/folder without copying other files/folders with that prefix

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22675:
URL: https://github.com/apache/airflow/issues/22675#issuecomment-1086133501


   I think that can be discussed when a PR is opened. I have no opinion. Maybe you can pick one as a proposal, and the reviewer of the PR might decide which one is OK.

