You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/25 13:07:18 UTC

[GitHub] [airflow] XD-DENG opened a new issue #15005: `GCSToLocalFilesystemOperator` unnecessarily downloads objects when it checks object size

XD-DENG opened a new issue #15005:
URL: https://github.com/apache/airflow/issues/15005


   `GCSToLocalFilesystemOperator` in `airflow/providers/google/cloud/transfers/gcs_to_local.py` checks the file size if `store_to_xcom_key` is `True`.
   
   https://github.com/apache/airflow/blob/b40dffa08547b610162f8cacfa75847f3c4ca364/airflow/providers/google/cloud/transfers/gcs_to_local.py#L137-L142
   
   How it checks size is to download the object as `bytes` then check the size. This unnecessarily download the objects, because `google.cloud.storage.blob.Blob` itself already has a `size` property ([documentation reference](https://googleapis.dev/python/storage/1.30.0/blobs.html#google.cloud.storage.blob.Blob.size)).
   
   In extreme cases, if the object is big size, it adds unnecessary burden on the instance resources.
   
   A new method, `object_size()`, can be added to `GCSHook`, then this can be addressed in `GCSToLocalFilesystemOperator`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil closed issue #15005: `GCSToLocalFilesystemOperator` unnecessarily downloads objects when it checks object size

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #15005:
URL: https://github.com/apache/airflow/issues/15005


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on issue #15005: `GCSToLocalFilesystemOperator` unnecessarily downloads objects when it checks object size

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on issue #15005:
URL: https://github.com/apache/airflow/issues/15005#issuecomment-806736519


   Sorry @potiuk I think I was wrong. You were not the original author of this chunk, and this code has been implemented 4+ years ago.
   
   But it would still be appreciated if you have time to help share your inputs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal edited a comment on issue #15005: `GCSToLocalFilesystemOperator` unnecessarily downloads objects when it checks object size

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on issue #15005:
URL: https://github.com/apache/airflow/issues/15005#issuecomment-818654098






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on issue #15005: `GCSToLocalFilesystemOperator` unnecessarily downloads objects when it checks object size

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on issue #15005:
URL: https://github.com/apache/airflow/issues/15005#issuecomment-806710661


   CC @potiuk as you were the original author if I'm wrong :)
   
   Please correct me if anything in my analysis above is incorrect, or maybe there was some reasons for what you implemented in this way? 
   
   Cheers!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #15005: `GCSToLocalFilesystemOperator` unnecessarily downloads objects when it checks object size

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15005:
URL: https://github.com/apache/airflow/issues/15005#issuecomment-818654098


   > A new method, object_size(), can be added to GCSHook, then this can be addressed in GCSToLocalFilesystemOperator.
   
   We already have that method in the hook:
   
   https://github.com/apache/airflow/blob/1a85ba9e93d44601a322546e31814bd9ef11c125/airflow/providers/google/cloud/hooks/gcs.py#L776-L793
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG edited a comment on issue #15005: `GCSToLocalFilesystemOperator` unnecessarily downloads objects when it checks object size

Posted by GitBox <gi...@apache.org>.
XD-DENG edited a comment on issue #15005:
URL: https://github.com/apache/airflow/issues/15005#issuecomment-806710661


   CC @potiuk as you were the original author if I'm not wrong :)
   
   Please correct me if anything in my analysis above is incorrect, or maybe there was some reasons for what you implemented in this way? 
   
   Cheers!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org