Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/11/03 17:26:57 UTC

[GitHub] [airflow] billsmithatg opened a new issue, #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files

billsmithatg opened a new issue, #27488:
URL: https://github.com/apache/airflow/issues/27488

   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==6.7.0
   
   ### Apache Airflow version
   
   v2.2.3+composer
   
   ### Operating System
   
   whatever Linux version Cloud Composer uses
   
   ### Deployment
   
   Composer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   I am using GCSSynchronizeBucketsOperator to clone a folder between two buckets. There are several thousand objects, ranging in size from 50MB to 10GB.
   
   At some point the operation fails with this error: `google.api_core.exceptions.GoogleAPICallError: 413 POST https://storage.googleapis.com/storage/v1/b/<REDACTED>/o/zoominfo%2FZI_COMP_DESCRIPTION_20211001.csv.gz/copyTo/b/<REDACTED>/o/zoominfo2%2FZI_COMP_DESCRIPTION_20211001.csv.gz?prettyPrint=false: Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method in the JSON API (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.`
   
   ### What you think should happen instead
   
   GCSSynchronizeBucketsOperator should use the Rewrite method so that it can deal with copies that do not complete within 30 seconds.
   
   ### How to reproduce
   
   Use GCSSynchronizeBucketsOperator to copy a 10GB file.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1302442702

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] eladkal commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1304611094

   Feel free to submit a PR.
   It seems like the operator is using the [copy](https://github.com/apache/airflow/blob/9ab1a6a3e70b32a3cddddf0adede5d2f3f7e29ea/airflow/providers/google/cloud/hooks/gcs.py#L1064) function though the [rewrite](https://github.com/apache/airflow/blob/9ab1a6a3e70b32a3cddddf0adede5d2f3f7e29ea/airflow/providers/google/cloud/hooks/gcs.py#L205) function is available on the hook. It looks simple enough to make this change.
   
   For your PR, please provide a detailed explanation (in the PR) of why this change is better, and clarify what is expected to happen to users who rely on the current operator behavior.
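
For readers unfamiliar with the difference between the two calls: a rewrite may return before the object is fully copied, handing back a token that the caller passes into the next call until no token is returned. The sketch below illustrates that loop. It is not the operator's or the hook's actual code; the helper name `rewrite_until_done` is hypothetical, and it assumes the blob objects behave like `google.cloud.storage.Blob`, whose `rewrite(source, token=...)` method returns a `(token, bytes_rewritten, total_bytes)` tuple.

```python
def rewrite_until_done(source_blob, destination_blob):
    """Copy source_blob onto destination_blob, resuming until the rewrite completes.

    Hypothetical helper for illustration only. Both arguments are assumed to
    behave like google.cloud.storage.Blob objects.
    """
    token = None
    while True:
        # Each call copies as much as the service allows within one request.
        # A non-None token means the copy is unfinished and must be resumed.
        token, bytes_rewritten, total_bytes = destination_blob.rewrite(
            source_blob, token=token
        )
        if token is None:
            # No token returned: the whole object has been rewritten.
            return total_bytes
```

With the real client library, the two arguments would be obtained along the lines of `storage.Client().bucket("src-bucket").blob("path/file.csv.gz")`, where the bucket and object names are placeholders. Because each request does only part of the work, no single call has to finish a 10GB cross-location copy within the 30-second limit quoted in the error above.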




Re: [I] GCSSynchronizeBucketsOperator fails on 30-second timeout on large files [airflow]

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files
URL: https://github.com/apache/airflow/issues/27488




[GitHub] [airflow] eladkal commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1476393969

   Sure. Assigned to you




[GitHub] [airflow] jmelot commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files

Posted by "jmelot (via GitHub)" <gi...@apache.org>.
jmelot commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1476214139

   Hi @eladkal, I ran into what seemed like a similar issue the other day. If it's still useful, I'll try to submit a PR for this within the next week or so.




Re: [I] GCSSynchronizeBucketsOperator fails on 30-second timeout on large files [airflow]

Posted by "jmelot (via GitHub)" <gi...@apache.org>.
jmelot commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1936815665

   @kevgeo I never got a chance to work on this, glad you have a fix!




Re: [I] GCSSynchronizeBucketsOperator fails on 30-second timeout on large files [airflow]

Posted by "kevgeo (via GitHub)" <gi...@apache.org>.
kevgeo commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1922233990

   Hi @jmelot, are you still working on this issue? I was able to reproduce it and can provide a fix.

