Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/11/03 17:26:57 UTC
[GitHub] [airflow] billsmithatg opened a new issue, #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files
billsmithatg opened a new issue, #27488:
URL: https://github.com/apache/airflow/issues/27488
### Apache Airflow Provider(s)
google
### Versions of Apache Airflow Providers
apache-airflow-providers-google==6.7.0
### Apache Airflow version
v2.2.3+composer
### Operating System
whatever Linux version Cloud Composer uses
### Deployment
Composer
### Deployment details
_No response_
### What happened
Using GCSSynchronizeBucketsOperator to clone a folder between two buckets. There are several thousand objects, and they range in size from 50MB to 10GB.
At some point the operation fails with this error: `google.api_core.exceptions.GoogleAPICallError: 413 POST https://storage.googleapis.com/storage/v1/b/<REDACTED>/o/zoominfo%2FZI_COMP_DESCRIPTION_20211001.csv.gz/copyTo/b/<REDACTED>/o/zoominfo2%2FZI_COMP_DESCRIPTION_20211001.csv.gz?prettyPrint=false: Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method in the JSON API (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.`
### What you think should happen instead
GCSSynchronizeBucketsOperator should use the Rewrite method so that it can deal with copies that do not complete within 30 seconds.
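For context, the Rewrite method differs from a plain copy in that a single call may return before the object is fully copied, handing back a token to resume with. A minimal sketch of that loop follows, assuming a destination object whose `rewrite(source, token=...)` returns a `(token, bytes_rewritten, total_bytes)` tuple, as `google.cloud.storage.Blob.rewrite` does. The helper name `rewrite_until_done` is hypothetical and not part of any library.

```python
from typing import Optional, Protocol, Tuple


class RewritableBlob(Protocol):
    """Minimal shape of a destination blob (assumption for this sketch).

    google.cloud.storage.Blob has a compatible rewrite() method.
    """

    def rewrite(self, source, token: Optional[str] = None) -> Tuple[Optional[str], int, int]:
        ...


def rewrite_until_done(destination: RewritableBlob, source) -> int:
    """Call rewrite() repeatedly until no continuation token is returned.

    Returns the number of bytes rewritten, as reported by the last call.
    """
    token, bytes_rewritten, _total_bytes = destination.rewrite(source)
    while token is not None:
        # A non-None token means the copy is incomplete; passing it back
        # resumes the rewrite instead of starting over.
        token, bytes_rewritten, _total_bytes = destination.rewrite(source, token=token)
    return bytes_rewritten
```

With the real client library, `destination` and `source` would be `Blob` objects obtained from the destination and source buckets respectively.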
### How to reproduce
Use GCSSynchronizeBucketsOperator to copy a 10GB file.
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1302442702
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] eladkal commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1304611094
Feel free to submit a PR.
It seems like the operator is using the [copy](https://github.com/apache/airflow/blob/9ab1a6a3e70b32a3cddddf0adede5d2f3f7e29ea/airflow/providers/google/cloud/hooks/gcs.py#L1064) function even though the [rewrite](https://github.com/apache/airflow/blob/9ab1a6a3e70b32a3cddddf0adede5d2f3f7e29ea/airflow/providers/google/cloud/hooks/gcs.py#L205) function is available on the hook. It looks simple enough to make this change.
For your PR, please provide a detailed explanation (in the PR) of why this change is better, and clarify what is expected to happen to users who rely on the current operator behavior.
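As a rough illustration of the kind of change discussed here, the sketch below copies every object under a prefix by calling the hook's `rewrite` method per object. It assumes a hook exposing `list(bucket_name=..., prefix=...)` and `rewrite(source_bucket=..., source_object=..., destination_bucket=..., destination_object=...)`, as `GCSHook` in the linked `gcs.py` appears to. The function name `copy_prefix_with_rewrite` is hypothetical, and this is not the operator's actual synchronization logic: it does not compare, skip, or delete objects.

```python
from typing import List


def copy_prefix_with_rewrite(
    hook,
    source_bucket: str,
    destination_bucket: str,
    source_prefix: str,
    destination_prefix: str,
) -> List[str]:
    """Copy all objects under source_prefix using hook.rewrite().

    `hook` is assumed to behave like Airflow's GCSHook (see lead-in).
    Returns the destination object names that were written.
    """
    copied = []
    for name in hook.list(bucket_name=source_bucket, prefix=source_prefix):
        # Swap the source prefix for the destination prefix, keeping the rest.
        destination_name = destination_prefix + name[len(source_prefix):]
        hook.rewrite(
            source_bucket=source_bucket,
            source_object=name,
            destination_bucket=destination_bucket,
            destination_object=destination_name,
        )
        copied.append(destination_name)
    return copied
```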
Re: [I] GCSSynchronizeBucketsOperator fails on 30-second timeout on large files [airflow]
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files
URL: https://github.com/apache/airflow/issues/27488
[GitHub] [airflow] eladkal commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files
Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1476393969
Sure. Assigned to you
[GitHub] [airflow] jmelot commented on issue #27488: GCSSynchronizeBucketsOperator fails on 30-second timeout on large files
Posted by "jmelot (via GitHub)" <gi...@apache.org>.
jmelot commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1476214139
Hi @eladkal, I ran into what seemed like a similar issue the other day. If it's still useful, I'll try to submit a PR for this within the next week or so.
Re: [I] GCSSynchronizeBucketsOperator fails on 30-second timeout on large files [airflow]
Posted by "jmelot (via GitHub)" <gi...@apache.org>.
jmelot commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1936815665
@kevgeo I never was able to get a chance to work on this, glad you have a fix!
Re: [I] GCSSynchronizeBucketsOperator fails on 30-second timeout on large files [airflow]
Posted by "kevgeo (via GitHub)" <gi...@apache.org>.
kevgeo commented on issue #27488:
URL: https://github.com/apache/airflow/issues/27488#issuecomment-1922233990
Hi @jmelot, are you still working on this issue? I was able to reproduce it and can provide a fix.