Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/20 06:53:43 UTC
[GitHub] [airflow] gfelot opened a new issue #11673: Issue with GCS and GCStoGCSOperator when dropping file
gfelot opened a new issue #11673:
URL: https://github.com/apache/airflow/issues/11673
**Apache Airflow version**: 1.10.9
**Environment**:
- **Cloud provider or hardware configuration**: GCP
- **OS** (e.g. from /etc/os-release): Debian
**What happened**:
I have been facing a strange issue with my DAGs for two weeks. Most of them start the same way: a colleague manually uploads a file into a GCS bucket, which triggers a Cloud Function that launches an Airflow DAG through the API. The first task of the DAG transfers the file from a "landing zone" into a "safe zone", and then the rest of the DAG continues.
I move the file from bucket A to bucket B with the GoogleCloudStorageToGoogleCloudStorageOperator. Everything was working until 2 or 3 weeks ago. The oldest DAG is 6 months old, and although we have changed some things, they were in other parts of the DAG; this part has been untouched for a long time and was behaving correctly.
Now, most of the time, the first task (the transfer) fails. The file is moved correctly, but I get this error 2 or 3 times in a row when I retry, and on the very next try, with the EXACT same file, it works. I cannot find the factor that causes this issue. It is driving me crazy.
```
[2020-10-15 09:34:50,599] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-10-15 09:34:50,620] {taskinstance.py:887} INFO - Executing <Task(GoogleCloudStorageToGoogleCloudStorageOperator): transfer-landing-to-safe> on 2020-10-15T07:34:40+00:00
[2020-10-15 09:34:50,626] {standard_task_runner.py:53} INFO - Started process 31555 to run task
[2020-10-15 09:34:50,775] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: geometrie-preprocessing.transfer-landing-to-safe 2020-10-15T07:34:40+00:00 [running]> blablabla.internal
[2020-10-15 09:34:50,860] {gcs_to_gcs.py:193} INFO - Executing copy of gs://blablabla-landing/geometrie/Track_Geometry-20201005_032915.csv to gs://blablabla-safe/geometrie/original/track_geometry_20201005_032915.csv
[2020-10-15 09:34:50,861] {logging_mixin.py:112} INFO - [2020-10-15 09:34:50,860] {gcp_api_base_hook.py:146} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2020-10-15 09:34:50,980] {taskinstance.py:1128} ERROR - 404 POST https://storage.googleapis.com/storage/v1/b/blablabla-landing/o/geometrie%2FTrack_Geometry-20201005_032915.csv/rewriteTo/b/blablabla-safe/o/geometrie%2Foriginal%2Ftrack_geometry_20201005_032915.csv: No such object: blablabla-landing/geometrie/Track_Geometry-20201005_032915.csv
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/airflow/models/taskinstance.py", line 966, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.7/dist-packages/airflow/contrib/operators/gcs_to_gcs.py", line 178, in execute
destination_object=self.destination_object)
File "/usr/local/lib/python3.7/dist-packages/airflow/contrib/operators/gcs_to_gcs.py", line 196, in _copy_single_object
self.destination_bucket, destination_object)
File "/usr/local/lib/python3.7/dist-packages/airflow/contrib/hooks/gcs_hook.py", line 135, in rewrite
source=source_object
File "/usr/local/lib/python3.7/dist-packages/google/cloud/storage/blob.py", line 2098, in rewrite
timeout=timeout,
File "/usr/local/lib/python3.7/dist-packages/google/cloud/_http.py", line 423, in api_request
raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 POST https://storage.googleapis.com/storage/v1/b/blablabla-landing/o/geometrie%2FTrack_Geometry-20201005_032915.csv/rewriteTo/b/blablabla-safe/o/geometrie%2Foriginal%2Ftrack_geometry_20201005_032915.csv: No such object: blablabla-landing/geometrie/Track_Geometry-20201005_032915.csv
[2020-10-15 09:34:50,984] {taskinstance.py:1185} INFO - Marking task as FAILED.dag_id=geometrie-preprocessing, task_id=transfer-landing-to-safe, execution_date=20201015T073440, start_date=20201015T073450, end_date=20201015T073450
[2020-10-15 09:35:00,556] {logging_mixin.py:112} INFO - [2020-10-15 09:35:00,556] {local_task_job.py:103} INFO - Task exited with return code 1
```
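The log shows the rewrite call failing with a 404 on the source object, while the report says a later retry on the same file succeeds. One way to probe whether this is a timing problem is to wait briefly for the object to become visible before copying. The following is a minimal sketch, not part of the original DAG: `wait_for_object` and the `exists` callable (for example, a wrapper around a storage client's existence check) are hypothetical names introduced here for illustration.

```python
import time
from typing import Callable


def wait_for_object(
    exists: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `exists` up to `attempts` times, doubling the delay after each miss.

    Returns True as soon as the object is seen, False if it never appears.
    """
    delay = initial_delay
    for attempt in range(attempts):
        if exists():
            return True
        if attempt < attempts - 1:  # no need to sleep after the final check
            sleep(delay)
            delay *= 2
    return False


# Hypothetical stand-in: the object becomes visible on the third check.
checks = iter([False, False, True])
delays = []  # records the requested sleeps instead of actually sleeping
found = wait_for_object(lambda: next(checks), sleep=delays.append)
print(found, delays)
```

This only helps if the failure is a visibility delay. If the source is missing because another run already moved it, waiting will not help, and the sketch does not distinguish the two cases. Airflow's own `retries` and `retry_delay` task arguments give a similar effect at the task level.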
The operator part:
```
from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator

# env_extension is defined elsewhere in the DAG file (not shown in the issue).
transfer_landing_to_safe = GoogleCloudStorageToGoogleCloudStorageOperator(
    task_id=f"transfer-landing-to-safe{env_extension}",
    source_bucket=f"blablabla-landing{env_extension}",
    source_object="{{ dag_run.conf['file_name'] }}",
    destination_bucket=f"blablabla-safe{env_extension}",
    destination_object="geometrie/original/track_geometry_{{ dag_run.conf['file_name'][-19:] }}",
    move_object=True,
    google_cloud_storage_conn_id="gcp_conn"
)
```
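To make the templated fields concrete, here is a minimal sketch in plain Python of what the two Jinja expressions evaluate to for the file in the log above. It assumes `dag_run.conf['file_name']` holds the full object path `geometrie/Track_Geometry-20201005_032915.csv` (inferred from the log line, not stated in the issue); the `conf` dict is a stand-in for `dag_run.conf`.

```python
# Stand-in for dag_run.conf; the value is inferred from the task log, not confirmed.
conf = {"file_name": "geometrie/Track_Geometry-20201005_032915.csv"}

# source_object="{{ dag_run.conf['file_name'] }}"
source_object = conf["file_name"]

# destination_object="geometrie/original/track_geometry_{{ dag_run.conf['file_name'][-19:] }}"
# The last 19 characters are the timestamp plus the ".csv" extension.
suffix = conf["file_name"][-19:]
destination_object = f"geometrie/original/track_geometry_{suffix}"

print(source_object)
print(destination_object)
```

Both values match the `Executing copy of ...` line in the log, so the templating itself appears to render as intended. The slice does rely on every file name ending in a fixed-width `YYYYMMDD_HHMMSS.csv` suffix.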
Last night I added 25 files "en masse" from Python (so using the API instead of uploading manually) and got no errors.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] turbaszek commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file
Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-712799580
I would suggest migrating to the Airflow backport provider packages and using `GCSToGCSOperator`:
https://airflow.readthedocs.io/en/latest/backport-providers.html
[GitHub] [airflow] gfelot commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file
Posted by GitBox <gi...@apache.org>.
gfelot commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-713514422
Thanks for pointing this out. I'm looking forward to migrating to 2.0 once it is available and has been tested a bit.
But I'm not at all sure this is an Airflow issue (which is not what I thought when I opened this issue); I think there is something wrong with the GCS website itself when dropping a file. I imported 25 files from a Python script that triggered the exact same DAG (so using the API to upload the files) and I had no errors!
[GitHub] [airflow] eladkal closed issue #11673: Issue with GCS and GCStoGCSOperator when dropping file
Posted by GitBox <gi...@apache.org>.
eladkal closed issue #11673:
URL: https://github.com/apache/airflow/issues/11673
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] turbaszek commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file
Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-713608560
@TobKed may be interested in it
[GitHub] [airflow] eladkal commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file
Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-1041148030
This issue was reported against an old version of Airflow and on a backport provider.
Please check against the latest provider version. If the issue still happens, please add steps to reproduce.