Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/20 06:53:43 UTC

[GitHub] [airflow] gfelot opened a new issue #11673: Issue with GCS and GCStoGCSOperator when dropping file

gfelot opened a new issue #11673:
URL: https://github.com/apache/airflow/issues/11673


   **Apache Airflow version**: 1.10.9
   
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: GCP
   - **OS** (e.g. from /etc/os-release): Debian
   
   
   **What happened**:
   
   For the past 2 weeks I have been facing a strange issue with my DAGs. Most of them start the same way: a colleague manually uploads a file to a GCS bucket, a Cloud Function is triggered once the upload finishes, and it launches an Airflow DAG through the API. The first task of the DAG transfers the file from a "landing zone" to a "safe zone", then the rest of the DAG continues.
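
As a rough illustration of the trigger step described above, here is a minimal sketch of how a Cloud Function could turn a GCS upload event into the `conf` payload the DAG reads. The helper name `build_trigger_payload` and the sample event are hypothetical, since the report does not show the Cloud Function; only the `file_name` key comes from the operator template in this report.

```python
import json


def build_trigger_payload(event):
    """Build the JSON body used to trigger the DAG from a GCS event dict.

    Assumption: the event exposes the uploaded object's path under "name",
    and the DAG reads it back as dag_run.conf['file_name'].
    """
    return json.dumps({"conf": {"file_name": event["name"]}})


# Hypothetical event, shaped like a GCS object notification.
event = {
    "bucket": "blablabla-landing",
    "name": "geometrie/Track_Geometry-20201005_032915.csv",
}
payload = build_trigger_payload(event)
```

How the payload is sent to Airflow (endpoint, authentication) depends on the deployment and is not covered here.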
   
   I move the file from bucket A to bucket B with the `GoogleCloudStorageToGoogleCloudStorageOperator`. Everything worked until 2 or 3 weeks ago. The oldest DAG is 6 months old, and although we have changed a few things, they were in other parts of the DAG; this part has been untouched for a long time and behaved correctly.
   
   Now, most of the time, the first task (the transfer) fails. The file does get moved, but I get this error 2 or 3 times in a row when I retry, and on the very next try, with the EXACT same file, it works. I cannot find the factor that causes this issue. It's driving me crazy.
   
   ```
   [2020-10-15 09:34:50,599] {taskinstance.py:868} INFO - 
   --------------------------------------------------------------------------------
   [2020-10-15 09:34:50,620] {taskinstance.py:887} INFO - Executing <Task(GoogleCloudStorageToGoogleCloudStorageOperator): transfer-landing-to-safe> on 2020-10-15T07:34:40+00:00
   [2020-10-15 09:34:50,626] {standard_task_runner.py:53} INFO - Started process 31555 to run task
   [2020-10-15 09:34:50,775] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: geometrie-preprocessing.transfer-landing-to-safe 2020-10-15T07:34:40+00:00 [running]> blablabla.internal
   [2020-10-15 09:34:50,860] {gcs_to_gcs.py:193} INFO - Executing copy of gs://blablabla-landing/geometrie/Track_Geometry-20201005_032915.csv to gs://blablabla-safe/geometrie/original/track_geometry_20201005_032915.csv
   [2020-10-15 09:34:50,861] {logging_mixin.py:112} INFO - [2020-10-15 09:34:50,860] {gcp_api_base_hook.py:146} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
   [2020-10-15 09:34:50,980] {taskinstance.py:1128} ERROR - 404 POST https://storage.googleapis.com/storage/v1/b/blablabla-landing/o/geometrie%2FTrack_Geometry-20201005_032915.csv/rewriteTo/b/blablabla-safe/o/geometrie%2Foriginal%2Ftrack_geometry_20201005_032915.csv: No such object: blablabla-landing/geometrie/Track_Geometry-20201005_032915.csv
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/dist-packages/airflow/models/taskinstance.py", line 966, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.7/dist-packages/airflow/contrib/operators/gcs_to_gcs.py", line 178, in execute
       destination_object=self.destination_object)
     File "/usr/local/lib/python3.7/dist-packages/airflow/contrib/operators/gcs_to_gcs.py", line 196, in _copy_single_object
       self.destination_bucket, destination_object)
     File "/usr/local/lib/python3.7/dist-packages/airflow/contrib/hooks/gcs_hook.py", line 135, in rewrite
       source=source_object
     File "/usr/local/lib/python3.7/dist-packages/google/cloud/storage/blob.py", line 2098, in rewrite
       timeout=timeout,
     File "/usr/local/lib/python3.7/dist-packages/google/cloud/_http.py", line 423, in api_request
       raise exceptions.from_http_response(response)
   google.api_core.exceptions.NotFound: 404 POST https://storage.googleapis.com/storage/v1/b/blablabla-landing/o/geometrie%2FTrack_Geometry-20201005_032915.csv/rewriteTo/b/blablabla-safe/o/geometrie%2Foriginal%2Ftrack_geometry_20201005_032915.csv: No such object: blablabla-landing/geometrie/Track_Geometry-20201005_032915.csv
   [2020-10-15 09:34:50,984] {taskinstance.py:1185} INFO - Marking task as FAILED.dag_id=geometrie-preprocessing, task_id=transfer-landing-to-safe, execution_date=20201015T073440, start_date=20201015T073450, end_date=20201015T073450
   [2020-10-15 09:35:00,556] {logging_mixin.py:112} INFO - [2020-10-15 09:35:00,556] {local_task_job.py:103} INFO - Task exited with return code 1
   ```
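
The failing request URL in the log above percent-encodes the object names (`%2F` stands for `/`). A small sketch, using only the standard library, of how one might decode such a rewrite URL to see exactly which source and destination objects were requested. The helper name `parse_rewrite_url` is hypothetical, and the parsing assumes the URL layout seen in this log.

```python
from urllib.parse import unquote, urlparse


def parse_rewrite_url(url):
    """Split a GCS JSON API rewrite URL into (bucket, object) pairs.

    Assumes the layout seen in the log:
    /storage/v1/b/<src_bucket>/o/<src_object>/rewriteTo/b/<dst_bucket>/o/<dst_object>
    """
    path = urlparse(url).path  # urlparse keeps the percent-encoding intact
    source, destination = path.split("/rewriteTo/")
    src_bucket, src_object = source.split("/b/", 1)[1].split("/o/", 1)
    dst_bucket, dst_object = destination.split("/o/", 1)
    dst_bucket = dst_bucket[len("b/"):]
    return (src_bucket, unquote(src_object)), (dst_bucket, unquote(dst_object))


url = (
    "https://storage.googleapis.com/storage/v1/b/blablabla-landing/o/"
    "geometrie%2FTrack_Geometry-20201005_032915.csv/rewriteTo/b/blablabla-safe/o/"
    "geometrie%2Foriginal%2Ftrack_geometry_20201005_032915.csv"
)
src, dst = parse_rewrite_url(url)
```

Here `src` names the object that the 404 reports as missing from the landing bucket.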
   
   The operator definition:
   
   ```
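   from airflow.contrib.operators.gcs_to_gcs import GoogleCloudStorageToGoogleCloudStorageOperator

   # env_extension is not shown in the report; presumably an environment
   # suffix set elsewhere in the DAG file.
   transfer_landing_to_safe = GoogleCloudStorageToGoogleCloudStorageOperator(
       task_id=f"transfer-landing-to-safe{env_extension}",
       source_bucket=f"blablabla-landing{env_extension}",
       # Object path passed in by the trigger through dag_run.conf
       source_object="{{ dag_run.conf['file_name'] }}",
       destination_bucket=f"blablabla-safe{env_extension}",
       # Keep only the trailing timestamp and extension of the original name
       destination_object="geometrie/original/track_geometry_{{ dag_run.conf['file_name'][-19:] }}",
       move_object=True,  # delete the source object after copying
       google_cloud_storage_conn_id="gcp_conn",
   )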
   ```
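
The `[-19:]` slice in the `destination_object` template keeps the last 19 characters of the file name, that is the `YYYYMMDD_HHMMSS.csv` suffix. A minimal sketch of the same string logic in plain Python, outside Jinja, using the file name from the log; the function name `destination_object_for` is hypothetical.

```python
def destination_object_for(file_name):
    """Mirror the template: fixed prefix plus the last 19 characters of file_name."""
    return "geometrie/original/track_geometry_" + file_name[-19:]


file_name = "geometrie/Track_Geometry-20201005_032915.csv"
result = destination_object_for(file_name)
```

For this file name the result matches the destination path shown in the log. The slice silently yields a different name if a file does not end with a suffix of exactly that length.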
   
   Last night I added 25 files "en masse" from Python (so using the API instead of dropping them manually) and got no errors.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-712799580


   I would suggest migrating to the Airflow backport provider packages and using `GCSToGCSOperator`:
   https://airflow.readthedocs.io/en/latest/backport-providers.html





[GitHub] [airflow] gfelot commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file

Posted by GitBox <gi...@apache.org>.
gfelot commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-713514422


   Thanks for pointing this out. I'm looking forward to migrating to 2.0 once it is available and has been tested a bit.
   
   But I'm not at all sure that this is an Airflow issue (which is not what I thought when I wrote this issue); I think there is something wrong with the GCS web interface itself when dropping a file. I imported 25 files from a Python script that triggered the exact same DAG (so using the API to upload the files) and I had no errors!





[GitHub] [airflow] eladkal closed issue #11673: Issue with GCS and GCStoGCSOperator when dropping file

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #11673:
URL: https://github.com/apache/airflow/issues/11673


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-713608560


   @TobKed may be interested in it 





[GitHub] [airflow] eladkal commented on issue #11673: Issue with GCS and GCStoGCSOperator when dropping file

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #11673:
URL: https://github.com/apache/airflow/issues/11673#issuecomment-1041148030


   This issue was reported against an old version of Airflow and against a backport provider.
   Please check against the latest provider version. If the issue still happens, please add steps to reproduce.

