Posted to commits@airflow.apache.org by "vksunilk (via GitHub)" <gi...@apache.org> on 2023/02/03 15:19:34 UTC

[GitHub] [airflow] vksunilk commented on pull request #29346: Check Absence of files or objects via GCSObjectExistenceSensor

vksunilk commented on PR #29346:
URL: https://github.com/apache/airflow/pull/29346#issuecomment-1416017293

   > I think the purpose of `GCSObjectExistenceSensor` is to wait for a file to present. Why do we need these changes? It does not fail for me if the file is not present it waits for the file to get created and then timeout based on my `poke_interval` and `timeout` param.
   > 
   > ```
   > [2023-02-03, 15:06:41 UTC] {taskinstance.py:1524} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='bq_check_op' AIRFLOW_CTX_TASK_ID='gcs_object_exists_task' AIRFLOW_CTX_EXECUTION_DATE='2023-02-03T15:06:08.323807+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-02-03T15:06:08.323807+00:00'
   > [2023-02-03, 15:06:41 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:06:41 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:06:48 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:06:48 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:06:56 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:06:56 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:07:03 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:07:03 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:07:05 UTC] {taskinstance.py:1798} ERROR - Task failed with exception
   > Traceback (most recent call last):
   >   File "/opt/airflow/airflow/sensors/base.py", line 216, in execute
   >     raise AirflowSensorTimeout(message)
   > airflow.exceptions.AirflowSensorTimeout: Sensor has timed out; run duration of 24.36089755300054 seconds exceeds the specified timeout of 20.
   > [2023-02-03, 15:07:05 UTC] {taskinstance.py:1338} INFO - Immediate failure requested. Marking task as FAILED. dag_id=bq_check_op, task_id=gcs_object_exists_task, execution_date=20230203T150608, start_date=20230203T150640, end_date=20230203T150705
   > [2023-02-03, 15:07:05 UTC] {standard_task_runner.py:105} ERROR - Failed to execute job 147 for task gcs_object_exists_task (Sensor has timed out; run duration of 24.36089755300054 seconds exceeds the specified timeout of 20.; 5779)
   > [2023-02-03, 15:07:05 UTC] {local_task_job.py:215} INFO - Task exited with return code 1
   > [2023-02-03, 15:07:05 UTC] {taskinstance.py:2616} INFO - 0 downstream tasks scheduled from follow-on schedule check
   > ```
   
   Yes. In case a user needs to wait for a file to be deleted by an external task, this can be useful. This is one such use case.
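
   To illustrate the idea, here is a minimal sketch of the "wait for absence" poke logic being discussed. `FakeGCSHook` is a hypothetical stand-in used only so the snippet is self-contained; a real sensor would call `GCSHook.exists(bucket_name, object_name)` from `airflow.providers.google.cloud.hooks.gcs`, and the existing `GCSObjectExistenceSensor.poke` returns that result directly, so waiting for deletion is just the inverse:

   ```python
   # Sketch of the "wait for absence" poke logic (assumption: illustrative only;
   # FakeGCSHook is a hypothetical stand-in for Airflow's GCSHook).

   class FakeGCSHook:
       """Stand-in for GCSHook: tracks a set of (bucket, object) pairs."""

       def __init__(self, objects):
           self._objects = set(objects)

       def exists(self, bucket_name, object_name):
           return (bucket_name, object_name) in self._objects

       def delete(self, bucket_name, object_name):
           self._objects.discard((bucket_name, object_name))


   def poke_absence(hook, bucket, obj):
       """Return True once the object is gone -- the inverse of the existing
       GCSObjectExistenceSensor.poke, which returns hook.exists(bucket, obj)."""
       return not hook.exists(bucket, obj)


   hook = FakeGCSHook({("test-gcs-bucket-providers", "example_gcs.py")})
   # Object still present: sensor keeps poking.
   print(poke_absence(hook, "test-gcs-bucket-providers", "example_gcs.py"))  # False
   # An external task deletes the object: sensor succeeds on the next poke.
   hook.delete("test-gcs-bucket-providers", "example_gcs.py")
   print(poke_absence(hook, "test-gcs-bucket-providers", "example_gcs.py"))  # True
   ```

   With the existing sensor, the log above shows the opposite behaviour: it pokes until the file appears or the `timeout` is reached.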


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org