You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Lee-W (via GitHub)" <gi...@apache.org> on 2023/03/10 08:27:25 UTC

[GitHub] [airflow] Lee-W opened a new pull request, #30014: Merge async logic from GCSObjectExistenceAsyncSensor to GCSObjectExistenceSens

Lee-W opened a new pull request, #30014:
URL: https://github.com/apache/airflow/pull/30014

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   
   ### Why making this change?
   apache-airflow-providers-google treats the deferred execution of its operators and sensors in a different way which might cause confusion. For example, [BigQueryCheckOperator](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/bigquery.py#L200) uses the [deferrable](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/bigquery.py#L200) parameter to toggle deferred execution while we need to use another sensor for [GCSObjectExistenceSensor](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/sensors/gcs.py#L39) (i.e., [GCSObjectExistenceAsyncSensor](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/sensors/gcs.py#L103)) to achieve the same thing.
   
   ### What's changed?
   In this pull request, I move the deferred logic from [GCSObjectExistenceAsyncSensor](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/sensors/gcs.py#L103) to [GCSObjectExistenceSensor](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/sensors/gcs.py#L39) so that we can keep the consistency and reduce maintenance effort. Instead of removing [GCSObjectExistenceAsyncSensor](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/sensors/gcs.py#L103), I add a deprecation warning in case there're users using it.
   
   ---
   
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Lee-W commented on a diff in pull request #30014: Merge GCSObjectExistenceAsyncSensor logic to GCSObjectExistenceSensor

Posted by "Lee-W (via GitHub)" <gi...@apache.org>.
Lee-W commented on code in PR #30014:
URL: https://github.com/apache/airflow/pull/30014#discussion_r1142201296


##########
airflow/providers/google/cloud/sensors/gcs.py:
##########
@@ -99,10 +102,43 @@ def poke(self, context: Context) -> bool:
         )
         return hook.exists(self.bucket, self.object, self.retry)
 
+    def execute(self, context: Context) -> None:
+        """Airflow runs this method on the worker and defers using the trigger."""
+        if self.deferrable is False:

Review Comment:
   I just changed it into `if not self.deferrable`. As the default value of `self.deferrable` is `False`, I think it might be better to keep it as it is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Lee-W commented on pull request #30014: Merge GCSObjectExistenceAsyncSensor logic to GCSObjectExistenceSensor

Posted by "Lee-W (via GitHub)" <gi...@apache.org>.
Lee-W commented on PR #30014:
URL: https://github.com/apache/airflow/pull/30014#issuecomment-1477331581

   @uranusjr I noticed some of the CI steps are skipped. Should we restart them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Lee-W commented on a diff in pull request #30014: Merge async logic from GCSObjectExistenceAsyncSensor to GCSObjectExistenceSens

Posted by "Lee-W (via GitHub)" <gi...@apache.org>.
Lee-W commented on code in PR #30014:
URL: https://github.com/apache/airflow/pull/30014#discussion_r1142012121


##########
docs/apache-airflow-providers-google/operators/cloud/gcs.rst:
##########
@@ -164,14 +164,20 @@ Use the :class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSe
     :start-after: [START howto_sensor_object_exists_task]
     :end-before: [END howto_sensor_object_exists_task]
 
+Also you can use deferrable mode in this operator if you would like to free up the worker slots while the sensor is running.
+
+.. exampleinclude:: /../../tests/system/providers/google/cloud/gcs/example_gcs_sensor.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_sensor_object_exists_task_defered]
+    :end-before: [END howto_sensor_object_exists_task_defered]
 
 .. _howto/sensor:GCSObjectExistenceAsyncSensor:
 
 GCSObjectExistenceAsyncSensor
 -----------------------------
 
-Use the :class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceAsyncSensor`
-(deferrable version) if you would like to free up the worker slots while the sensor is running.
+:class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceAsyncSensor` is deprecated and will be removed in a future release. Please use :class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSensor` and use the deferrable mode in that operator.

Review Comment:
   If we remove the documentation here, the CI will not pass. It says we miss the example DAG for `GCSObjectExistenceAsyncSensor`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #30014: Merge async logic from GCSObjectExistenceAsyncSensor to GCSObjectExistenceSens

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on code in PR #30014:
URL: https://github.com/apache/airflow/pull/30014#discussion_r1141980826


##########
airflow/providers/google/cloud/sensors/gcs.py:
##########
@@ -99,10 +102,43 @@ def poke(self, context: Context) -> bool:
         )
         return hook.exists(self.bucket, self.object, self.retry)
 
+    def execute(self, context: Context) -> None:
+        """Airflow runs this method on the worker and defers using the trigger."""
+        if self.deferrable is False:

Review Comment:
   ```suggestion
           if not self.deferrable:
   ```
   
   (Maybe the entire block can be flipped to `if self.deferrable ... else`?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on pull request #30014: Merge GCSObjectExistenceAsyncSensor logic to GCSObjectExistenceSensor

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on PR #30014:
URL: https://github.com/apache/airflow/pull/30014#issuecomment-1477337511

   Looks like the worker timed out. Retrying.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #30014: Merge GCSObjectExistenceAsyncSensor logic to GCSObjectExistenceSensor

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #30014:
URL: https://github.com/apache/airflow/pull/30014


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #30014: Merge GCSObjectExistenceAsyncSensor logic to GCSObjectExistenceSensor

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #30014:
URL: https://github.com/apache/airflow/pull/30014#issuecomment-1477754447

   Nice!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #30014: Merge async logic from GCSObjectExistenceAsyncSensor to GCSObjectExistenceSens

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on code in PR #30014:
URL: https://github.com/apache/airflow/pull/30014#discussion_r1141982376


##########
docs/apache-airflow-providers-google/operators/cloud/gcs.rst:
##########
@@ -164,14 +164,20 @@ Use the :class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSe
     :start-after: [START howto_sensor_object_exists_task]
     :end-before: [END howto_sensor_object_exists_task]
 
+Also you can use deferrable mode in this operator if you would like to free up the worker slots while the sensor is running.
+
+.. exampleinclude:: /../../tests/system/providers/google/cloud/gcs/example_gcs_sensor.py
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_sensor_object_exists_task_defered]
+    :end-before: [END howto_sensor_object_exists_task_defered]
 
 .. _howto/sensor:GCSObjectExistenceAsyncSensor:
 
 GCSObjectExistenceAsyncSensor
 -----------------------------
 
-Use the :class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceAsyncSensor`
-(deferrable version) if you would like to free up the worker slots while the sensor is running.
+:class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceAsyncSensor` is deprecated and will be removed in a future release. Please use :class:`~airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSensor` and use the deferrable mode in that operator.

Review Comment:
   I don’t recall our deprecation strategy around providers; should the documentation simply be removed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org