Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/04 06:25:04 UTC

[GitHub] [airflow] xinbinhuang opened a new pull request #12803: Add sensors section to describe different modes of sensors

xinbinhuang opened a new pull request #12803:
URL: https://github.com/apache/airflow/pull/12803




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#discussion_r536274198



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -448,7 +447,81 @@ Operators are only loaded by Airflow if they are assigned to a DAG.
 
 .. seealso::
     - :ref:`List Airflow operators <pythonapi:operators>`
-    - :doc:`How-to guides for some Airflow operators<howto/operator/index>`.
+    - :doc:`How-to guides for some Airflow operators<howto/operator/index>`
+
+.. _concepts:sensors:
+
+Sensors
+-------
+
+``Sensor`` is an Operator that waits (polls) for a certain time, file, database row, S3 key, , another DAG/task, etc...
+
+There are currently 3 different modes for how a sensor operates:
+
++--------------------+-----------------------+-----------------------+
+| Schedule Mode      | Description           | Use case              |
++====================+=======================+=======================+
+| ``poke`` (default) | The sensor is taking  | Use this mode if the  |
+|                    | up a worker slot for  | expected runtime of   |
+|                    | its whole execution   | the sensor is short   |
+|                    | time and sleeps       | or if a short poke    |
+|                    | between pokes.        | interval is required. |
+|                    |                       | Note that the sensor  |
+|                    |                       | will hold onto a      |
+|                    |                       | worker slot and a     |
+|                    |                       | pool slot for the     |
+|                    |                       | duration of the       |
+|                    |                       | sensor's runtime in   |
+|                    |                       | this mode.            |
++--------------------+-----------------------+-----------------------+
+| ``reschedule``     | The sensor task frees | Use this mode if the  |
+|                    | the worker slot when  | time before the       |
+|                    | the criteria is not   | criteria is met is    |
+|                    | yet met and it's      | expected to be quite  |
+|                    | rescheduled at a      | long. The poke        |
+|                    | later time.           | interval should be    |
+|                    |                       | more than one minute  |
+|                    |                       | to prevent too much   |
+|                    |                       | load on the           |
+|                    |                       | scheduler.            |
++--------------------+-----------------------+-----------------------+
+| ``smart sensor``   | smart sensor is a     | Use this mode if you  |
+|                    | service (run by a     | have a large amount   |
+|                    | builtin DAG) which    | of sensor tasks       |
+|                    | consolidate the       | running in your       |
+|                    | execution of sensors  | airflow cluster. This |
+|                    | in batches. Instead   | can largely reduce    |
+|                    | of holding a long     | airflow’s             |
+|                    | running process for   | infrastructure cost   |
+|                    | each sensor and       | and improve cluster   |
+|                    | poking periodically,  | stability - reduce    |
+|                    | a sensor will only    | meta database load.   |
+|                    | store poke context at |                       |
+|                    | ``sensor_instance``   |                       |
+|                    | table and then exits  |                       |
+|                    | with a 'sensing'      |                       |
+|                    | state.                |                       |
++--------------------+-----------------------+-----------------------+

Review comment:
       I'd probably suggest not making it a table then
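
To make the trade-off between the ``poke`` and ``reschedule`` rows of the table concrete, here is a minimal sketch of how long a worker slot is held in each mode. It is a toy model, not Airflow's implementation: ``simulate_sensor``, ``SlotUsage`` and the ``poke_cost`` parameter are hypothetical names introduced only for illustration, and the model assumes every check takes a fixed amount of time and that rescheduling itself costs nothing.

```python
from dataclasses import dataclass


@dataclass
class SlotUsage:
    pokes: int            # how many times the condition was checked
    slot_seconds: float   # total time a worker slot was occupied


def simulate_sensor(mode, ready_after, poke_interval, poke_cost=1.0):
    """Toy model of a sensor whose condition becomes true at `ready_after` seconds.

    All names and the fixed `poke_cost` are illustrative assumptions,
    not part of Airflow's API.
    """
    if mode not in ("poke", "reschedule"):
        raise ValueError(f"unsupported mode: {mode!r}")

    now = 0.0
    usage = SlotUsage(pokes=0, slot_seconds=0.0)
    while True:
        # The check itself always occupies a worker slot.
        condition_met = now >= ready_after
        usage.pokes += 1
        usage.slot_seconds += poke_cost
        now += poke_cost
        if condition_met:
            return usage

        # Wait until the next check.
        now += poke_interval
        if mode == "poke":
            # In poke mode the task sleeps while still holding the slot.
            usage.slot_seconds += poke_interval
        # In reschedule mode the slot is released during the wait,
        # so nothing is added here.


if __name__ == "__main__":
    for mode in ("poke", "reschedule"):
        print(mode, simulate_sensor(mode, ready_after=600, poke_interval=60))
```

Under these assumptions both modes perform the same number of checks; the difference is only whether the waiting time between checks counts against the worker slot. In a real DAG the behaviour is selected with the sensor's ``mode`` argument rather than simulated like this.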







[GitHub] [airflow] xinbinhuang commented on pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#issuecomment-738593544


   cc: @ryw  @kaxil 





[GitHub] [airflow] github-actions[bot] commented on pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#issuecomment-739389309


   The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest master or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] github-actions[bot] commented on pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#issuecomment-738595114


   [The Workflow run](https://github.com/apache/airflow/actions/runs/400075097) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] xinbinhuang commented on pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#issuecomment-738930444


   Also cc Airbnb: @YingboWang @KevinYang21





[GitHub] [airflow] xinbinhuang commented on a change in pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on a change in pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#discussion_r536273111



##########
File path: docs/apache-airflow/concepts.rst
##########

Review comment:
       This table renders with a horizontal scrollbar and is a bit long. Is there a way to wrap the text inside the cells instead?







[GitHub] [airflow] xinbinhuang commented on a change in pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on a change in pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#discussion_r536280091



##########
File path: docs/apache-airflow/concepts.rst
##########

Review comment:
       hmm... if that's the case, I will remove the table later 







[GitHub] [airflow] kaxil merged pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #12803:
URL: https://github.com/apache/airflow/pull/12803


   





[GitHub] [airflow] ashb commented on pull request #12803: Add sensors section to describe different modes of sensors

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #12803:
URL: https://github.com/apache/airflow/pull/12803#issuecomment-738925951


   /cc @jhtimmins Since you were complaining a bit about this too.

