You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/08 19:37:31 UTC

[GitHub] [airflow] dstandish opened a new pull request #20150: Deferrable operators doc clarification

dstandish opened a new pull request #20150:
URL: https://github.com/apache/airflow/pull/20150


   The language "when two tasks defer based on the same trigger" is a bit confusing. Many tasks can reuse the same trigger class.  But two tasks can't defer using the same trigger _instance_. I think what's important to call out here, and what I try to make clearer, is that the exact same instance of the trigger may have multiple copies of itself running.
   
   Additionally I clarify cleanup is not _only_ called "when this happens" (that is, when trigger is "suddenly removed"), but called every time the trigger instance exits, no matter the reason.
   
   cc @andrewgodwin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #20150:
URL: https://github.com/apache/airflow/pull/20150


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765219354



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       ```suggestion
   * You must assume that duplicates of your trigger instance may run simultaneously. This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765229343



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       How about:
   
   > You must assume that your trigger instance may run more than once (this can happen if a network partition happens and Airflow re-launches a trigger on a separated machine).  So you must be mindful about side effects.  E.g. you might not want to use a trigger to insert database rows.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765221046



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.
 * When events are emitted, and if your trigger is designed to emit more than one event, they *must* contain a payload that can be used to deduplicate events if the trigger is being run in multiple places. If you only fire one event, and don't want to pass information in the payload back to the Operator that deferred, you can just set the payload to ``None``.
-* A trigger may be suddenly removed from one process and started on a new one (if partitions are being changed, or a deployment is happening). You may provide an optional ``cleanup`` method that gets called when this happens.
+* A trigger may be suddenly removed from one triggerer service and started on a new one (e.g. if network partitions are being changed, or a deployment is happening). If desired you may implement ``cleanup`` method that is always called after ``run`` whether the trigger exits cleanly or otherwise.

Review comment:
       ```suggestion
   * A trigger may be suddenly removed from one triggerer service and started on a new one (e.g. if network partitions (Subnets for example) are being changed, or a deployment is happening). If desired you may implement ``cleanup`` method that is always called after ``run`` whether the trigger exits cleanly or otherwise.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765229343



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       How about:
   
   > You must assume that your trigger instance may run more than once (this can happen if a network partition occurs and Airflow re-launches a trigger on a separated machine).  So you must be mindful about side effects.  E.g. you might not want to use a trigger to insert database rows.
   
   I think the "run more than once" makes it more obvious.  That's really the issue, more so than it running "in parallel" or "simultaneously" etc.
   
   lmkwyt
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765229343



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       How about:
   
   > You must assume that your trigger instance may run more than once (this can happen if a network partition occurs and Airflow re-launches a trigger on a separated machine).  So you must be mindful about side effects.  E.g. you might not want to use a trigger to insert database rows.
   
   I think the "run more than once" makes it more obvious.  That's really the issue, more so than the question of timing i.e. running "in parallel" or "simultaneously" etc.
   
   lmkwyt
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765229343



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       ```
   You must assume that your trigger instance may run more than once (this can happen if a network partition happens and Airflow re-launches a trigger on a separated machine).  So you must be mindful about side effects.  E.g. you might not want to use a trigger to insert database rows.
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765224398



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       Which aspect isn't clear? I can try to clarify the wording




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#issuecomment-989214922


   > Agreed - just a couple of further requested changes on your changes!
   
   excellent  thank you @andrewgodwin 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765229343



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       How about:
   
   > You must assume that your trigger instance may run more than once (this can happen if a network partition happens and Airflow re-launches a trigger on a separated machine).  So you must be mindful about side effects.  E.g. you might not want to use a trigger to insert database rows.
   
   I think the "run more than once" makes it more obvious.  That's really the issue, more so than it running "in parallel" or "simultaneously" etc.
   
   lmkwyt
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765229343



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       How about:
   
   > You must assume that your trigger instance could run more than once (this can happen if a network partition happens and Airflow re-launches a trigger on a separated machine).  So you must be mindful about side effects.  E.g. you might not want to use a trigger to insert database rows.
   
   I think the "run more than once" makes it more obvious.  That's really the issue, more so than it running "in parallel" or "simultaneously" etc.
   
   lmkwyt
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765233233



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       Yea that is better




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765224398



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       Which aspect isn't clear? I can try to clarify the wording, e.g. maybe by adding an example

##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       Which aspect isn't clear? I can try to clarify the wording or maybe by adding an example

##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       Which aspect isn't clear? I can try to clarify the wording or maybe add an example




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#issuecomment-989182720


   The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765219916



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       It is still not clearer to me tbh -- but I don't have a better suggestion either




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765257051



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
-* When events are emitted, and if your trigger is designed to emit more than one event, they *must* contain a payload that can be used to deduplicate events if the trigger is being run in multiple places. If you only fire one event, and don't want to pass information in the payload back to the Operator that deferred, you can just set the payload to ``None``.
-* A trigger may be suddenly removed from one process and started on a new one (if partitions are being changed, or a deployment is happening). You may provide an optional ``cleanup`` method that gets called when this happens.
+* You should assume that your trigger instance may run more than once (this can happen if a network partition occurs and Airflow re-launches a trigger on a separated machine). So you must be mindful about side effects. E.g. you might not want to use a trigger to insert database rows.
+* If your trigger is designed to emit more than one event (not currently supported), then each emitted event *must* contain a payload that can be used to deduplicate events if the trigger is being run in multiple places. If you only fire one event and don't need to pass information back to the Operator, you can just set the payload to ``None``.
+* A trigger may be suddenly removed from one triggerer service and started on a new one (e.g. if network partitions (Subnets for example) are being changed, or a deployment is happening). If desired you may implement ``cleanup`` method that is always called after ``run`` whether the trigger exits cleanly or otherwise.

Review comment:
       Nested parentheses should not be needed here - suggested rewording:
   
   For example, if subnets are changed and a network partition results, or a deployment is in progress

##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
-* When events are emitted, and if your trigger is designed to emit more than one event, they *must* contain a payload that can be used to deduplicate events if the trigger is being run in multiple places. If you only fire one event, and don't want to pass information in the payload back to the Operator that deferred, you can just set the payload to ``None``.
-* A trigger may be suddenly removed from one process and started on a new one (if partitions are being changed, or a deployment is happening). You may provide an optional ``cleanup`` method that gets called when this happens.
+* You should assume that your trigger instance may run more than once (this can happen if a network partition occurs and Airflow re-launches a trigger on a separated machine). So you must be mindful about side effects. E.g. you might not want to use a trigger to insert database rows.

Review comment:
       We should replace `e.g.` with `for example`, as I've found not everyone knows what it means.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20150: Deferrable operators doc clarification

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20150:
URL: https://github.com/apache/airflow/pull/20150#discussion_r765229343



##########
File path: docs/apache-airflow/concepts/deferring.rst
##########
@@ -109,13 +109,13 @@ There's also some design constraints to be aware of:
 
 * The ``run`` method *must be asynchronous* (using Python's asyncio), and correctly ``await`` whenever it does a blocking operation.
 * ``run`` must ``yield`` its TriggerEvents, not return them. If it returns before yielding at least one event, Airflow will consider this an error and fail any Task Instances waiting on it. If it throws an exception, Airflow will also fail any dependent task instances.
-* A Trigger *must be able to run in parallel* with other copies of itself. This can happen both when two tasks defer based on the same trigger, and also if a network partition happens and Airflow re-launches a trigger on a separated machine.
+* You must assume that duplicates of your trigger instance may run simultaneously.  This can happen if a network partition happens and Airflow re-launches a trigger on a separated machine.

Review comment:
       How about:
   
   > You should assume that your trigger instance may run more than once (this can happen if a network partition occurs and Airflow re-launches a trigger on a separated machine).  So you must be mindful about side effects.  E.g. you might not want to use a trigger to insert database rows.
   
   I think the "run more than once" makes it more obvious.  That's really the issue, more so than the question of timing i.e. running "in parallel" or "simultaneously" etc.
   
   lmkwyt
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org