You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/05 23:00:41 UTC

[GitHub] [airflow] dstandish opened a new pull request #20699: Trigger respawn xfail

dstandish opened a new pull request #20699:
URL: https://github.com/apache/airflow/pull/20699


   closes: #18392
   
   alternative to #19546
   
   Currently triggers may be queued up as soon as they leave the "running" set `triggers`.  This results in duplicate triggers when a trigger has been removed from this set but not yet deleted from the database.  We resolve by checking against not just running triggers but all triggers known to the triggerer instance.
   
   Previously (#19546) I approached this in a different way.  But as I worked out how to reproduce the behavior in a test, I obtained a better understanding of the conditions, and realized that there was a simpler way.  
   
   Unioning all those sets is maybe a bit ugly, but it's better than a query, I think.
   
   cc @andrewgodwin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #20699: Fix duplicate trigger creation race condition

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #20699:
URL: https://github.com/apache/airflow/pull/20699#discussion_r779646540



##########
File path: airflow/jobs/triggerer_job.py
##########
@@ -380,10 +380,16 @@ def update_triggers(self, requested_trigger_ids: Set[int]):
         # line's execution, but we consider that safe, since there's a strict
         # add -> remove -> never again lifecycle this function is already
         # handling.
-        current_trigger_ids = set(self.triggers.keys())
+        running_trigger_ids = set(self.triggers.keys())
+        known_trigger_ids = (
+            running_trigger_ids.union({x[0] for x in self.events})
+            .union(self.to_cancel)
+            .union({x[0] for x in self.to_create})
+            .union(self.failed_triggers)
+        )

Review comment:
       re
   
   > .union(x[0] for x in self.to_create)
   
   that's pretty cool, i don't recall ever noticing that you can do that that.   never occurred to me that the "comprehension" part emits a generator and is distinct from the "list" part.
   
   but it's interesting, there is  still something "magical" that happens when you put the generator expression in the list (or braces)
   
   maybe a bit surprising you can't do the below, given how it seems mosly the same as `a if x else b`
   
   ```
   a = x for x in [1,2,3]
   ```
   
   but in any case this is allowed
   
   ```
   a = (x for x in [1,2,3]) # generator
   [a] # list of generator
   ```
   
   but of course is not the same as this
   
   ```
   [x for x in [1,2,3]] # list of int
   ```
   
   anyway, cleaner this way -- thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #20699: Fix duplicate trigger creation race condition

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #20699:
URL: https://github.com/apache/airflow/pull/20699#discussion_r779249007



##########
File path: airflow/jobs/triggerer_job.py
##########
@@ -380,10 +380,16 @@ def update_triggers(self, requested_trigger_ids: Set[int]):
         # line's execution, but we consider that safe, since there's a strict
         # add -> remove -> never again lifecycle this function is already
         # handling.
-        current_trigger_ids = set(self.triggers.keys())
+        running_trigger_ids = set(self.triggers.keys())
+        known_trigger_ids = (
+            running_trigger_ids.union({x[0] for x in self.events})
+            .union(self.to_cancel)
+            .union({x[0] for x in self.to_create})
+            .union(self.failed_triggers)
+        )

Review comment:
       ```suggestion
           known_trigger_ids = (
               running_trigger_ids.union(x[0] for x in self.events)
               .union(self.to_cancel)
               .union(x[0] for x in self.to_create)
               .union(self.failed_triggers)
           )
   ```
   
   The thing I always like `union()` `difference()` etc. over `|` `-` etc. is the former does not require the rho to be a set, just any iterable.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #20699: Fix duplicate trigger creation race condition

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20699:
URL: https://github.com/apache/airflow/pull/20699#issuecomment-1006872316


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish merged pull request #20699: Fix duplicate trigger creation race condition

Posted by GitBox <gi...@apache.org>.
dstandish merged pull request #20699:
URL: https://github.com/apache/airflow/pull/20699


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #20699: Fix duplicate trigger creation race condition

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #20699:
URL: https://github.com/apache/airflow/pull/20699#issuecomment-1006897770


   just tweaked docstrings a bit, and renamed the test `test_trigger_create_race_condition_18392` instead of `test_trigger_bad_respawn`
   
   18392 is the github issue where the condition is documented (and i mention this in the docstring)
   
   and i use pytests fixture `tmp_path` instead of manually managing temp file
   
   thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org