You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/28 10:25:08 UTC

[GitHub] [airflow] mhenc opened a new pull request #21181: Movie Zombie detection to SchedulerJob

mhenc opened a new pull request #21181:
URL: https://github.com/apache/airflow/pull/21181


   Moves zombie detection method from DagProcessor to SchedulerJob.
   
   More context:
   https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028843540


   > If you believe it would be safer to keep 10s then we may use it. Although it sounds little to often for me. Nonetheless it's now a configuration option, so users are able to set it to the value they like.
   
   I think it's ok to change (I see no serious effects of such change) but note in UPDATING.md will be needed. @ashb  - was there a rationale behind the 10s initially ? Or was it just arbitrary chosen without much of a "reasoning" ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r798426655



##########
File path: airflow/models/dagbag.py
##########
@@ -89,7 +89,6 @@ class DagBag(LoggingMixin):
     """
 
     DAGBAG_IMPORT_TIMEOUT = conf.getfloat('core', 'DAGBAG_IMPORT_TIMEOUT')
-    SCHEDULER_ZOMBIE_TASK_THRESHOLD = conf.getint('scheduler', 'scheduler_zombie_task_threshold')

Review comment:
       I would be for removing it. There is one other place (local_task_job) where the threshold is used (but directly) so I think this is a leftover from separating this on out. I do not imagine any use of the constant by the users :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mhenc commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
mhenc commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r798332409



##########
File path: airflow/models/dagbag.py
##########
@@ -89,7 +89,6 @@ class DagBag(LoggingMixin):
     """
 
     DAGBAG_IMPORT_TIMEOUT = conf.getfloat('core', 'DAGBAG_IMPORT_TIMEOUT')
-    SCHEDULER_ZOMBIE_TASK_THRESHOLD = conf.getint('scheduler', 'scheduler_zombie_task_threshold')

Review comment:
       Right, I didn't find any reference to that in the codebase, so I just removed as unused code.
   
   But if you believe it should be left there, then of course I can revert this change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r798334434



##########
File path: airflow/models/dagbag.py
##########
@@ -89,7 +89,6 @@ class DagBag(LoggingMixin):
     """
 
     DAGBAG_IMPORT_TIMEOUT = conf.getfloat('core', 'DAGBAG_IMPORT_TIMEOUT')
-    SCHEDULER_ZOMBIE_TASK_THRESHOLD = conf.getint('scheduler', 'scheduler_zombie_task_threshold')

Review comment:
       Let’s revert to be safe (and add a comment saying this is not actually used anywhere and can be removed in Airflow 3).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028852105


   TIL! thanks @ashb !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #21181: Move Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r802971738



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1259,3 +1267,39 @@ def check_trigger_timeouts(self, session: Session = None):
         )
         if num_timed_out_tasks:
             self.log.info("Timed out %i deferred tasks without fired triggers", num_timed_out_tasks)
+
+    @provide_session
+    def _find_zombies(self, session):
+        """
+        Find zombie task instances, which are tasks haven't heartbeated for too long
+        and update the current zombie list.
+        """
+        self.log.info("Finding 'running' jobs without a recent heartbeat")

Review comment:
       Can we change this to debug or remove it entirely because seeing this every 10 seconds seems much 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mhenc commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
mhenc commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r799242418



##########
File path: tests/jobs/test_scheduler_job.py
##########
@@ -75,8 +76,7 @@
 )
 PERF_DAGS_FOLDER = os.path.join(ROOT_FOLDER, "tests", "test_utils", "perf", "dags")
 ELASTIC_DAG_FILE = os.path.join(PERF_DAGS_FOLDER, "elastic_dag.py")
-
-TEST_DAG_FOLDER = os.environ['AIRFLOW__CORE__DAGS_FOLDER']
+TEST_DAG_FOLDER = '/airflow/github/dags'

Review comment:
       Good catch. It's just some leftover from my tests.
   Reverted.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028714402


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r798314492



##########
File path: airflow/models/dagbag.py
##########
@@ -89,7 +89,6 @@ class DagBag(LoggingMixin):
     """
 
     DAGBAG_IMPORT_TIMEOUT = conf.getfloat('core', 'DAGBAG_IMPORT_TIMEOUT')
-    SCHEDULER_ZOMBIE_TASK_THRESHOLD = conf.getint('scheduler', 'scheduler_zombie_task_threshold')

Review comment:
       Hmm, I wonder if we can remove this. It’s not used anywhere (even before this PR) and is probably kept for backward compatibility?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028846919


   No idea the source of the 10s -- it was from 2018 in #3873


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21181: Move Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21181:
URL: https://github.com/apache/airflow/pull/21181


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mhenc commented on a change in pull request #21181: Move Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
mhenc commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r803386232



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1259,3 +1267,39 @@ def check_trigger_timeouts(self, session: Session = None):
         )
         if num_timed_out_tasks:
             self.log.info("Timed out %i deferred tasks without fired triggers", num_timed_out_tasks)
+
+    @provide_session
+    def _find_zombies(self, session):
+        """
+        Find zombie task instances, which are tasks haven't heartbeated for too long
+        and update the current zombie list.
+        """
+        self.log.info("Finding 'running' jobs without a recent heartbeat")

Review comment:
       Sure, changed to debug.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028714906


   Logic makes sense to me although I don’t have much real-world experience to comment on the configuration issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r798426655



##########
File path: airflow/models/dagbag.py
##########
@@ -89,7 +89,6 @@ class DagBag(LoggingMixin):
     """
 
     DAGBAG_IMPORT_TIMEOUT = conf.getfloat('core', 'DAGBAG_IMPORT_TIMEOUT')
-    SCHEDULER_ZOMBIE_TASK_THRESHOLD = conf.getint('scheduler', 'scheduler_zombie_task_threshold')

Review comment:
       I would be for removing it. There is one other place (local_task_job) where the threshold is used (but directly) so I think this is a leftover from separating this on out. I do not imagine any use of the constant :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028850063


   If you aren't familiar with "git log pickaxe" it's amazing for things like this http://www.philandstuff.com/2014/02/09/git-pickaxe.html
   
   ```
   airflow ❯ git log -S _zombie_query_interval --oneline 
   897960736 Revert "[AIRFLOW-4797] Improve performance and behaviour of zombie detection (#5511)"
   2bdb053db [AIRFLOW-4797] Improve performance and behaviour of zombie detection (#5511)
   75e2288a3 [Airflow-2760] Decouple DAG parsing loop from scheduler loop (#3873)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028848087


   > No idea the source of the 10s -- it was from 2018 in #3873
   
   Ah  good one. I should have checked before! @KevinYang21  - maybe you remember / have some insights if the 10s were chosen for a good reason?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1026750121


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r799230720



##########
File path: tests/jobs/test_scheduler_job.py
##########
@@ -75,8 +76,7 @@
 )
 PERF_DAGS_FOLDER = os.path.join(ROOT_FOLDER, "tests", "test_utils", "perf", "dags")
 ELASTIC_DAG_FILE = os.path.join(PERF_DAGS_FOLDER, "elastic_dag.py")
-
-TEST_DAG_FOLDER = os.environ['AIRFLOW__CORE__DAGS_FOLDER']
+TEST_DAG_FOLDER = '/airflow/github/dags'

Review comment:
       Why is this change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#discussion_r798426655



##########
File path: airflow/models/dagbag.py
##########
@@ -89,7 +89,6 @@ class DagBag(LoggingMixin):
     """
 
     DAGBAG_IMPORT_TIMEOUT = conf.getfloat('core', 'DAGBAG_IMPORT_TIMEOUT')
-    SCHEDULER_ZOMBIE_TASK_THRESHOLD = conf.getint('scheduler', 'scheduler_zombie_task_threshold')

Review comment:
       I would be for removing it. There is one other place (local_task_job) where the threshold is used (but directly) so I think this is a leftover from separating this on out. I do not imagine any use of the constant by the users :). Anyone who wants this value should do `conf.getint`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mhenc commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
mhenc commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1028735736


   If you believe it would be safer to keep 10s then we may use it. Although it sounds little to often for me.
   Nonetheless it's now a configuration option, so users are able to set it to the value they like.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mhenc commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
mhenc commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1029745893


   I reverted the interval time to 10 seconds to make it fully backward compatible.
   As it's configuration option now, then user may decide to bump up it to save some CPU/DB resources.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mhenc commented on pull request #21181: Movie Zombie detection to SchedulerJob

Posted by GitBox <gi...@apache.org>.
mhenc commented on pull request #21181:
URL: https://github.com/apache/airflow/pull/21181#issuecomment-1024144658


   cc: @potiuk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org