You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/23 14:04:51 UTC

[GitHub] [airflow] turbaszek opened a new pull request #9488: Improve queries number SchedulerJob._process_executor_events

turbaszek opened a new pull request #9488:
URL: https://github.com/apache/airflow/pull/9488


   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9488: Improve queries number SchedulerJob._process_executor_events

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9488:
URL: https://github.com/apache/airflow/pull/9488#discussion_r446853441



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1476,28 +1476,32 @@ def _process_executor_events(self, simple_dag_bag, session=None):
         """
         Respond to executor events.
         """
-        # TODO: this shares quite a lot of code with _manage_executor_state
-        for key, value in self.executor.get_event_buffer(simple_dag_bag.dag_ids).items():
+        event_buffer = self.executor.get_event_buffer(simple_dag_bag.dag_ids)
+        tis_with_right_state: List[TaskInstanceKeyType] = []
+
+        # Report execution
+        for key, value in event_buffer.items():
             state, info = value
             dag_id, task_id, execution_date, try_number = key
             self.log.info(
                 "Executor reports execution of %s.%s execution_date=%s "
                 "exited with status %s for try_number %s",
                 dag_id, task_id, execution_date, state, try_number
             )
-            if state not in (State.FAILED, State.SUCCESS):
-                continue
+            if state in (State.FAILED, State.SUCCESS):
+                tis_with_right_state.append(key)
 
-            # Process finished tasks
-            qry = session.query(TI).filter(
-                TI.dag_id == dag_id,
-                TI.task_id == task_id,
-                TI.execution_date == execution_date
-            )
-            ti = qry.first()
-            if not ti:
-                self.log.warning("TaskInstance %s went missing from the database", ti)
-                continue

Review comment:
       ```
                   self.log.warning("<TaskInstance(dag_id=%s, task_id=%s, execution_date=%s)> went missing from the database", dag_id, task_id, execution_date)
   ```
   What do you think to display filter parameters instead of an object when the object does not exist?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #9488: Improve queries number SchedulerJob._process_executor_events

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #9488:
URL: https://github.com/apache/airflow/pull/9488#issuecomment-648819429


   @mik-laj would you mind taking a look at the CI? Does `TestServeLogs.test_should_serve_file` seem flaky?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek merged pull request #9488: Improve queries number SchedulerJob._process_executor_events

Posted by GitBox <gi...@apache.org>.
turbaszek merged pull request #9488:
URL: https://github.com/apache/airflow/pull/9488


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9488: Improve queries number SchedulerJob._process_executor_events

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9488:
URL: https://github.com/apache/airflow/pull/9488#discussion_r447070473



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1476,28 +1476,32 @@ def _process_executor_events(self, simple_dag_bag, session=None):
         """
         Respond to executor events.
         """
-        # TODO: this shares quite a lot of code with _manage_executor_state
-        for key, value in self.executor.get_event_buffer(simple_dag_bag.dag_ids).items():
+        event_buffer = self.executor.get_event_buffer(simple_dag_bag.dag_ids)
+        tis_with_right_state: List[TaskInstanceKeyType] = []
+
+        # Report execution
+        for key, value in event_buffer.items():
             state, info = value
             dag_id, task_id, execution_date, try_number = key
             self.log.info(
                 "Executor reports execution of %s.%s execution_date=%s "
                 "exited with status %s for try_number %s",
                 dag_id, task_id, execution_date, state, try_number
             )
-            if state not in (State.FAILED, State.SUCCESS):
-                continue
+            if state in (State.FAILED, State.SUCCESS):
+                tis_with_right_state.append(key)
 
-            # Process finished tasks
-            qry = session.query(TI).filter(
-                TI.dag_id == dag_id,
-                TI.task_id == task_id,
-                TI.execution_date == execution_date
-            )
-            ti = qry.first()
-            if not ti:
-                self.log.warning("TaskInstance %s went missing from the database", ti)
-                continue

Review comment:
       Let's remove it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9488: Improve queries number SchedulerJob._process_executor_events

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9488:
URL: https://github.com/apache/airflow/pull/9488#discussion_r444818015



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1476,28 +1476,32 @@ def _process_executor_events(self, simple_dag_bag, session=None):
         """
         Respond to executor events.
         """
-        # TODO: this shares quite a lot of code with _manage_executor_state
-        for key, value in self.executor.get_event_buffer(simple_dag_bag.dag_ids).items():
+        event_buffer = self.executor.get_event_buffer(simple_dag_bag.dag_ids)
+        tis_with_right_state: List[TaskInstanceKeyType] = []
+
+        # Report execution
+        for key, value in event_buffer.items():
             state, info = value
             dag_id, task_id, execution_date, try_number = key
             self.log.info(
                 "Executor reports execution of %s.%s execution_date=%s "
                 "exited with status %s for try_number %s",
                 dag_id, task_id, execution_date, state, try_number
             )
-            if state not in (State.FAILED, State.SUCCESS):
-                continue
+            if state in (State.FAILED, State.SUCCESS):
+                tis_with_right_state.append(key)
 
-            # Process finished tasks
-            qry = session.query(TI).filter(
-                TI.dag_id == dag_id,
-                TI.task_id == task_id,
-                TI.execution_date == execution_date
-            )
-            ti = qry.first()
-            if not ti:
-                self.log.warning("TaskInstance %s went missing from the database", ti)
-                continue

Review comment:
       This log message is useless as the ti is `None` so I'm not sure if we have to keep it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #9488: Improve queries number SchedulerJob._process_executor_events

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #9488:
URL: https://github.com/apache/airflow/pull/9488#discussion_r446855765



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1476,28 +1476,32 @@ def _process_executor_events(self, simple_dag_bag, session=None):
         """
         Respond to executor events.
         """
-        # TODO: this shares quite a lot of code with _manage_executor_state
-        for key, value in self.executor.get_event_buffer(simple_dag_bag.dag_ids).items():
+        event_buffer = self.executor.get_event_buffer(simple_dag_bag.dag_ids)
+        tis_with_right_state: List[TaskInstanceKeyType] = []
+
+        # Report execution
+        for key, value in event_buffer.items():
             state, info = value
             dag_id, task_id, execution_date, try_number = key
             self.log.info(
                 "Executor reports execution of %s.%s execution_date=%s "
                 "exited with status %s for try_number %s",
                 dag_id, task_id, execution_date, state, try_number
             )
-            if state not in (State.FAILED, State.SUCCESS):
-                continue
+            if state in (State.FAILED, State.SUCCESS):
+                tis_with_right_state.append(key)
 
-            # Process finished tasks
-            qry = session.query(TI).filter(
-                TI.dag_id == dag_id,
-                TI.task_id == task_id,
-                TI.execution_date == execution_date
-            )
-            ti = qry.first()
-            if not ti:
-                self.log.warning("TaskInstance %s went missing from the database", ti)
-                continue

Review comment:
       That's what I currently do:
   ```
   self.log.warning("TaskInstance %s.%s went missing from the database", dag_id, task_id)
   ```
   However I'm not sure if this log is necessary (now L1522-L1526). The message was meaningless so I would assume that we can remove it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org