Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/09 09:54:16 UTC

[GitHub] [airflow] ashb opened a new pull request #11372: Reduce "start-up" time for tasks in CeleryExecutor

ashb opened a new pull request #11372:
URL: https://github.com/apache/airflow/pull/11372


   This is similar to #11327, but for Celery this time.
   
   The impact is not quite as pronounced here (for simple DAGs at least),
   but it takes the average queued-to-start delay from 1.5s to 0.4s.
   
   Closes #6905 - the config option added for LocalExecutor is used here too.
   
   Data on this for a simple 10-task sequential DAG:
   
   ```sql
   SELECT execution_date,
       min(start_date - queued_dttm) AS min_queued_delay,
       max(start_date - queued_dttm) AS max_queued_delay,
       avg(start_date - queued_dttm) AS avg
   FROM task_instance
   WHERE dag_id = 'scenario1_case2_03_1' GROUP BY execution_date;
   ```
   
   | execution_date         | min_queued_delay | max_queued_delay | avg             | with change? |
   | ---------------------- | ---------------- | ---------------- | --------------- | ------------ |
   | 2020-10-08 01:00:00+01 | 00:00:00.348837  | 00:00:00.473693  | 00:00:00.396751 | Yes          |
   | 2020-10-08 02:00:00+01 | 00:00:01.432304  | 00:00:01.574801  | 00:00:01.478422 | No           |
   
   
   This was discovered in my general benchmarking and profiling of the scheduler for AIP-15, but it's not tied to any of that work. There are more of these kinds of improvements coming; each is unrelated, but they all add up.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb merged pull request #11372: Reduce "start-up" time for tasks in CeleryExecutor

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #11372:
URL: https://github.com/apache/airflow/pull/11372


   


[GitHub] [airflow] turbaszek commented on a change in pull request #11372: Reduce "start-up" time for tasks in CeleryExecutor

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #11372:
URL: https://github.com/apache/airflow/pull/11372#discussion_r502373478



##########
File path: airflow/executors/celery_executor.py
##########
@@ -78,6 +80,45 @@ def execute_command(command_to_exec: CommandType) -> None:
     """Executes command."""
     BaseExecutor.validate_command(command_to_exec)
     log.info("Executing command in Celery: %s", command_to_exec)
+
+    if settings.EXECUTE_TASKS_NEW_PYTHON_INTERPRETER:
+        _execute_in_subprocess(command_to_exec)
+    else:
+        _execute_in_fork(command_to_exec)
+
+
+def _execute_in_fork(command_to_exec: CommandType) -> None:
+    pid = os.fork()

Review comment:
       Do we need to fork it? Shouldn't we just execute it in the current process (the Celery worker process)?







[GitHub] [airflow] ashb commented on a change in pull request #11372: Reduce "start-up" time for tasks in CeleryExecutor

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #11372:
URL: https://github.com/apache/airflow/pull/11372#discussion_r502378199



##########
File path: airflow/executors/celery_executor.py
##########
@@ -78,6 +80,45 @@ def execute_command(command_to_exec: CommandType) -> None:
     """Executes command."""
     BaseExecutor.validate_command(command_to_exec)
     log.info("Executing command in Celery: %s", command_to_exec)
+
+    if settings.EXECUTE_TASKS_NEW_PYTHON_INTERPRETER:
+        _execute_in_subprocess(command_to_exec)
+    else:
+        _execute_in_fork(command_to_exec)
+
+
+def _execute_in_fork(command_to_exec: CommandType) -> None:
+    pid = os.fork()

Review comment:
       We can't, because of the `logging.shutdown()` at the end of task_run (which we need to keep, as that's when remote logs are uploaded). See https://github.com/apache/airflow/pull/11327#issuecomment-705139313
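
For readers following the thread, here is a minimal sketch of the general fork-and-wait pattern this comment refers to. It is not the code from this PR: `run_in_fork` and its `task` argument are hypothetical names used only for illustration. The point it tries to show is that `logging.shutdown()` can run in the forked child without closing the log handlers of the long-lived parent (in this thread, the Celery worker process). It assumes a POSIX system where `os.fork()` is available.

```python
import logging
import os
from typing import Callable


def run_in_fork(task: Callable[[], None]) -> int:
    """Run ``task`` in a forked child process and return the child's exit code.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    pid = os.fork()
    if pid:
        # Parent: wait for the child to finish and decode its exit status.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        # Child was killed by a signal; report it as a negative number.
        return -os.WTERMSIG(status)

    # Child: run the task, then flush and close logging handlers in this
    # process only. The parent's handlers are unaffected.
    exit_code = 1
    try:
        task()
        exit_code = 0
    except Exception:
        logging.exception("Task failed in forked child")
    finally:
        logging.shutdown()
        # os._exit skips the parent's atexit hooks and buffered-output flushes.
        os._exit(exit_code)
```

Calling `run_in_fork(lambda: None)` returns `0` in the parent, while a task that raises returns `1`. Either way the parent process can keep logging afterwards.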



