Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/23 15:26:47 UTC

[GitHub] [airflow] leonsmith edited a comment on issue #7935: scheduler gets stuck without a trace

leonsmith edited a comment on issue #7935:
URL: https://github.com/apache/airflow/issues/7935#issuecomment-804991322


   +1 on this issue.
   
   Airflow 2.0.1
   
   CeleryExecutor.
   
   ~7000 DAGs. It seems to happen under load (when a bunch of DAGs all kick off at midnight).
   
   <details>
     <summary>py-spy dump --pid 132 --locals</summary>
   
      ```
   Process 132: /usr/local/bin/python /usr/local/bin/airflow scheduler
   Python v3.8.3 (/usr/local/bin/python)
   Thread 132 (idle): "MainThread"
       _send (multiprocessing/connection.py:368)
           Arguments::
               self: <Connection at 0x7f5db7aac550>
               buf: <bytes at 0x5564f22e5260>
               write: <builtin_function_or_method at 0x7f5dbed8a540>
           Locals::
               remaining: 1213
       _send_bytes (multiprocessing/connection.py:411)
           Arguments::
               self: <Connection at 0x7f5db7aac550>
               buf: <memoryview at 0x7f5db66f4a00>
           Locals::
               n: 1209
               header: <bytes at 0x7f5dbc01fb10>
       send (multiprocessing/connection.py:206)
           Arguments::
               self: <Connection at 0x7f5db7aac550>
               obj: <TaskCallbackRequest at 0x7f5db7398940>
       send_callback_to_execute (airflow/utils/dag_processing.py:283)
           Arguments::
               self: <DagFileProcessorAgent at 0x7f5db7aac880>
               request: <TaskCallbackRequest at 0x7f5db7398940>
       _process_executor_events (airflow/jobs/scheduler_job.py:1242)
           Arguments::
               self: <SchedulerJob at 0x7f5dbed3dd00>
               session: <Session at 0x7f5db80cf6a0>
           Locals::
                ti_primary_key_to_try_number_map: {("redacted", "redacted", <datetime.datetime at 0x7f5db768b540>): 1, ...}
               event_buffer: {...}
                tis_with_right_state: [("redacted", "redacted", <datetime.datetime at 0x7f5db768b540>, 1), ...]
                ti_key: ("redacted", "redacted", ...)
               value: ("failed", None)
               state: "failed"
               _: None
               filter_for_tis: <BooleanClauseList at 0x7f5db7427df0>
               tis: [<TaskInstance at 0x7f5dbbfd77c0>, <TaskInstance at 0x7f5dbbfd7880>, <TaskInstance at 0x7f5dbbfdd820>, ...]
               ti: <TaskInstance at 0x7f5dbbffba90>
               try_number: 1
                buffer_key: ("redacted", ...)
               info: None
               msg: "Executor reports task instance %s finished (%s) although the task says its %s. (Info: %s) Was the task killed externally?"
               request: <TaskCallbackRequest at 0x7f5db7398940>
       wrapper (airflow/utils/session.py:62)
           Locals::
               args: (<SchedulerJob at 0x7f5dbed3dd00>)
               kwargs: {"session": <Session at 0x7f5db80cf6a0>}
       _run_scheduler_loop (airflow/jobs/scheduler_job.py:1386)
           Arguments::
               self: <SchedulerJob at 0x7f5dbed3dd00>
           Locals::
               is_unit_test: False
               call_regular_interval: <function at 0x7f5db7ac3040>
               loop_count: 1
               timer: <Timer at 0x7f5db76808b0>
               session: <Session at 0x7f5db80cf6a0>
               num_queued_tis: 17
       _execute (airflow/jobs/scheduler_job.py:1280)
           Arguments::
               self: <SchedulerJob at 0x7f5dbed3dd00>
           Locals::
               pickle_dags: False
               async_mode: True
               processor_timeout_seconds: 600
               processor_timeout: <datetime.timedelta at 0x7f5db7ab9300>
               execute_start_time: <datetime.datetime at 0x7f5db7727510>
       run (airflow/jobs/base_job.py:237)
           Arguments::
               self: <SchedulerJob at 0x7f5dbed3dd00>
           Locals::
               session: <Session at 0x7f5db80cf6a0>
       scheduler (airflow/cli/commands/scheduler_command.py:63)
           Arguments::
               args: <Namespace at 0x7f5db816f6a0>
           Locals::
               job: <SchedulerJob at 0x7f5dbed3dd00>
       wrapper (airflow/utils/cli.py:89)
           Locals::
               args: (<Namespace at 0x7f5db816f6a0>)
               kwargs: {}
               metrics: {"sub_command": "scheduler", "start_datetime": <datetime.datetime at 0x7f5db80f5db0>, ...}
       command (airflow/cli/cli_parser.py:48)
           Locals::
               args: (<Namespace at 0x7f5db816f6a0>)
               kwargs: {}
               func: <function at 0x7f5db8090790>
       main (airflow/__main__.py:40)
           Locals::
               parser: <DefaultHelpParser at 0x7f5dbec13700>
               args: <Namespace at 0x7f5db816f6a0>
       <module> (airflow:8)
   ```
   </details>
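
    A note on the locals in that dump (my reading, not something py-spy prints): `Connection.send` pickles the `TaskCallbackRequest`, prepends a 4-byte length header, and then loops writing until the pipe accepts everything, which matches `_send_bytes` showing `n: 1209` (the pickled payload) and `_send` showing `remaining: 1213` (header + payload). Roughly, ignoring the large-message branches of CPython's `multiprocessing.connection`:

    ```python
    # Rough sketch of the framing Connection.send performs before the blocking
    # write seen above (paraphrased from CPython 3.8, small-message path only).
    import pickle
    import struct

    def frame(obj) -> bytes:
        payload = pickle.dumps(obj)               # 1209 bytes in the dump above
        header = struct.pack("!i", len(payload))  # 4-byte big-endian length
        return header + payload                   # 1213 bytes still to be written

    # The write() underneath only makes progress while the other end of the pipe
    # is drained; if the peer never calls recv(), send() blocks indefinitely.
    print(len(frame({"example": "callback request"})))  # arbitrary demo object
    ```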
   
   <details>
     <summary>py-spy dump --pid 134 --locals</summary>
   
      ```
    Process 134: airflow scheduler -- DagFileProcessorManager
   Python v3.8.3 (/usr/local/bin/python)
   Thread 134 (idle): "MainThread"
       _send (multiprocessing/connection.py:368)
           Arguments::
               self: <Connection at 0x7f5db77274f0>
               buf: <bytes at 0x5564f1a76590>
               write: <builtin_function_or_method at 0x7f5dbed8a540>
           Locals::
               remaining: 2276
       _send_bytes (multiprocessing/connection.py:411)
           Arguments::
               self: <Connection at 0x7f5db77274f0>
               buf: <memoryview at 0x7f5db77d7c40>
           Locals::
               n: 2272
               header: <bytes at 0x7f5db6eb1f60>
       send (multiprocessing/connection.py:206)
           Arguments::
               self: <Connection at 0x7f5db77274f0>
               obj: (...)
       _run_parsing_loop (airflow/utils/dag_processing.py:698)
           Locals::
               poll_time: 0.9996239839999816
               loop_start_time: 690.422146969
               ready: [<Connection at 0x7f5db77274f0>]
               agent_signal: <TaskCallbackRequest at 0x7f5db678c8e0>
               sentinel: <Connection at 0x7f5db77274f0>
               processor: <DagFileProcessorProcess at 0x7f5db6eb1910>
               all_files_processed: False
               max_runs_reached: False
               dag_parsing_stat: (...)
               loop_duration: 0.0003760160000183532
       start (airflow/utils/dag_processing.py:596)
           Arguments::
               self: <DagFileProcessorManager at 0x7f5dbcb9c880>
       _run_processor_manager (airflow/utils/dag_processing.py:365)
           Arguments::
               dag_directory: "/code/src/dags"
               max_runs: -1
               processor_factory: <function at 0x7f5db7b30ee0>
               processor_timeout: <datetime.timedelta at 0x7f5db7ab9300>
               signal_conn: <Connection at 0x7f5db77274f0>
               dag_ids: []
               pickle_dags: False
               async_mode: True
           Locals::
               processor_manager: <DagFileProcessorManager at 0x7f5dbcb9c880>
       run (multiprocessing/process.py:108)
           Arguments::
               self: <ForkProcess at 0x7f5db7727220>
       _bootstrap (multiprocessing/process.py:315)
           Arguments::
               self: <ForkProcess at 0x7f5db7727220>
               parent_sentinel: 8
           Locals::
               util: <module at 0x7f5db8011e00>
               context: <module at 0x7f5dbcb8ba90>
       _launch (multiprocessing/popen_fork.py:75)
           Arguments::
               self: <Popen at 0x7f5db7727820>
               process_obj: <ForkProcess at 0x7f5db7727220>
           Locals::
               code: 1
               parent_r: 6
               child_w: 7
               child_r: 8
               parent_w: 9
       __init__ (multiprocessing/popen_fork.py:19)
           Arguments::
               self: <Popen at 0x7f5db7727820>
               process_obj: <ForkProcess at 0x7f5db7727220>
       _Popen (multiprocessing/context.py:276)
           Arguments::
               process_obj: <ForkProcess at 0x7f5db7727220>
           Locals::
               Popen: <type at 0x5564f1a439e0>
       start (multiprocessing/process.py:121)
           Arguments::
               self: <ForkProcess at 0x7f5db7727220>
       start (airflow/utils/dag_processing.py:248)
           Arguments::
               self: <DagFileProcessorAgent at 0x7f5db7aac880>
           Locals::
               mp_start_method: "fork"
               context: <ForkContext at 0x7f5dbcb9ce80>
               child_signal_conn: <Connection at 0x7f5db77274f0>
               process: <ForkProcess at 0x7f5db7727220>
       _execute (airflow/jobs/scheduler_job.py:1276)
           Arguments::
               self: <SchedulerJob at 0x7f5dbed3dd00>
           Locals::
               pickle_dags: False
               async_mode: True
               processor_timeout_seconds: 600
               processor_timeout: <datetime.timedelta at 0x7f5db7ab9300>
       run (airflow/jobs/base_job.py:237)
           Arguments::
               self: <SchedulerJob at 0x7f5dbed3dd00>
           Locals::
               session: <Session at 0x7f5db80cf6a0>
       scheduler (airflow/cli/commands/scheduler_command.py:63)
           Arguments::
               args: <Namespace at 0x7f5db816f6a0>
           Locals::
               job: <SchedulerJob at 0x7f5dbed3dd00>
       wrapper (airflow/utils/cli.py:89)
           Locals::
               args: (<Namespace at 0x7f5db816f6a0>)
               kwargs: {}
               metrics: {"sub_command": "scheduler", "start_datetime": <datetime.datetime at 0x7f5db80f5db0>, ...}
       command (airflow/cli/cli_parser.py:48)
           Locals::
               args: (<Namespace at 0x7f5db816f6a0>)
               kwargs: {}
               func: <function at 0x7f5db8090790>
       main (airflow/__main__.py:40)
           Locals::
               parser: <DefaultHelpParser at 0x7f5dbec13700>
               args: <Namespace at 0x7f5db816f6a0>
       <module> (airflow:8)
   ```
   </details>
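
    What stands out to me is that both processes are parked in `multiprocessing.connection.Connection._send` on the same signal pipe: the scheduler (pid 132) is pushing a `TaskCallbackRequest` via `DagFileProcessorAgent.send_callback_to_execute`, while the DagFileProcessorManager (pid 134) looks like it is pushing its `dag_parsing_stat` back from `_run_parsing_loop`. If neither side gets around to a `recv()`, both directions of the pipe fill up and both `send()` calls block for good. A minimal sketch of that failure mode (plain `multiprocessing`, not Airflow code; the payload size is arbitrary):

    ```python
    # Minimal sketch of two processes deadlocking on the same duplex Pipe:
    # each end only writes, so once the OS buffers fill, both block inside
    # Connection._send and neither ever reaches a recv().
    import multiprocessing as mp

    PAYLOAD = b"x" * 65536  # big enough to exhaust a typical pipe buffer quickly

    def writer(conn, name):
        sent = 0
        while True:
            conn.send(PAYLOAD)   # blocks here once the buffer is full
            sent += 1
            print(f"{name} sent {sent}")

    if __name__ == "__main__":
        parent_end, child_end = mp.Pipe()  # duplex, like the signal_conn above
        mp.Process(target=writer, args=(child_end, "manager"), daemon=True).start()
        writer(parent_end, "scheduler")    # both counters soon stop advancing
    ```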

