Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/01 09:14:58 UTC

[GitHub] [airflow] potiuk commented on pull request #19860: Restore stability and unquarantine all test_scheduler_job tests

potiuk commented on pull request #19860:
URL: https://github.com/apache/airflow/pull/19860#issuecomment-983440489


   Hey @ashb @ephraimbuddy @uranusjr 
   
   I caught it "more in the act".
   
   I have added some more debugging for the issue and dumped both the stack trace and the process tree at the moment of the "dag processor reload".
   
   We now have three stack traces to analyse - see the latest errors.
   
   Findings: 
   1) Indeed, the `settings reload` that is causing the problem is triggered by the dag processor manager.
   ```
     ----------------------------- Captured stderr call -----------------------------
       File "<string>", line 1, in <module>
       File "/usr/local/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
         exitcode = _main(fd, parent_sentinel)
       File "/usr/local/lib/python3.9/multiprocessing/spawn.py", line 129, in _main
         return self._bootstrap(parent_sentinel)
       File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
         self.run()
       File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
         self._target(*self._args, **self._kwargs)
       File "/opt/airflow/airflow/dag_processing/manager.py", line 268, in _run_processor_manager
         traceback.print_stack()
   ```
   
   2) The process tree is interesting. I did not know how Pytest manages separation between different tests, but it seems that it forks a separate process for each test and runs one process at a time, while all the other processes are loaded and waiting to start. I have not looked into the details yet, but I think this could explain the behaviour observed: if those forked processes share some memory via SQL drivers, then running an import in the dag processor manager could potentially reload some shared memory (for example the mapping of object classes to the actual types of the entities).
   
   I don't think I have ever seen it happen for sqlite; it only happens for the "real" databases, so there might be some clever handling of multi-processing that we are not aware of.
   
   Excerpt:
   
   ```
    ► 1     (root) [dumb-init] 02:14 /usr/bin/dumb-init -- /entrypoint
       ├─7     (root) [bash] 02:14 bash /entrypoint
       │ └─134   (root) [bash] 02:15 bash /opt/airflow/scripts/in_container/run_ci_tests.sh --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --pythonwarnings=ignore::DeprecationWarning --pythonwarnings=ignore::PendingDeprecationWarning --junitxml=/files/test_result-Core-mssql.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 -rfEX --with-db-init tests/core tests/executors tests/jobs tests/models tests/serialization tests/ti_deps tests/utils
       │   └─185   (root) [pytest] 02:15 /usr/local/bin/python /usr/local/bin/pytest --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --pythonwarnings=ignore::DeprecationWarning --pythonwarnings=ignore::PendingDeprecationWarning --junitxml=/files/test_result-Core-mssql.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 -rfEX --with-db-init tests/core tests/executors tests/jobs tests/models tests/serialization tests/ti_deps tests/utils
       │     ├─1448  (root) [python] 02:16 /usr/local/bin/python -B -c from multiprocessing.resource_tracker import main;main(30)
       │     ├─2127  (root) [pytest] 02:18 /usr/local/bin/python /usr/local/bin/pytest --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --pythonwarnings=ignore::DeprecationWarning --pythonwarnings=ignore::PendingDeprecationWarning --junitxml=/files/test_result-Core-mssql.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 -rfEX --with-db-init tests/core tests/executors tests/jobs tests/models tests/serialization tests/ti_deps tests/utils
       │     ├─2141  (root) [pytest] 02:18 /usr/local/bin/python /usr/local/bin/pytest --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --pythonwarnings=ignore::DeprecationWarning --pythonwarnings=ignore::PendingDeprecationWarning --junitxml=/files/test_result-Core-mssql.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 -rfEX --with-db-init tests/core tests/executors tests/jobs tests/models tests/serialization tests/ti_deps tests/utils
       │     ├─2157  (root) [pytest] 02:18 /usr/local/bin/python /usr/local/bin/pytest --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --pythonwarnings=ignore::DeprecationWarning --pythonwarnings=ignore::PendingDeprecationWarning --junitxml=/files/test_result-Core-mssql.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 -rfEX --with-db-init tests/core tests/executors tests/jobs tests/models tests/serialization tests/ti_deps tests/utils
   ```
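
For anyone who wants to reproduce this kind of dump locally, here is a minimal sketch of a helper that captures both pieces of information at once. It is not the code added in the PR: the function names (`format_stack_dump`, `format_process_tree`, `dump_debug_state`) are hypothetical, and the process listing assumes a procps-style `ps` that understands `--forest`, which may not hold on every image.

```python
import io
import os
import subprocess
import sys
import traceback


def format_stack_dump() -> str:
    """Return the current call stack as text, prefixed with pid and parent pid."""
    buffer = io.StringIO()
    buffer.write(f"pid={os.getpid()} ppid={os.getppid()}\n")
    # print_stack() writes the stack of the calling frame to the given file object.
    traceback.print_stack(file=buffer)
    return buffer.getvalue()


def format_process_tree() -> str:
    """Return a tree-style process listing, or a note if `ps` is missing.

    Assumption: a procps `ps` supporting `--forest` is installed.
    """
    try:
        result = subprocess.run(
            ["ps", "-e", "-o", "pid,ppid,user,etime,args", "--forest"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return "ps not available\n"
    # Fall back to stderr so an unsupported flag is still visible in the dump.
    return result.stdout or result.stderr


def dump_debug_state(stream=sys.stderr) -> None:
    """Write the stack dump followed by the process tree to `stream`."""
    stream.write(format_stack_dump())
    stream.write(format_process_tree())
```

Calling `dump_debug_state()` at the suspected reload point writes to stderr, so under Pytest the output would show up in the "Captured stderr call" section of a failing test.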
   
   

