Posted to commits@airflow.apache.org by "antonio-antuan (via GitHub)" <gi...@apache.org> on 2023/03/01 15:36:04 UTC

[GitHub] [airflow] antonio-antuan opened a new issue, #29841: high memory leak, cannot start even webserver

antonio-antuan opened a new issue, #29841:
URL: https://github.com/apache/airflow/issues/29841

   ### Apache Airflow version
   
   2.5.1
   
   ### What happened
   
   I had been using Airflow 2.3.1 and everything was fine.
   Then I decided to move to Airflow 2.5.1.
   Now I can't even start the webserver: Airflow on my laptop consumes all available memory (32 GB) until the OOM killer steps in.
   
   I investigated a bit. The problem starts with Airflow 2.3.4. It happens only with the official Docker image (apache/airflow:2.3.4) and only on my Linux laptop; on a Mac everything is fine.
   
   The memory leak starts when the source code imports, for example, the `airflow.cli.commands.webserver_command` module using `airflow.utils.module_loading.import_string`.
   I dug deeper and found that it happens when `import daemon` is executed.
   You can reproduce it with this command: `docker run --rm --entrypoint="" apache/airflow:2.3.4 /bin/bash -c "python -c 'import daemon'"`. Once again, it reproduces only on Linux (my kernel is 6.1.12).
   That's weird, considering `daemon` hasn't changed since 2018.
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   docker run --rm --entrypoint="" apache/airflow:2.3.4 /bin/bash -c "python -c 'import daemon'"
   
   ### Operating System
   
   Arch Linux (kernel 6.1.12)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #29841: high memory leak, cannot start even webserver

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29841:
URL: https://github.com/apache/airflow/issues/29841#issuecomment-1450518290

   Thanks for looking into it and tracking the root cause down to python-daemon. Until now we had not identified the root cause, but we knew that the new containerd triggers the problem because of its changed settings.
   
   This is a known issue caused by containerd changing its default settings. We discussed it recently in https://github.com/apache/airflow/discussions/29731, and that discussion also contains some workarounds that you can use.
   
   Knowing that python-daemon is involved means we can likely patch it somehow while waiting for either containerd to revert the change or python-daemon to fix its behaviour.




[GitHub] [airflow] antonio-antuan commented on issue #29841: high memory leak, cannot start even webserver

Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29841:
URL: https://github.com/apache/airflow/issues/29841#issuecomment-1450434725

   created an issue for python-daemon: https://pagure.io/python-daemon/issue/72




[GitHub] [airflow] potiuk commented on issue #29841: high memory leak, cannot start even webserver

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29841:
URL: https://github.com/apache/airflow/issues/29841#issuecomment-1454756641

   `python-daemon==3.0.0` has been released with the fix. Upgrading to it should resolve the problem.
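
To check whether an environment already has the fixed release, a small Python sketch along these lines may help. The helper names are hypothetical, and the version parsing is deliberately simple (it reads only the leading numeric part), so treat it as a rough check rather than a full version comparison.

```python
import re
from importlib import metadata

FIXED_RELEASE = (3, 0, 0)  # python-daemon release reported above as containing the fix


def parse_release(version):
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(version, fixed=FIXED_RELEASE):
    """True if `version` is at least the fixed release."""
    release = parse_release(version)
    # Pad short versions such as "3" so that they compare as (3, 0, 0).
    release += (0,) * (len(fixed) - len(release))
    return release >= fixed


def installed_has_fix(distribution="python-daemon"):
    """Check the installed distribution; returns None if it is not installed."""
    try:
        return has_fix(metadata.version(distribution))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("python-daemon has the fix:", installed_has_fix())
```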




[GitHub] [airflow] antonio-antuan commented on issue #29841: high memory leak, cannot start even webserver

Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29841:
URL: https://github.com/apache/airflow/issues/29841#issuecomment-1450419914

   ok, clarified
   
   this is the place where it happens: site-packages/daemon/daemon.py:868
   
   ```
   def get_maximum_file_descriptors():
       """ Get the maximum number of open file descriptors for this process.
   
           :return: The number (integer) to use as the maximum number of open
               files for this process.
   
           The maximum is the process hard resource limit of maximum number of
           open file descriptors. If the limit is “infinity”, a default value
           of ``MAXFD`` is returned.
           """
       (__, hard_limit) = resource.getrlimit(resource.RLIMIT_NOFILE)
   
       result = hard_limit
       if hard_limit == resource.RLIM_INFINITY:
           result = MAXFD
   
       return result
   
   
   _total_file_descriptor_range = (0, get_maximum_file_descriptors())
   _total_file_descriptor_set = set(range(*_total_file_descriptor_range))
   ```
   
   apache/airflow:2.3.1 (and above I think) doesn't have this version of the library, so nothing happens during initialization.
   I'd like to suggest better ulimits for the image.
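
To make the cost of the two module-level lines quoted above more concrete, here is a minimal Python sketch that mirrors the same logic and extrapolates how large the resulting set would be. The helper names are made up for illustration, `MAXFD = 2048` is an assumption (the snippet references `MAXFD` without showing its value), and the memory figure is only a rough estimate scaled up from a small sample.

```python
import resource
import sys

# Assumption: fallback used when the limit is "infinity"; the quoted snippet
# references MAXFD but does not show its value.
MAXFD = 2048


def descriptors_to_enumerate(hard_limit, maxfd=MAXFD):
    """Mirror the quoted logic: how many integers the module-level set holds."""
    if hard_limit == resource.RLIM_INFINITY:
        return maxfd
    return hard_limit


def estimate_set_bytes(count, sample_size=100_000):
    """Rough size estimate for set(range(count)).

    Small sets are measured directly. Larger ones are extrapolated from a
    sample without being built. Only the set's hash table is counted, not
    the int objects it refers to.
    """
    if count <= sample_size:
        return sys.getsizeof(set(range(count)))
    per_item = sys.getsizeof(set(range(sample_size))) / sample_size
    return int(per_item * count)


if __name__ == "__main__":
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    count = descriptors_to_enumerate(hard)
    mib = estimate_set_bytes(count) / 2**20
    print(f"hard RLIMIT_NOFILE: {hard}")
    print(f"the set would hold {count} integers, roughly {mib:.1f} MiB")
```

If the container inherits a very large hard limit, for instance on the order of 2**30, the estimate reaches tens of gigabytes, which would be consistent with the OOM behaviour described in the issue.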




[GitHub] [airflow] antonio-antuan commented on issue #29841: high memory leak, cannot start even webserver

Posted by "antonio-antuan (via GitHub)" <gi...@apache.org>.
antonio-antuan commented on issue #29841:
URL: https://github.com/apache/airflow/issues/29841#issuecomment-1450531952

   Glad to help :)




[GitHub] [airflow] potiuk closed issue #29841: high memory leak, cannot start even webserver

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29841: high memory leak, cannot start even webserver
URL: https://github.com/apache/airflow/issues/29841

