Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/21 15:35:26 UTC

[GitHub] [airflow] itispankajsingh opened a new issue #14924: Scheduler Memory Leak in Airflow 2.0

itispankajsingh opened a new issue #14924:
URL: https://github.com/apache/airflow/issues/14924


   **Apache Airflow version**: 2.0.1
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): v1.17.4
   
   **Environment**: Dev
   - **OS** (e.g. from /etc/os-release): RHEL7
   
   **What happened**:
   
   After running fine for some time, my Airflow tasks got stuck in the scheduled state, with the error below in Task Instance Details:
   "All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless: - The scheduler is down or under heavy load If this task instance does not start soon please contact your Airflow administrator for assistance."
   
   
   **What you expected to happen**:
   
   I restarted the scheduler and it started working fine again. When I checked my metrics, I realized the scheduler has a memory leak: over the past 4 days its memory utilization had grown to 6GB.
   
   In versions >2.0 we don't even have the run_duration config option to restart the scheduler periodically as a workaround until this issue is resolved.
   
   **How to reproduce it**:
   I saw this issue in multiple dev instances of mine, all running Airflow 2.0.1 on Kubernetes with KubernetesExecutor.
   Below are the configs that I changed from the defaults.
   max_active_dag_runs_per_dag=32
   parallelism=64
   dag_concurrency=32
   sql_alchemy_pool_size=50
   sql_alchemy_max_overflow=30
   
   **Anything else we need to know**:
   
   The scheduler memory leak occurs consistently in all the instances I have been running. The scheduler's memory utilization keeps growing.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] suhanovv edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911914984


   > > my screenshot shows the metric container_memory_working_set_bytes, it decreased after 10 seconds (scraping time of metrics) after rm execution, scheduler did not restart
   > 
   > Ah. so investigation begins again then. @Suhanov , Need some more answers :).
   > 
   > Just to clarify: So the 4GB memory drop (out of 6GB) was all from `container_memory_working_set_bytes` ?
   
   ![image](https://user-images.githubusercontent.com/3032319/131887878-22acf4cf-12f3-4b36-a734-ee4b6f345e27.png)
   yes
   
   > 
   > Do you know which processes they were ? Do you have any other processes running in the container besides airflow ? Was there a drop in a number of processes when you deleted the files?
   
   after deleting files, the number of processes has not changed
   
   > 
   > Maybe you can even look now and show the memory usage of the processes you have now in the container after some time of running the scheduler after deletion.
   
   ![image](https://user-images.githubusercontent.com/3032319/131891322-c0396bdd-e9fe-419b-a030-83a3d0c1e7fc.png)
   
   
   > 
   > I'd love to get to the bottom of it, because I find it really surprising to find that removal of files causes memory drop. I think - besides the kernel cache - which is low level, you'd really have some kind of service that is subscribed to those files via FS "notify" kind of system to be able to free any memory as result of deleting a file.
   > 
   > Normally, if you have a file opened in linux and the file gets deleted, nothing special happens, the file is not actually removed until the last process that keeps the file opened is closed. So I find it really surprising to see such behaviour (that's why it is so interesting - because it is counter-intuitive).
   > 
   > Additional question: What KIND of filesystem you have for the logs in scheduler ? Is it a usual "local filesystem" or is it some kind of distributed, user-space kind of filesystem?
   
   local fs
   
   > 
   > Because if it is the latter, then it could be the (for example user-space run) filesystem that keeps the log files data in memory and frees them after they are deleted.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-910582468


   That is a cool finding.





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912003732


   @suhanovv - Is `container_memory_working_set_bytes` still growing continuously now? Can you please observe it for a while and, when it grows, compare the memory used by the processes and see which one is taking the GBs of memory? I think knowing that would speed up any kind of hypothesis/investigation and make it waaaay easier.





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911852026


   > I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   
   Well, you do create a cache entry at the moment you WRITE the file (and when it is flushed to disk). This is simply how the Linux page cache works. You can specifically prevent a file from being written to the cache when you save it, but this is a low-level API and very few systems use it, because it has almost no effect besides lowering the metrics.
   
   > Strange thing is that this memory increase is not seen with the web service
   
   With the webserver, you have likely already reached the memory limit and the whole available memory is used for cache. This is pretty much what happens on any long-running system that creates or reads a lot of files. Maybe you can compare the limits you have there.
   
   > And finally, what you say is that it doesn't relate to the memory leak topic that's it ? or maybe the memory leak is a false flag for the freeze @itispankajsingh was concerned about
   
   I do not know that. I am just saying that IF you see `container_memory_cache` growing, this is pretty normal and expected, and `container_memory_working_set_bytes` is what you should look at instead if you want to see whether there is a memory leak.
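
To make the difference between the two metrics concrete, here is a minimal sketch. It assumes cgroup v1 accounting, where `memory.stat` exposes `cache` and `total_inactive_file`, and it assumes the working-set figure is derived roughly as "usage minus inactive file pages", which is how cAdvisor is commonly described as computing it. The function names are made up for illustration.

```python
def parse_memory_stat(text):
    """Parse cgroup `memory.stat` content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def memory_breakdown(usage_bytes, stat_text):
    """Split container memory usage into page cache and an approximate working set.

    Assumption: working set = usage - inactive file pages, floored at zero.
    """
    stats = parse_memory_stat(stat_text)
    inactive_file = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return {
        "usage": usage_bytes,
        "cache": stats.get("cache", 0),
        "working_set": max(usage_bytes - inactive_file, 0),
    }
```

On a cgroup v1 node the inputs would typically come from `/sys/fs/cgroup/memory/memory.usage_in_bytes` and `/sys/fs/cgroup/memory/memory.stat` inside the container (paths assumed; cgroup v2 uses different file names). A large `cache` with a flat `working_set` points at page cache rather than a leak.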
   
   
   





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914143098


   So I guess the quest continues. Hmm. Interesting that it indeed went down after some time. If that's the cache, then it would be strange for it to show up in `container_memory_working_set_bytes` (I presume the graph above is this?).
   
   I have another hypothesis. Linux Kernel also has "dentries" and "inode" caches - it keeps in memory the used/opened directory structure and file node information. And I believe those caches would also be cleared whenever the log files are deleted.
   
   If this is a cache, you can very easily check it - you can force cleaning the cache and see the results:
   
   Cleaning just PageCache:
   ```
   sync; echo 1 > /proc/sys/vm/drop_caches
   ```
   Cleaning dentries and inodes:
   ```
   sync; echo 2 > /proc/sys/vm/drop_caches
   ```
   
   Can you run this experiment, please?
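
To compare the state before and after such an experiment, a small sketch that diffs two snapshots of `/proc/meminfo` may help. It assumes the usual "Name: value kB" layout and looks at `Cached` (page cache) and `SReclaimable` (reclaimable slab, which includes the dentry and inode caches). The function names are illustrative only.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values are mostly in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def cache_delta(before_text, after_text, fields=("Cached", "SReclaimable")):
    """Return how much each selected field changed between two snapshots."""
    before = parse_meminfo(before_text)
    after = parse_meminfo(after_text)
    return {field: after.get(field, 0) - before.get(field, 0) for field in fields}
```

Reading `/proc/meminfo` once before and once after writing to `drop_caches` and passing both strings to `cache_delta` shows which of the two caches actually shrank. Note that these numbers are system-wide, not per container.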





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-913683637


   🤞








[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914234073


   Still - you can see whether it's process or cache memory that grows:
   
   For example here you can see how to check different types of memory used: https://phoenixnap.com/kb/linux-commands-check-memory-usage
   
   Could you check what kind of memory is growing ?





[GitHub] [airflow] lixiaoyong12 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
lixiaoyong12 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914241075


   > > I deployed the scheduler directly on the Linux operating system.
   > 
   > Still - you can see whether it's process or cache memory that grows:
   > 
   > For example here you can see how to check different types of memory used: https://phoenixnap.com/kb/linux-commands-check-memory-usage
   > 
   > Could you check what kind of memory is growing ?
   
   I used `pmap -p 203557 | grep anon` and found that `00007efdc9d0d000 115968K rw---   [ anon ]` is the mapping that grows.
   





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914267658


   Just to explain, @lixiaoyong12: when you have a number of different DAGs and schedules, I think (depending on frequency etc.) it would be perfectly normal for the scheduler to use more memory over time initially. Generally speaking, it should stabilize after some time and then fluctuate up and down depending on what is happening. That's why I want to make sure this is not such a fluctuation. Also, if you could periodically run the cache cleanup and see whether the memory returns to more or less the same value after some time, that would be most helpful!





[GitHub] [airflow] CapBananoid edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-910517428


   Hi,
   in case it helps: I have this kind of leak on 2.1.2 with CeleryExecutor on Kubernetes.
   The scheduler's memory increased by approximately 400M every day,
   and I noticed a drop (from 2G to 500M) as soon as I deleted old logs in /opt/airflow/logs/scheduler (I deleted all folders except the one pointed to by latest)...
   I had 2G of logs and am now back to 300M, and I didn't need to restart the scheduler for that drop to occur.
   I don't know whether it's really related, but I think I will add a CronJob container on this volume to do this periodically.
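
A periodic cleanup like the one described could be sketched as follows. This is only an illustration under the assumption that the scheduler log directory contains one sub-directory per day plus a `latest` symlink, as described above; `prune_scheduler_logs` is a hypothetical helper, not something shipped with Airflow.

```python
import os
import shutil


def prune_scheduler_logs(log_root):
    """Delete every sub-directory of log_root except the one `latest` points to.

    Returns the sorted names of the removed directories. If there is no
    `latest` symlink, nothing is deleted, to avoid wiping the active logs.
    """
    latest_link = os.path.join(log_root, "latest")
    if not os.path.islink(latest_link):
        return []
    keep = os.path.realpath(latest_link)
    removed = []
    for entry in sorted(os.listdir(log_root)):
        path = os.path.join(log_root, entry)
        if os.path.islink(path) or not os.path.isdir(path):
            continue  # skip the `latest` symlink itself and plain files
        if os.path.realpath(path) == keep:
            continue  # this is the directory currently being written to
        shutil.rmtree(path)
        removed.append(entry)
    return removed
```

Run from a CronJob with the log volume mounted, it would be called as `prune_scheduler_logs("/opt/airflow/logs/scheduler")`. A real job would probably keep a few recent days rather than only the latest one.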





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914234073


   > I deployed the scheduler directly on the Linux operating system.
   
   Still - you can see whether it's process or cache memory that grows:
   
   For example here you can see how to check different types of memory used: https://phoenixnap.com/kb/linux-commands-check-memory-usage
   
   Could you check what kind of memory is growing ?





[GitHub] [airflow] suxin1995 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suxin1995 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-810929124


   In my production environment:
   
   Linux release: Debian
   Airflow version: 2.0.1
   executor = CeleryExecutor
   max_active_dag_runs_per_dag=32
   parallelism=32
   dag_concurrency=16
   sql_alchemy_pool_size=16
   sql_alchemy_max_overflow=16
   
   About 3 workers, 40 DAGs, 1000 tasks. Many tasks sometimes stay in the `scheduled` state and never start running.
   Since I set up a cron script to restart the process every hour, the problem has been solved.
   





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911852026


   > I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   
   Well, you do create a cache entry at the moment you WRITE the file (and when it is flushed to disk). This is simply how the Linux page cache works. You can specifically prevent a file from being written to the cache when you save it, but this is a low-level API and very few systems use it, because it has almost no effect besides lowering the metrics. See the discussion under "Optimizing Page Cache" in: https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
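
For illustration, one such low-level interface on Linux is `posix_fadvise` with `POSIX_FADV_DONTNEED`, which Python exposes as `os.posix_fadvise`. The sketch below is an assumption about what "preventing caching" could look like in practice; nothing in this thread suggests Airflow does this, and the function name is made up.

```python
import os


def write_without_keeping_cache(path, data):
    """Write bytes to path, flush them to disk, then advise the kernel to drop the cached pages."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        written = 0
        while written < len(data):
            written += os.write(fd, view[written:])
        os.fsync(fd)  # pages must be clean (written back) before they can be dropped
        if hasattr(os, "posix_fadvise"):
            try:
                # Length 0 means "from offset to the end of the file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # the advice is best-effort; ignore filesystems that reject it
        return written
    finally:
        os.close(fd)
```

The call is only a hint to the kernel, so the cache metric may or may not drop. That matches the point above: the data ends up on disk either way, and the main visible effect is on the metrics.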
   
   > Strange thing is that this memory increase is not seen with the web service
   
   With the webserver, you have likely already reached the memory limit and the whole available memory is used for cache. This is pretty much what happens on any long-running system that creates or reads a lot of files. Maybe you can compare the limits you have there.
   
   > And finally, what you say is that it doesn't relate to the memory leak topic that's it ? or maybe the memory leak is a false flag for the freeze @itispankajsingh was concerned about
   
   I do not know that. I am just saying that IF you see `container_memory_cache` growing, this is pretty normal and expected, and `container_memory_working_set_bytes` is what you should look at instead if you want to see whether there is a memory leak.
   
   
   





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911882319


   > my screenshot shows the metric container_memory_working_set_bytes, it decreased after 10 seconds (scraping time of metrics) after rm execution, scheduler did not restart
   
   Ah, so the investigation begins again then. @suhanovv, I need some more answers :).
   
   Just to clarify: So the 4GB memory drop (out of 6GB) was all from `container_memory_working_set_bytes` ?  
   
   Do you know which processes they were ?  Do you have any other processes running in the container besides airflow ? Was there a drop in a number of processes when you deleted the files?
   
   Maybe you can even look now and show the memory usage of the processes you have now in the container after some time of running the scheduler after deletion. 
   
   I'd love to get to the bottom of it, because I find it really surprising to find that removal of files causes memory drop. I think - besides the kernel cache - which is low level, you'd really have some kind of service that is subscribed to those files via FS "notify" kind of system to be able to free any memory as result of deleting a file.
   
   Normally, if you have a file opened in linux and the file gets deleted, nothing special happens, the file is not actually removed until the last process that keeps the file opened is closed. So I find it really surprising to see such behaviour (that's why it is so interesting - because it is counter-intuitive).
   
   Additional question: what KIND of filesystem do you have for the scheduler logs? Is it a usual "local filesystem" or some kind of distributed, user-space filesystem? Because if it is the latter, then it could be the (for example, user-space) filesystem that keeps the log file data in memory and frees it after the files are deleted.





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914263145


   Can you please dump a few pmap outputs at different times and share them in a .tar.gz or something, @lixiaoyong12? Without grep, so that we can see everything. Ideally over a timespan of a few hours, so that we can see that this is not a "temporary" fluctuation.
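
Once a few snapshots exist, comparing them can be automated. The sketch below assumes the default `pmap` line layout seen earlier in this thread (address, size with a `K` suffix, permissions, mapping name) and only looks at `[ anon ]` mappings. The helper names are illustrative.

```python
import re

# address, size in KiB, permissions, mapping name
_PMAP_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)K\s+(\S+)\s+(.*)$")


def anon_mappings(pmap_text):
    """Return {start_address: size_kb} for every `[ anon ]` mapping in pmap output."""
    sizes = {}
    for line in pmap_text.splitlines():
        match = _PMAP_LINE.match(line.strip())
        if match and match.group(4).strip() == "[ anon ]":
            sizes[match.group(1)] = int(match.group(2))
    return sizes


def grown_mappings(older_text, newer_text):
    """Return {address: (old_kb, new_kb)} for anon mappings that grew between two snapshots."""
    old = anon_mappings(older_text)
    new = anon_mappings(newer_text)
    return {
        addr: (old.get(addr, 0), size)
        for addr, size in new.items()
        if size > old.get(addr, 0)
    }
```

Matching by start address is a simplification: a mapping that is moved or merged by the kernel will show up as a new entry, so the result is a starting point for inspection rather than proof of a leak.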





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911812067


   Ah right. The last line you wrote (container_memory_cache) is GOLD.
   
   That probably would explain it and it's NOT AN ISSUE.
   
   When you open many files, Linux will basically use as much memory as it can for file caches. Whenever you read or write a file, the disk blocks are also kept in memory in case the file needs to be accessed by any process. It also marks them dirty when the blocks change and writes such dirty blocks back to disk. Also, when some process needs more memory than is available, the kernel will evict some unused pages from the cache to free memory. Basically, for any system that continuously writes log files that are not modified later, the cache memory will grow CONTINUOUSLY until it reaches the limit set by the kernel configuration.
   
   So depending on your kernel configuration (basically the kernel of the virtual machines your Kubernetes runs on under the hood), you will see the metrics growing continuously (up to the kernel limit). You can limit the memory available to your scheduler container to cap it "per container" (by giving it fewer memory resources), but basically however much memory you give to the scheduler container, it will be used for cache after some time (and will not be explicitly freed; this is not a problem, because the memory is effectively "free": it's just used for cache and can be freed immediately when needed).
   
   That would PERFECTLY explain why the memory drops immediately after the files are deleted - those files are deleted so the cache for those files should also get deleted by the system immediately. 
   
   Instead of looking at total memory used, you should look at the **container_memory_working_set_bytes** metric. It reflects the "actively used" memory. You can read more here: https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e66
   
   You can also test it by running (from https://linuxhint.com/clear_cache_linux/):
   
   `echo 1 > /proc/sys/vm/drop_caches`
   
   In the container. This should drop your caches immediately without deleting the files.





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-913550223


   Hey @suhanovv - any results of the tests ?





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914153622


   @potiuk 
   I will be able to check this in the late afternoon or tomorrow: we had work on the cluster and had to restart the container, so it has no cache right now.
   
   I'm sure this is the cache; I added container_memory_cache to the chart:
   
   ![image](https://user-images.githubusercontent.com/3032319/132322142-7c60c4a9-567d-41fb-8549-bfbf54533321.png)
   





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912536196


   OK. I'm not 100% sure this is the problem, but I think it's very probable, and I have a potential fix in #18012. I believe we were not calling the .close() method on the FileHandler when the FileProcessorHandler context changed (but I am not 100% sure that's what's happening). @ashb - care to take a look?
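
To illustrate the idea (this is not the actual change in #18012), here is a minimal sketch of a wrapper that explicitly closes the previous `logging.FileHandler` whenever it switches to a new file. `SwitchingFileLogger` and its methods are hypothetical names used only for this example.

```python
import logging


class SwitchingFileLogger:
    """Hypothetical wrapper that writes to one log file at a time."""

    def __init__(self):
        self.handler = None

    def set_context(self, filename):
        """Switch to a new file and close the previous handler explicitly."""
        new_handler = logging.FileHandler(filename)
        old_handler, self.handler = self.handler, new_handler
        if old_handler is not None:
            # Release the old file descriptor right away instead of
            # waiting for garbage collection to do it.
            old_handler.close()
        return old_handler

    def close(self):
        """Close the current handler, if any."""
        if self.handler is not None:
            self.handler.close()
            self.handler = None
```

The design choice here is simply not to rely on finalizers: whether or not something else still references the old handler, its file is closed at the moment the context changes.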





[GitHub] [airflow] ashb commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912562519


   So I _think_ that the file handle will be closed by the GC finalizer: since Python uses both ref counting and a periodic GC to detect reference cycles, assigning `self.handler` to something else should GC the old handler and the old FH.
   
   _Should_. But it is entirely possible that the logging framework might have another reference to it somewhere hanging around.
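
Both cases can be sketched in a few lines. This relies on CPython's reference counting (other interpreters may close the file later), and `fd_is_open` and `demo` are helper names invented for the example.

```python
import os
import tempfile


def fd_is_open(fd):
    """Return True if the file descriptor number is currently open in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def demo():
    """Return (closed_without_extra_ref, still_open_with_extra_ref)."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.log")
        b = os.path.join(tmp, "b.log")
        for path in (a, b):
            with open(path, "w"):
                pass  # just create the files

        # Case 1: the dict holds the only reference, so reassigning it
        # drops the refcount to zero and CPython closes the file.
        holder = {"fh": open(a)}
        old_fd = holder["fh"].fileno()
        holder["fh"] = open(b)
        closed_without_extra_ref = not fd_is_open(old_fd)

        # Case 2: something else still references the old file object,
        # so reassigning the dict entry does not close it.
        extra = holder["fh"]
        extra_fd = extra.fileno()
        holder["fh"] = open(a)
        still_open_with_extra_ref = fd_is_open(extra_fd)

        extra.close()
        holder["fh"].close()
        return closed_without_extra_ref, still_open_with_extra_ref
```

The second case is the scenario to worry about: if the logging framework (or anything else) keeps a reference to the old handler, its file stays open until that reference goes away or `close()` is called explicitly.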
   
   ```
   In [2]: x = {'y': open("/etc/passwd")}
   ```
   
   ```
   lrwx------ ash ash 64 B Fri Sep  3 14:48:12 2021  0 ⇒ /dev/pts/3
   lr-x------ ash ash 64 B Fri Sep  3 14:49:06 2021  12 ⇒ /etc/passwd
   ```
   
   ```
   x['y'] = open("/etc/group")
   ```
   
   ```
   lrwx------ ash ash 64 B Fri Sep  3 14:48:12 2021  0 ⇒ /dev/pts/3
   lr-x------ ash ash 64 B Fri Sep  3 14:49:26 2021  13 ⇒ /etc/group
   ```





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912506318


   As far as I know, no optimizations were made,
   and the problem is not with the logs of the DAG processor manager but with the logs of DagFileProcessorProcess; the default airflow.utils.log.file_processor_handler.FileProcessorHandler is used there without any modifications or custom configuration.
   
   At the moment, I'm not sure that the problem is in the "dirty" memory,
   
   ![image](https://user-images.githubusercontent.com/3032319/132006246-244123d5-5399-4604-90c0-4a18dde22d9d.png)
   
   but I will check tomorrow.





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912539637


   @suhanovv -> maybe you can apply the patch (maybe tomorrow) and test it ?





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911852026


   > I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   
   Well you do create a cache at the moment you WRITE the file (and when it is flushed to disk). This is simply how linux Page Cache write. You can specifically prevent the file to be written to cache when you save files but this is low-level API and very few systems do it because it has no effect besides dropping the metrics.
   
   > Strange thing is that this memory increase is not seen with the web service
   
   With the webserver, you likely already reached the memory limits and the whole available memory is used for cache. This is pretty much what happens on any long-running system that creates or reads a lot of files. Maybe you can compare the limits you have there.
   
   > And finally, what you say is that it doesn't relate to the memory leak topic that's it ? or maybe the memory leak is a false flag for the freeze @itispankajsingh was concerned about
   
   I do not know that. I am just saying that IF you see `container_memory_cache` growing, this is pretty normal and expected and the `container_memory_working_set_bytes` is something that you should rather look at if you want to see if there is a memory leak.
   
   
   





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-913632721


   @potiuk 
   We deployed it today and are watching it; I will come back with the results tomorrow.





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912445541


   Sorry, it was wrong; fixed.





[GitHub] [airflow] ashb edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912067074


   When I was digging in to a similar issue I couldn't see the memory attributed to any particular process -- only the whole container via working_set_bytes -- I was testing/looking in `ps` and all of the counters I could see in `/proc/<pid>/` but didn't see any memory growth reflected in any of those
   
   So that led me to believe the problem was not a traditional memory leak from the python code, but something OS related.
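
The comparison described above (per-process counters versus the container-level metric) can be sketched in Python. This is only an illustration, not something posted in the thread: the function names are made up, and it assumes a Linux `/proc` where each `/proc/<pid>/status` has a `VmRSS` line (kernel threads have none).

```python
import os
import re


def rss_kib_from_status(status_text):
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def total_rss_kib(proc_root="/proc"):
    """Sum VmRSS over all processes visible under proc_root."""
    total = 0
    if not os.path.isdir(proc_root):
        return total
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                rss = rss_kib_from_status(handle.read())
        except OSError:
            continue  # process exited or is not readable
        total += rss or 0
    return total


print("sum of per-process RSS (KiB):", total_rss_kib())
```

The sum counts shared pages once per process, so it is only a rough upper bound. If it stays flat while the container metric keeps growing, that points away from the Python processes themselves.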





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911776556


   Few questions.
   
   Do you know which processes/containers keep the memory? Is it scheduler (and which container)? Maybe you can see the breakdown per process as well ? I understand this is whole cluster memory, and I am trying to wrap my head around it and see where it can come from, because it is super weird behaviour to get back memory after deleting files (?). 
   
   Do you simply run "rm *" in the "/opt/airflow/logs/scheduler" and it drops immediately after? Or is there some delay involved? Do you do anything other than `rm`? I understand you do not restart the scheduler. Can you check if maybe the scheduler restarted itself by coincidence (or triggered by the deletion)?
   
   Maybe also you can see how many airflow related processes you have when scheduler runs? And maybe their number  grows and then drops when you delete the logs?





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912003732


   @suhanovv  - Is the `container_memory_working_set_bytes` still growing continuously now? Can you please observe it for a while and, when it grows, compare the memory used by processes and see which one is taking the GB of memory? I think knowing that would make any kind of hypothesis/investigation waaaay easier.





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914263145


   Can you please dump a few pmap outputs at different times and share them in a .tar.gz or smth @lixiaoyong12 ? Without grep so that we can see everything. Ideally over a timespan of a few hours, so that we can see the trend and that this is not a "temporary" fluctuation.





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912539637


   @suhanovv -> maybe you can apply the patch (maybe tomorrow) and test it ?





[GitHub] [airflow] raphaelauv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
raphaelauv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912434769


   Thanks @suhanovv for your investigations
   
   Do you have the AWS S3 log export activated? (If so, could you run a test with it deactivated?)
   
   Last time I had "weird" (non process related) memory consumption, it was related to a "memory leak" caused by aws boto3
   
   





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911868872


   My screenshot shows the metric `container_memory_working_set_bytes`. It decreased 10 seconds (the metrics scrape interval) after the rm was executed, and the scheduler did not restart.





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912435443


   > Do you have the aws S3 log export activated ? ( if it's the case , could you make a test where it's deactivated )
   
   My thought exactly, feels like some 3rd-party process running and accessing the logs.





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914143098


   So I guess the quest continues. Hmm. Interesting that it indeed went down after some time. If that's the cache, then it would be strange to see it in `container_memory_working_set_bytes` (I presume the graph above is this?). 
   
   I have another hypothesis. Linux Kernel also has "dentries" and "inode" caches - it keeps in memory the used/opened directory structure and file node information. And I believe those caches would also be cleared whenever the log files are deleted.
   
   If this is a cache, you can very easily check it - you can force cleaning the cache and see the results:
   
   Cleaning just PageCache:
   ```
   sync; echo 1 > /proc/sys/vm/drop_caches
   ```
   Cleaning dentries and inodes:
   ```
   sync; echo 2 > /proc/sys/vm/drop_caches
   ```
   
   Can you run such an experiment, please?
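
To read the result of such an experiment, one option is to diff two `/proc/meminfo` snapshots taken before and after the drop. A minimal Python sketch follows. It is not from the thread: the helper names and sample numbers are invented, and it assumes `Cached` roughly tracks the page cache while `SReclaimable` (reclaimable slab) is where dentry and inode caches are accounted.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of KiB values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def cache_delta_kib(before_text, after_text, keys=("Cached", "SReclaimable")):
    """Return how much each selected counter dropped between two snapshots."""
    before, after = parse_meminfo(before_text), parse_meminfo(after_text)
    return {key: before.get(key, 0) - after.get(key, 0) for key in keys}


# Made-up snapshots, standing in for reads of /proc/meminfo taken
# before and after writing to /proc/sys/vm/drop_caches.
BEFORE = "MemTotal: 16000000 kB\nCached: 5000000 kB\nSReclaimable: 900000 kB\n"
AFTER = "MemTotal: 16000000 kB\nCached: 1200000 kB\nSReclaimable: 150000 kB\n"

print(cache_delta_kib(BEFORE, AFTER))
```

Note that `/proc/meminfo` reports node-wide numbers, so inside a container the figures may include other workloads on the same host.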





[GitHub] [airflow] uranusjr commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-805597470


   Please keep this thread on topic with the scheduler memory issue. For common usage questions, please open threads in [Discussions](https://github.com/apache/airflow/discussions) instead.





[GitHub] [airflow] suxin1995 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suxin1995 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-803733376


   I have the same problem. I tried every channel and method I could find but did not solve it!
   Thanks for your question, and let me know what the problem is.
   I have to use a crontab script to restart the scheduler process regularly, but this is stupid.
   If I can't solve it, I can only fall back to 1.10.
   





[GitHub] [airflow] suxin1995 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suxin1995 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-803954778


   Thanks for your comments.
   It seems that the memory leak problem has not been solved yet.
   I can only call a cron script to restart the process every hour.





[GitHub] [airflow] suhanovv edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912430014


   @potiuk @ashb 
   Firstly, since yesterday's deletion of the logs, memory consumption has grown by about 500MB.
   
   Second: my research led me to the following:
   
   Since the processes did not consume more memory during this time, and no processes in /proc/*/fd were holding descriptors to the old logs, I decided to look at what the `container_memory_working_set_bytes` metric actually shows:
   "From the cAdvisor code, the working set memory is defined as: The amount of working set memory and it includes recently accessed memory, dirty memory, and kernel memory. Therefore, Working set is (lesser than or equal to) <= "usage"." After that, I ran an experiment: I removed another day of logs and captured the vmstat output before and after the deletion. As it turned out, `container_memory_working_set_bytes` decreased by about 0.7-0.8 GB, and in vmstat the system cache also decreased by 770 MB. 
   Before
   ![image](https://user-images.githubusercontent.com/3032319/131993690-72695267-e1e6-4c5f-98cf-af2af133aba4.png)
   After
   ![image](https://user-images.githubusercontent.com/3032319/131990200-fddac30d-c89b-4481-8dd8-61d28acc3471.png)
   
   I.e., all these logs are stored in the cache, but it remains a mystery why they stay in the cache for so long. Our devops will also look from their side at why this can happen.
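
For anyone who wants to repeat the `/proc/*/fd` check mentioned above, here is a rough Python sketch. It is an illustration rather than the commands actually used in this comment: the function name is invented, the log path is the one mentioned earlier in the thread, and it assumes a Linux `/proc` plus enough permissions to read other processes' descriptor tables.

```python
import os


def open_files_under(directory, proc_root="/proc"):
    """Return (pid, path) pairs for descriptors whose target lies under directory."""
    prefix = os.path.realpath(directory) + os.sep
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            descriptors = os.listdir(fd_dir)
        except OSError:
            continue  # process gone or not ours to inspect
        for fd in descriptors:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            # Deleted-but-open files show up with a " (deleted)" suffix.
            if target.startswith(prefix):
                found.append((int(entry), target))
    return found


for pid, path in open_files_under("/opt/airflow/logs/scheduler"):
    print(pid, path)
```

An empty result is consistent with the observation above that no process was holding the old logs open, which leaves kernel-side caching as the remaining candidate.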





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914165032


   Ah cool. So at least we figured that one out. Then it should be no problem whatsoever. One thing we COULD do is potentially add a hint to the kernel to not add the files to the cache, if this is the Page Cache. It does no harm in general for this cache to grow, but adding the hint might actually save us from diagnosing and investigating issues like this ;) 





[GitHub] [airflow] CapBananoid commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911827420


   I can set that launch with a Cron Job easily yes but even if I understand the cache thing, I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   I will also try implementing limits in the scheduler container to see if it does the trick
   Strange thing is that this memory increase is not seen with the web service
   And finally, are you saying that it doesn't relate to the memory leak topic, is that it? Or maybe the memory leak is a false flag for the freeze that @itispankajsingh was concerned about.





[GitHub] [airflow] suhanovv edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912430014


   @potiuk @ashb 
   Firstly, since yesterday's deletion of the logs, memory consumption has grown by about 500MB.
   
   Second: my research led me to the following:
   
   Since the processes did not consume more memory during this time, and no processes in /proc/*/fd were holding descriptors to the old logs, I decided to look at what the `container_memory_working_set_bytes` metric actually shows:
   "From the cAdvisor code, the working set memory is defined as: The amount of working set memory and it includes recently accessed memory, dirty memory, and kernel memory. Therefore, Working set is (lesser than or equal to) <= "usage"." After that, I ran an experiment: I removed another day of logs and captured the vmstat output before and after the deletion. As it turned out, `container_memory_working_set_bytes` decreased by about 0.7-0.8 GB, and in vmstat the system cache also decreased by 770 MB. 
   Before
   ![image](https://user-images.githubusercontent.com/3032319/131990427-ca38b986-e5f0-4cdd-b095-3aa5a8d153bd.png)
   After
   ![image](https://user-images.githubusercontent.com/3032319/131990200-fddac30d-c89b-4481-8dd8-61d28acc3471.png)
   
   I.e., all these logs are stored in the cache, but it remains a mystery why they stay in the cache for so long. Our devops will also look from their side at why this can happen.





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914165032


   Ah cool. So at least we figured that one out. Then it should be no problem whatsoever. One thing we COULD do is potentially add a hint to the kernel to not add the log files to the cache, if this is the Page Cache. It does no harm in general for this cache to grow, but adding the hint might actually save us from diagnosing and investigating issues like this ;) 





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911852026


   > I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   
   Well you do create a cache at the moment you WRITE the file (and when it is flushed to disk). This is simply how the Linux Page Cache works. You can specifically prevent the file from being written to the cache when you save files, but this is a low-level API and very few systems do it because it has almost no effect besides dropping the metrics. See the discussion under "Optimizing Page Cache" in: https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
   
   Basically - it is on a Kernel Level and the Kernel does not KNOW (unless it is explicitly told to) whether the file will be read after it is being written. So just in case it will cache it (because it is cost-free and might bring a lot of benefits - like 2s instead of 10s for reading a file that has been written recently). You might argue with it whether it is a good decision or not, but this is how it is (and I think it is a good design choice for performance).
   
   > Strange thing is that this memory increase is not seen with the web service
   
   With the webserver, you likely already reached the memory limits and the whole available memory is used for cache. This is pretty much what happens on any long-running system that creates or reads a lot of files. Maybe you can compare the limits you have there with reported memory used.
   
   > And finally, what you say is that it doesn't relate to the memory leak topic that's it ? or maybe the memory leak is a false flag for the freeze @itispankajsingh was concerned about
   
   I do not know that. I am just saying that IF you see `container_memory_cache` growing, this is pretty normal and expected and the `container_memory_working_set_bytes` is something that you should rather look at if you want to see if there is a memory leak.
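
To make the difference between the two metrics concrete, here is a small Python sketch (an illustration, not part of the original comment). It assumes cgroup v1 style `memory.stat` text and the cAdvisor convention that the working set is usage minus inactive file pages, floored at zero. The sample numbers and function names are made up.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines of a cgroup memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes, stats):
    """Usage minus inactive file pages, floored at zero (the rule assumed above)."""
    inactive_file = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage_bytes - inactive_file, 0)


# Made-up sample: 6 GiB usage, of which 4 GiB is inactive page cache.
SAMPLE_STAT = """\
total_cache 4831838208
total_rss 1073741824
total_inactive_file 4294967296
total_active_file 536870912
"""

stats = parse_memory_stat(SAMPLE_STAT)
usage = 6 * 1024**3
print("cache:", stats["total_cache"])
print("working set:", working_set_bytes(usage, stats))
```

Under this rule, active file pages still count toward the working set, so recently touched cache can show up in `container_memory_working_set_bytes` even when no process is leaking.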
   
   
   





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914263145


   Can you please dump a few pmap outputs at different times and share it in .tar.gz or smth @lixiaoyong12  ? Without grep so that we can see everything.





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911699848


   we have the same problem
   airflow 2.1.3 in k8s cluster with CeleryExecutor
   
   15 days of logs give us 
   ![image](https://user-images.githubusercontent.com/3032319/131854059-cd078fd8-7e1d-4da2-a0c5-c9c6ee0c65c4.png)
   ~6 GB memory usage, and ~1.7 GB after deleting 12 days of logs
   





[GitHub] [airflow] lixiaoyong12 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
lixiaoyong12 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914236434


   > > > We have deployed scheduler today and the memory is increased from 100 MB to 220 MB.
   > > 
   > > 
   > > @lixiaoyong12 - what kind of memory you are talking about ? Is it `container_memory_working_set_bytes` or `container_memory_cache` ?
   > > I deployed the scheduler directly on the Linux operating system.
   
   I ran `ps auxww | grep airflow` at different times and found that the memory increased from 100 MB to 220 MB.





[GitHub] [airflow] itispankajsingh commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
itispankajsingh commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-803809954


   This issue also exists in version 1.10.*; however, in version 2.0.* it is more severe, and we no longer have the run_duration option, so we have to deploy our own cron jobs to restart the scheduler regularly.
   https://issues.apache.org/jira/browse/AIRFLOW-4593





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914197318


   @suhanovv -> this is the change you can try: https://github.com/apache/airflow/pull/18054 . While the ever-growing cache is not a problem, by giving this advice to the kernel we can possibly avoid the cache growing in the first place.
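
For readers unfamiliar with this kind of kernel advice: in Python, one way to give it is `os.posix_fadvise` with `POSIX_FADV_DONTNEED`. The sketch below is a generic illustration under that assumption. It is not taken from the linked PR, and `write_without_caching` is a hypothetical helper name.

```python
import os
import tempfile


def write_without_caching(path, data):
    """Append bytes to path, then advise the kernel that its cached pages are not needed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, data)
        # Dirty pages cannot be dropped, so flush them to disk first.
        os.fsync(fd)
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            # offset=0 and length=0 together mean "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "example.log")
    write_without_caching(log_path, b"first line\n")
    write_without_caching(log_path, b"second line\n")
    with open(log_path, "rb") as handle:
        print(handle.read())
```

The advice is only a hint: the kernel may keep the pages anyway, and the file contents are unaffected either way. The `fsync` on every write is there to make the hint effective and has a latency cost of its own.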





[GitHub] [airflow] suxin1995 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suxin1995 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-804629381









[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911852026


   > I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   
   Well you do create a cache at the moment you WRITE the file (and when it is flushed to disk). This is simply how the Linux Page Cache works. You can specifically prevent the file from being written to the cache when you save files, but this is a low-level API and very few systems do it because it has almost no effect besides dropping the metrics. See the discussion under "Optimizing Page Cache" in: https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
   
   Basically - it is on a Kernel Level and the Kernel does not KNOW (unless it is explicitly told to) whether the file will be read after it is being written. So just in case it will cache it (because it is cost-free and might bring a lot of benefits - like 2s instead of 10s for reading a file that has been written recently). You might argue with it whether it is a good decision or not, but this is how it is (and I think it is a good design choice for performance).
   
   > Strange thing is that this memory increase is not seen with the web service
   
   With the webserver, you likely already reached the memory limits and the whole available memory is used for cache. This is pretty much what happens on any long-running system that creates or reads a lot of files. Maybe you can compare the limits you have there.
   
   > And finally, what you say is that it doesn't relate to the memory leak topic that's it ? or maybe the memory leak is a false flag for the freeze @itispankajsingh was concerned about
   
   I do not know that. I am just saying that IF you see `container_memory_cache` growing, this is pretty normal and expected and the `container_memory_working_set_bytes` is something that you should rather look at if you want to see if there is a memory leak.
   
   
   





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912437521


   No, we don't use s3





[GitHub] [airflow] CapBananoid commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-910517428


   Hi,
   in case it helps: I have this kind of leak on 2.1.2 with CeleryExecutor on Kubernetes.
   Scheduler memory grew by approximately 400M every day,
   and I noticed a drop (from 2G to 500M) as soon as I deleted old logs in /opt/airflow/logs/scheduler (I deleted all folders but the one pointed to by latest)... I had 2G of logs and am now back to 300M.
   I didn't need to restart the scheduler for that drop to occur.
   I don't know if it's really related, but I think I will add a CronJob container on this volume to do this periodically.
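
A periodic cleanup like the one described could look roughly like the Python sketch below. It is a guess at the shape of such a job, not the script actually deployed: it assumes the scheduler log folder holds subfolders named by date (`YYYY-MM-DD`) plus a `latest` symlink, and `prune_scheduler_logs` is a hypothetical name.

```python
import os
import shutil
import tempfile
from datetime import date, timedelta


def prune_scheduler_logs(root, keep_days, today=None):
    """Delete date-named subfolders of root older than keep_days; return removed names."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    latest = os.path.realpath(os.path.join(root, "latest"))
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.islink(path) or not os.path.isdir(path):
            continue  # skip the "latest" symlink and stray files
        try:
            folder_date = date.fromisoformat(name)
        except ValueError:
            continue  # not a YYYY-MM-DD folder, leave it alone
        if folder_date < cutoff and os.path.realpath(path) != latest:
            shutil.rmtree(path)
            removed.append(name)
    return removed


# Demo on a throwaway directory instead of /opt/airflow/logs/scheduler.
with tempfile.TemporaryDirectory() as root:
    for name in ("2021-08-20", "2021-08-31", "2021-09-01"):
        os.mkdir(os.path.join(root, name))
    os.symlink(os.path.join(root, "2021-09-01"), os.path.join(root, "latest"))
    print(prune_scheduler_logs(root, keep_days=3, today=date(2021, 9, 2)))
```

The folder that `latest` points to is never removed, even if it is older than the cutoff, which mirrors the "all folders but the one pointed by latest" rule above.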





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911914984


   > > my screenshot shows the metric container_memory_working_set_bytes, it decreased after 10 seconds (scraping time of metrics) after rm execution, scheduler did not restart
   > 
   > Ah. so investigation begins again then. @Suhanov , Need some more answers :).
   > 
   > Just to clarify: So the 4GB memory drop (out of 6GB) was all from `container_memory_working_set_bytes` ?
   ![image](https://user-images.githubusercontent.com/3032319/131887878-22acf4cf-12f3-4b36-a734-ee4b6f345e27.png)
   yes
   
   > 
   > Do you know which processes they were ? Do you have any other processes running in the container besides airflow ? Was there a drop in a number of processes when you deleted the files?
   
   after deleting files, the number of processes has not changed
   > 
   > Maybe you can even look now and show the memory usage of the processes you have now in the container after some time of running the scheduler after deletion.
   
   ![image](https://user-images.githubusercontent.com/3032319/131891322-c0396bdd-e9fe-419b-a030-83a3d0c1e7fc.png)
   
   
   > 
   > I'd love to get to the bottom of it, because I find it really surprising to find that removal of files causes memory drop. I think - besides the kernel cache - which is low level, you'd really have some kind of service that is subscribed to those files via FS "notify" kind of system to be able to free any memory as result of deleting a file.
   > 
   > Normally, if you have a file opened in linux and the file gets deleted, nothing special happens, the file is not actually removed until the last process that keeps the file opened is closed. So I find it really surprising to see such behaviour (that's why it is so interesting - because it is counter-intuitive).
   > 
   > Additional question: What KIND of filesystem you have for the logs in scheduler ? Is it a usual "local filesystem" or is it some kind of distributed, user-space kind of filesystem?
   local fs
   
   > 
   > Because if it is the latter, then it could be the (for example user-space run) filesystem that keeps the log files data in memory and frees them after they are deleted.
   





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911820400


   Actually, there is one thing that might even help to keep the "cache" memory down (though it has barely any consequences). Do you happen to run any kind of automated log rotation? We have a "clean-logs.sh" script in the official image that can be run to clean the logs. This will have the side effect of freeing the Page Cache memory used by those files: https://github.com/apache/airflow/blob/main/scripts/in_container/prod/clean-logs.sh





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911776556


   Few questions.
   
   Do you know which processes/containers keep the memory? Is it scheduler (and which container)? Maybe you can see the breakdown per process as well ? I understand this is whole cluster memory, and I am trying to wrap my head around it and see where it can come from, because it is super weird behaviour to get back memory after deleting files (?). 
   
   Do you simply run "rm *" in "/opt/airflow/logs/scheduler" and it drops immediately after? Or is there some delay involved? Do you do anything other than `rm`? I understand you do not restart the scheduler. Can you check whether the scheduler maybe restarted itself by coincidence (or triggered by the scheduling)?
   
   Maybe also you can see how many airflow related processes you have when scheduler runs? And maybe their number  grows and then drops when you delete the logs?





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911882319


   > my screenshot shows the metric container_memory_working_set_bytes, it decreased after 10 seconds (scraping time of metrics) after rm execution, scheduler did not restart
   
   Ah. so investigation begins again then. @suhanov , Need some more answers :). 
   
   Just to clarify: So the 4GB memory drop (out of 6GB) was all from `container_memory_working_set_bytes` ?  
   
   Do you know which processes they were ?  Do you have any other processes running in the container besides airflow ? Was there a drop in a number of processes when you deleted the files?
   
   Maybe you can even look now and show the memory usage of the processes you have now in the container after some time of running the scheduler after deletion. 
   
   I'd love to get to the bottom of it, because I find it really surprising to find that removal of files causes memory drop. I think - besides the kernel cache - which is low level, you'd really have some kind of service that is subscribed to those files via FS "notify" kind of system to be able to free any memory as result of deleting a file.
   
   Normally even if you have a file opened in linux and the file gets deleted, nothing special happens, the file is not actually removed until the last process that keeps the file opened is closed. So I find it really surprising to see such behaviour (that's why it is so interesting - because it is counter-intuitive).
   
   Additional question: What KIND of filesystem you have for the logs in scheduler ?  Is it a usual "local filesystem" or is it some kind of distributed, user-space kind of filesystem. Because if it is the latter, then it could be the (for example user-space run) filesystem that keeps the log  files data in  memory and frees them after they are deleted. 
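
The "deleted but still open" behaviour mentioned above can be reproduced with a short sketch. It assumes a POSIX system (it will not work on Windows), and `read_after_unlink` is a hypothetical helper name.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, delete its name, then read it via the open descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)               # the directory entry is gone ...
        assert not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)  # ... but the inode lives while fd is open
        return os.read(fd, len(payload))
    finally:
        os.close(fd)                  # only now can the kernel release the data
```

The data stays readable after `unlink`, which is why deleting a file that some process still holds open would not normally free anything by itself.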





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912542933


   @potiuk 
   I was just planning to test a similar fix on our installation, since by examining the code I could not find where the file is being closed.





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912430014


   @potiuk @ashb 
   Firstly, since yesterday's deletion of the logs, memory consumption has grown by about 500MB.
   
   Second: my research led me to the following:
   
   Since the processes did not consume more memory during this time, and no processes in /proc/*/fd were holding descriptors to the old logs, I decided to see what the `container_memory_working_set_bytes` metric is actually showing:
   "From the cAdvisor code, the working set memory is defined as: The amount of working set memory and it includes recently accessed memory, dirty memory, and kernel memory. Therefore, Working set is (lesser than or equal to) <= "usage"." After that, I ran an experiment: I removed another day of logs and captured the vmstat output before and after the deletion. As it turned out, `container_memory_working_set_bytes` decreased by about 0.7-0.8 GB, and in vmstat the system cache also decreased by 770 MB.
   Before
   ![image](https://user-images.githubusercontent.com/3032319/131990110-ee9a0a91-dde7-4714-b3d5-db6430746425.png)
   After
   ![image](https://user-images.githubusercontent.com/3032319/131990200-fddac30d-c89b-4481-8dd8-61d28acc3471.png)
   
   I.e. all these logs are stored in the cache, but it remains a mystery why they stay in the cache for so long. Our devops will also look from their side at why this can happen.
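
The `/proc/*/fd` check mentioned above can be sketched as follows. This is an illustration only: `open_files_under` is a hypothetical helper, and the `proc_root` parameter exists just so the function can be pointed at a test directory instead of the real `/proc`.

```python
import os
from typing import List


def open_files_under(pid: int, prefix: str, proc_root: str = "/proc") -> List[str]:
    """List targets of a process's open descriptors whose path starts with prefix."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were iterating
        if target.startswith(prefix):
            found.append(target)
    return sorted(found)
```

On a real Linux `/proc`, a descriptor whose file has already been removed shows up with a ` (deleted)` suffix in the link target, so such entries still match the prefix.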





[GitHub] [airflow] lixiaoyong12 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
lixiaoyong12 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914233718


   > So I guess the quest continues. Hmm. Interesting one that it went down indeed after some time. If that's the cache then this would be strange to have `container_memory_working_set_bytes` (I presume the graph above is this?).
   > 
   > I have another hypothesis. Linux Kernel also has "dentries" and "inode" caches - it keeps in memory the used/opened directory structure and file node information. And I believe those caches would also be cleared whenever the log files are deleted.
   > 
   > If this is a cache, you can very easily check it - you can force cleaning the cache and see the results:
   > 
   > Cleaning just PageCache:
   > 
   > ```
   > sync; echo 1 > /proc/sys/vm/drop_caches
   > ```
   > 
   > Cleaning dentries and inodes:
   > 
   > ```
   > sync; echo 2 > /proc/sys/vm/drop_caches
   > ```
   > 
   > Can you make such experiment please?
   
   `sync; echo 1 > /proc/sys/vm/drop_caches` -> it went down by 40M, and there is still more than 200M





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914165032


   Ah cool. So at least we figured that one out. Then it should be no problem whatsoever. One thing we COULD do is add a hint to the kernel not to add the log files to the cache, if this is the Page Cache. There is no harm in general in this cache growing, but adding the hint might actually save us (and our users!) from diagnosing and investigating issues like this ;) 
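
One way such a kernel hint can be given from Python is `os.posix_fadvise` with `POSIX_FADV_DONTNEED`. The sketch below only illustrates the technique and is not taken from the Airflow code base; `append_and_drop_cache` is a hypothetical helper name.

```python
import os


def append_and_drop_cache(path: str, data: bytes) -> None:
    """Append data to path, then tell the kernel its cached pages are not needed."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # DONTNEED does not affect dirty pages, so flush them first
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            try:
                # offset 0, length 0 means "the whole file"
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # the hint is best-effort; ignore filesystems that refuse it
    finally:
        os.close(fd)
```

The hint changes only what the kernel keeps in the Page Cache. The file contents on disk are unaffected, and a later read simply goes back to disk.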





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911882319


   > my screenshot shows the metric container_memory_working_set_bytes, it decreased after 10 seconds (scraping time of metrics) after rm execution, scheduler did not restart
   
   Ah. so investigation begins again then. @suhanov , Need some more answers :). 
   
   Just to clarify: So the 4GB memory drop (out of 6GB) was all from `container_memory_working_set_bytes` ?  
   
   Do you know which processes they were ?  Do you have any other processes running in the container besides airflow ? Was there a drop in a number of processes when you deleted the files?
   
   Maybe you can even look now and show the memory usage of the processes you have now in the container after some time of running the scheduler after deletion. 
   
   I'd love to get to the bottom of it, because I find it really surprising to find that removal of files causes memory drop. I think - besides the kernel cache - which is low level, you'd really have some kind of service that is subscribed to those files via FS "notify" kind of system to be able to free any memory as result of deleting a file.
   
   Normally, if you have a file opened in linux and the file gets deleted, nothing special happens, the file is not actually removed until the last process that keeps the file opened is closed. So I find it really surprising to see such behaviour (that's why it is so interesting - because it is counter-intuitive).
   
   Additional question: What KIND of filesystem you have for the logs in scheduler ?  Is it a usual "local filesystem" or is it some kind of distributed, user-space kind of filesystem?
   
    Because if it is the latter, then it could be the (for example user-space run) filesystem that keeps the log  files data in  memory and frees them after they are deleted. 





[GitHub] [airflow] suxin1995 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suxin1995 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-805461007


   guys. i also have some other questions.
   In my project, a `BashOperator` task occasionally fails with the exception 'Bash command failed. The command returned a non-zero exit code.', but it succeeds after retrying.
    There is nothing in the log file. I do not know what the reason is. If you have experience with this, please let me know. Thank you very much.





[GitHub] [airflow] CapBananoid edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911791784


   I did nothing but an rm and it dropped almost immediately (sorry, the memory figure is reported by prometheus and there is a delay, but what I can tell you is that it dropped within 15s after I did the rm)
   I just did it on my dev instance in fact, same result
   The container is the scheduler, I run separate container for each service, this is the one with the command "airflow scheduler -n -1"
   I don't think the scheduler did a restart by itself, if it had been the case then kubernetes would have shown a failed container service and so would have restarted it and it's not the case
   I can tell you that the type of memory that is shown to grow is the one from the prometheus metric called container_memory_cache





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914420596


   I updated the fix in #18054 (hopefully it will be ok now ) @suhanovv - in case you would like to try. I will wait for it to pass the tests but hopefully it will be ok now (mixed `os.open` with `open` 🤦 ) 





[GitHub] [airflow] maison2710 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
maison2710 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-826877720


   I also got similar issue with Airflow 2.0.1 when using Kubernetes executor. Is there any update or timeline for this issue?





[GitHub] [airflow] ashb commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912067074


   When I was digging in to a similar issue I couldn't see the memory attributed to any particular process -- only the whole container.
   
   So that led me to believe the problem was not a traditional memory leak from the python code, but something OS related.





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912479114


   I looked a bit more. I have a hypothesis what could be the problem - I looked at the possibility that we have growing "dirty" memory (i.e. not flushed/synchronised to disk).
   
   My hypothesis is that (at least some of) the logs are not flushed and they remain in "dirty" (or rather "unsynchronized") state. That would be fairly strange as usually "dirty" memory is synchronized after at most a few seconds. And I believe that the standard "RotatingFileHandler" we use to write processor manager logs should properly close and flush the streams anyway. But maybe you have some optimisations/settings on your OS to prolong/disable auto-flushing (would be strange though) and maybe there is some special configuration/handler of logs that you write?
   
   I found this nice PDF describing how PageCache actually works (fascinating read) http://sylab-srv.cs.fiu.edu/lib/exe/fetch.php?media=paperclub:lkd3ch16.pdf - Linux uses a "write-back" cache strategy where it first writes data to the cache and then it is flushed to disk (and remains in the cache as non-dirty). The OS will mark the files as dirty (and thus you can see them in `container_memory_working_set_bytes`). If for whatever reason those files remained "dirty", they would still be counted as "working_set" memory. And that would also explain why the memory is freed after deleting the files - when the file gets deleted. 
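
To check the "dirty memory" hypothesis, the `Dirty` and `Writeback` counters in `/proc/meminfo` can be watched over time. A minimal sketch follows. The helper names are hypothetical, and `/proc/meminfo` reports system-wide values rather than per-container ones.

```python
from typing import Dict


def parse_meminfo_kb(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a mapping of field name to value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def unsynced_kb(meminfo: Dict[str, int]) -> int:
    """Memory waiting to be written to disk, or currently being written, in kB."""
    return meminfo.get("Dirty", 0) + meminfo.get("Writeback", 0)
```

On a Linux host this could be used as `unsynced_kb(parse_meminfo_kb(open("/proc/meminfo").read()))`. A value that stays high for minutes would support the hypothesis, while a value that quickly returns to near zero would argue against it.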
   





[GitHub] [airflow] lixiaoyong12 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
lixiaoyong12 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914137171


   We have deployed scheduler  today  and the memory is increased from 100 MB to 220 MB.





[GitHub] [airflow] CapBananoid edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911803057


   fun fact
   if I create new log folders (doing "cp -R 2021-09-01 XXX" for example) then memory rises 30s afterwards





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911820400


   Actually, there is one thing that it might be caused by. Do you happen to run any kind of automated log rotation? We have a "clean-logs.sh" script in the official image that can be run to clean the logs. This will have the side effect of freeing the Page Cache memory used by those files: https://github.com/apache/airflow/blob/main/scripts/in_container/prod/clean-logs.sh





[GitHub] [airflow] suxin1995 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suxin1995 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-804535936


   Not completely. I use `CeleryExecutor` in my project and also have this problem.
   After scheduling starts, VSZ and RSS keep growing all the time.





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914144343


   > We have deployed scheduler today and the memory is increased from 100 MB to 220 MB.
   
   @lixiaoyong12 - what kind of memory you are talking about ? Is it `container_memory_working_set_bytes` or `container_memory_cache` ?





[GitHub] [airflow] suxin1995 removed a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suxin1995 removed a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-804629381


   Yep. In my production environment, with a small number of jobs, no problems have been found so far,
   but as more of the business is connected, the increased amount of work causes the scheduler process to die.





[GitHub] [airflow] raphaelauv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
raphaelauv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-803901855


   Until there is a fix or you find a specific reason, you could handle the OOM -> https://github.com/kubernetes/kubernetes/issues/40157
   
   There are a few initiatives :+1: 





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912452313


   I also came up with the idea to look at the stat output for the files, but unfortunately only after deleting the logs. I will do it the next time I delete them, next week. But at the moment, all logs for September 2 have similar file access dates:
   
   Access: 2021-09-02 00:00:11.232221400 +0000
   Modify: 2021-09-02 23:59:41.722766166 +0000
   Change: 2021-09-02 23:59:41.722766166 +0000
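
The same three timestamps can be collected programmatically with `os.stat`. The sketch below formats them in UTC, and `file_times` is a hypothetical helper name. Note that on filesystems mounted with options such as `relatime` or `noatime`, the access time is not updated on every read, so it is only a rough indicator of when a file was last touched.

```python
import os
from datetime import datetime, timezone
from typing import Dict


def file_times(path: str) -> Dict[str, str]:
    """Return access, modify and change times of path as UTC strings."""
    st = os.stat(path)

    def fmt(ts: float) -> str:
        return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S %z")

    return {
        "Access": fmt(st.st_atime),
        "Modify": fmt(st.st_mtime),
        "Change": fmt(st.st_ctime),
    }
```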





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-915539770


   @potiuk 
   the last fix works as it should
   
   ![image](https://user-images.githubusercontent.com/3032319/132579146-44bb8d7c-75c3-412b-9940-96b7920e95b0.png)
   





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914263145


   Can you please dump a few pmap outputs at different times and share it in .tar.gz or smth @lixiaoyong12  ?





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912003732


   @suhanovv  - Is the `container_memory_working_set_bytes` still growing continuously now? Can you observe it for a while and, when it grows, compare the memory used by the processes and see which one is taking the GBs of memory? I think knowing that would make any kind of hypothesis/investigation waaaay easier.  
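
A per-process breakdown like the one requested here can be obtained by reading `VmRSS` from `/proc/<pid>/status`. This is a rough sketch with hypothetical helper names. It assumes a Linux `/proc` layout, and `proc_root` is a parameter only so the function can be pointed at a test directory.

```python
import os
from typing import List, Optional, Tuple


def parse_vm_rss_kb(status_text: str) -> Optional[int]:
    """Extract the resident set size in kB from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def rss_by_process(proc_root: str = "/proc") -> List[Tuple[int, int]]:
    """Return (pid, rss_kb) pairs, largest resident set first."""
    rows = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "status")) as fh:
                rss = parse_vm_rss_kb(fh.read())
        except OSError:
            continue  # process exited or is not readable
        if rss is not None:
            rows.append((int(name), rss))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

If the sum of these values stays flat while the container metric grows, the growth is unlikely to come from the Python processes themselves.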





[GitHub] [airflow] potiuk closed issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #14924:
URL: https://github.com/apache/airflow/issues/14924


   





[GitHub] [airflow] potiuk edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911852026


   > I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   
   Well, you do create a cache entry at the moment you WRITE the file (and when it is flushed to disk). This is simply how the Linux Page Cache works. You can specifically prevent the file from being written to the cache when you save files, but this is a low-level API and very few systems do it because it has almost no effect besides lowering the metrics. See the discussion under "Optimizing Page Cache" in: https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
   
   Basically - it is on a Kernel Level and the Kernel does not KNOW (unless it is explicitly told to) whether the file will be read after it is being written. So just in case it will cache it (because it is cost-free and might bring a lot of benefits - like 2s instead of 10s for reading a file that has been written recently). You might argue with it whether it is a good decision or not, but this is how it is.
   
   > Strange thing is that this memory increase is not seen with the web service
   
   With the webserver, you likely already reached the memory limits and the whole available memory is used for cache. This is pretty much what happens on any long-running system that creates or reads a lot of files. Maybe you can compare the limits you have there.
   
   > And finally, what you say is that it doesn't relate to the memory leak topic that's it ? or maybe the memory leak is a false flag for the freeze @itispankajsingh was concerned about
   
   I do not know that. I am just saying that IF you see `container_memory_cache` growing, this is pretty normal and expected and the `container_memory_working_set_bytes` is something that you should rather look at if you want to see if there is a memory leak.
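
To make the difference between the two metrics concrete: as far as I can tell, cAdvisor derives the working set from the cgroup's memory usage minus the inactive file pages (`total_inactive_file` on cgroup v1, `inactive_file` on cgroup v2). The sketch below reproduces that arithmetic on `memory.stat`-style text. The function names and the sample numbers in the note after it are illustrative only.

```python
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse cgroup memory.stat text ("key value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage: int, stats: Dict[str, int]) -> int:
    """Usage minus inactive file cache, never below zero."""
    inactive = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage - inactive, 0)
```

With a usage of 1000 bytes, of which 300 are inactive file cache and 500 are active file cache, the working set comes out as 700. Active Page Cache pages therefore still count toward the working set, which would be consistent with the metric dropping when cached log files are deleted.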
   
   
   





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-915543514


   Thanks a lot ! That might really help with user confusion!





[GitHub] [airflow] CapBananoid commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911803057


   fun fact
   if I create new log folders (doing "cp -R 2021-09-01 XXX" for example) then... memory rises 30s afterwards





[GitHub] [airflow] CapBananoid edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911827420


   I can set that launch with a Cron Job easily yes but even if I understand the cache thing, I don't get why it would cache files it doesn't even need to look at (when I create dummy folder in the logs folder)
   I will also try implementing limits in the scheduler container to see if it does the trick
   Strange thing is that this memory increase is not seen with the web service
   And finally, what you say is that it doesn't relate to the memory leak topic that's it ? or maybe the memory leak is a false flag for the freeze @itispankajsingh was concerned about





[GitHub] [airflow] boring-cyborg[bot] commented on issue #14924: Scheduler Memory Leak in Airflow 2.0

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-803605254


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-915134523


   @potiuk 
   Ok, we will deploy to the test stand today





[GitHub] [airflow] brunoffaustino commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
brunoffaustino commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-890965700


   We are also facing this issue right now. Any news?
   cc: @Cabeda @pmamarques





[GitHub] [airflow] brunoffaustino edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
brunoffaustino edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-890965700


   We are also facing this issue right now. Any news?
   We currently use CeleryExecutor, as does @suxin1995
   cc: @Cabeda @pmamarques





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912028969


   @potiuk 
   
   At the moment, `container_memory_working_set_bytes` has grown by about 150-200 MB. I will keep observing and try to investigate the reason for this behavior further. I will post the results in a few days.





[GitHub] [airflow] uranusjr edited a comment on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-805597470


   Please keep this thread on topic with the scheduler memory issue. For usage questions, please open threads in [Discussions](https://github.com/apache/airflow/discussions) instead.





[GitHub] [airflow] uranusjr commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-804482470


   It seems like this is specific to the Kubernetes executor? It’d be awesome if you can confirm.





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-912443543


   (BTW - is the first screenshot wrong @suhanovv ? It indicates "growing" memory rather than "shrinking" after delete).





[GitHub] [airflow] suhanovv commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
suhanovv commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914124582


   @potiuk 
   fix does not work
   
   ![image](https://user-images.githubusercontent.com/3032319/132313990-b52fcea7-15a2-4f99-98df-51d1b999d877.png)
   
   whereas for the old container without this patch we have this graph:
   
   ![image](https://user-images.githubusercontent.com/3032319/132316096-39ded18c-0ff8-495d-8bf3-c9a54fdbb15f.png)
   
   We cannot yet say why the memory consumption was falling there: either the container restarted or the OS cleared the cache itself. We want to wait for that situation to occur again to check whether a full memory cache is really not a problem.
   
   





[GitHub] [airflow] lixiaoyong12 commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
lixiaoyong12 commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-914232052


   > > We have deployed scheduler today and the memory is increased from 100 MB to 220 MB.
   > 
   > @lixiaoyong12 - what kind of memory you are talking about ? Is it `container_memory_working_set_bytes` or `container_memory_cache` ?
   
   I deployed the scheduler directly on the Linux operating system.
   





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-915543014


   πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰ 





[GitHub] [airflow] potiuk commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911812067


   Ah right. The last line you wrote is GOLD.
   
   That probably would explain it and it's NOT AN ISSUE.
   
   When you open many files, Linux will basically use as much memory as it can for file caches. Whenever you read or write a file, the disk blocks are also kept in memory in case the file needs to be accessed by any process. The kernel also marks blocks dirty when they change and writes such dirty blocks back to disk. And when some process needs more memory than is available, the kernel evicts some unused pages from the cache to free memory. Basically, for any system that writes logs to files continuously, where the logs are not modified later, the cache memory will grow CONTINUOUSLY until it reaches the limit set by the kernel configuration.
   
   So depending on your kernel configuration (basically the kernel of the virtual machines running Kubernetes under the hood), you will see the metrics growing continuously (up to the kernel limit). You can cap it "per container" by giving your scheduler container fewer memory resources, but basically however much memory you give to the scheduler container, it will be used for cache after some time (and will not be explicitly freed, but that is not a problem because the memory is effectively "free": it is only used for cache and can be reclaimed immediately when needed).
   
   That would PERFECTLY explain why the memory drops immediately after the files are deleted - those files are deleted so the cache for those files should also get deleted by the system immediately. 
   
   Instead of looking at the total memory used, you should look at the **container_memory_working_set_bytes** metric. It reflects the memory that is actually "actively used". You can read more here: https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e66
   
   You can also test it by running (from https://linuxhint.com/clear_cache_linux/):
   
   `echo 1 > /proc/sys/vm/drop_caches`
   
   in the container. This should drop your caches immediately without deleting the files.
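
To see the effect of dropping the caches in numbers, you could compare the `Cached` value from `/proc/meminfo` before and after. Here is a small Python sketch of that comparison; the helper names and the sample figures are made up for illustration, and keep in mind that inside a container `/proc/meminfo` typically reports the values of the host, not of the container.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def cache_released_kb(before, after):
    """How many kB of page cache disappeared between two readings."""
    return before["Cached"] - after["Cached"]


# Invented readings, standing in for read_meminfo() before and after the echo.
BEFORE = """\
MemTotal:       16384000 kB
MemFree:         1024000 kB
Cached:          9216000 kB
"""
AFTER = """\
MemTotal:       16384000 kB
MemFree:         9728000 kB
Cached:           512000 kB
"""

released = cache_released_kb(parse_meminfo(BEFORE), parse_meminfo(AFTER))
```

If `Cached` drops and `MemFree` rises by roughly the same amount while the scheduler keeps running normally, the memory in question was page cache and not memory held by the process.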





[GitHub] [airflow] CapBananoid commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
CapBananoid commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-911791784


   I did nothing but an rm and it dropped almost immediately (sorry, the memory value is reported by Prometheus so there is a delay, but what I can tell you is that it dropped within 15s after I did the rm)
   I just did it on my dev instance in fact, same result
   The container is the scheduler; I run a separate container for each service, and this is the one with the command "airflow scheduler -n -1"
   I don't think the scheduler can restart by itself; if that had been the case then Kubernetes would have shown a restart, and it hasn't
   I can tell you that the type of memory that is shown to grow is the one from the Prometheus metric called container_memory_cache


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] itispankajsingh commented on issue #14924: Scheduler Memory Leak in Airflow 2.0.1

Posted by GitBox <gi...@apache.org>.
itispankajsingh commented on issue #14924:
URL: https://github.com/apache/airflow/issues/14924#issuecomment-804624080


   One observation I have is that the rate of memory leak increases with number of dags (irrespective of whether they are being run). It definitely has something to do with the dag parsing process.

