You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/23 22:56:56 UTC

[GitHub] [airflow] potiuk opened a new pull request, #27223: Make RotatingFilehandler used in DagProcessor non-caching

potiuk opened a new pull request, #27223:
URL: https://github.com/apache/airflow/pull/27223

   The RotatingFileHandler is used when you enable it via `CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar behaviour as the FileHandler had when it comes to caching the file on the Kernel level. While it is harmless (the cache will be freed when needed), it is also misleading for those who are trying to understand memory usage by Airlfow.
   
   The fix is to add a custom non-caching RotatingFileHandler similarly as in #18054.
   
   Note that it will require to manually modify local settings if the settings were created before this change.
   
   Fixes: #27065
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #27223: Make RotatingFilehandler used in DagProcessor non-caching

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #27223:
URL: https://github.com/apache/airflow/pull/27223


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #27223: Make RotatingFilehandler used in DagProcessor non-caching

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #27223:
URL: https://github.com/apache/airflow/pull/27223#discussion_r1003667982


##########
airflow/utils/log/non_caching_file_handler.py:
##########
@@ -16,11 +16,39 @@
 # under the License.
 from __future__ import annotations
 
-import logging
 import os
+from logging import FileHandler
+from logging.handlers import RotatingFileHandler
 
 
-class NonCachingFileHandler(logging.FileHandler):
+class NonCachingFileHandler(FileHandler):
+    """
+    This is an extension of the python FileHandler that advises the Kernel to not cache the file
+    in PageCache when it is written. While there is nothing wrong with such cache (it will be cleaned
+    when memory is needed), it causes ever-growing memory usage when scheduler is running as it keeps
+    on writing new log files and the files are not rotated later on. This might lead to confusion
+    for our users, who are monitoring memory usage of Scheduler - without realising that it is
+    harmless and expected in this case.
+
+    See https://github.com/apache/airflow/issues/14924
+
+    Adding the advice to Kernel might help with not generating the cache memory growth in the first place.
+    """
+
+    def _open(self):
+        wrapper = super()._open()
+        try:
+            fd = wrapper.fileno()
+            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
+        except Exception:
+            # in case either file descriptor cannot be retrieved or fadvise is not available
+            # we should simply return the wrapper retrieved by FileHandler's open method
+            # the advise to the kernel is just an advise and if we cannot give it, we won't
+            pass
+        return wrapper
+
+
+class NonCachingRotatingFileHandler(RotatingFileHandler):

Review Comment:
   Addressed. I also extracted the "non-caching" open to a separate function to avoid duplication.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #27223: Make RotatingFilehandler used in DagProcessor non-caching

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #27223:
URL: https://github.com/apache/airflow/pull/27223#discussion_r1002915227


##########
airflow/utils/log/non_caching_file_handler.py:
##########
@@ -16,11 +16,39 @@
 # under the License.
 from __future__ import annotations
 
-import logging
 import os
+from logging import FileHandler
+from logging.handlers import RotatingFileHandler
 
 
-class NonCachingFileHandler(logging.FileHandler):
+class NonCachingFileHandler(FileHandler):
+    """
+    This is an extension of the python FileHandler that advises the Kernel to not cache the file
+    in PageCache when it is written. While there is nothing wrong with such cache (it will be cleaned
+    when memory is needed), it causes ever-growing memory usage when scheduler is running as it keeps
+    on writing new log files and the files are not rotated later on. This might lead to confusion
+    for our users, who are monitoring memory usage of Scheduler - without realising that it is
+    harmless and expected in this case.
+
+    See https://github.com/apache/airflow/issues/14924
+
+    Adding the advice to Kernel might help with not generating the cache memory growth in the first place.
+    """
+
+    def _open(self):
+        wrapper = super()._open()
+        try:
+            fd = wrapper.fileno()
+            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
+        except Exception:
+            # in case either file descriptor cannot be retrieved or fadvise is not available
+            # we should simply return the wrapper retrieved by FileHandler's open method
+            # the advise to the kernel is just an advise and if we cannot give it, we won't
+            pass
+        return wrapper
+
+
+class NonCachingRotatingFileHandler(RotatingFileHandler):

Review Comment:
   DOcstring of this does not match



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org