Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/24 23:49:06 UTC

[GitHub] [airflow] kaxil opened a new pull request #13310: Respect LogFormat when using ES logging with Json Format

kaxil opened a new pull request #13310:
URL: https://github.com/apache/airflow/pull/13310


   This was a long-standing bug / behaviour where timestamps, log level,
   line number, etc. were not shown when using the Elasticsearch Task Handler
   (Elasticsearch as remote logging) with json_format=True.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13310:
URL: https://github.com/apache/airflow/pull/13310#discussion_r548774234



##########
File path: airflow/providers/elasticsearch/log/es_task_handler.py
##########
@@ -194,12 +194,26 @@ def _read(
         # to prevent it from showing in the UI.
         def concat_logs(lines):
             log_range = (len(lines) - 1) if lines[-1].message == self.end_of_log_mark.strip() else len(lines)
-            return '\n'.join([lines[i].message for i in range(log_range)])
+            return '\n'.join([self._format_msg(lines[i]) for i in range(log_range)])
 
         message = [(host, concat_logs(hosted_log)) for host, hosted_log in logs_by_host]
 
         return message, metadata
 
+    def _format_msg(self, log_line):
+        """Format ES Record to match settings.LOG_FORMAT when used with json_format"""
+        # Using formatter._style.format makes it future proof i.e.
+        # if we change the formatter style from '%' to '{' or '$', this will still work
+        if self.json_format:
+            try:
+                # pylint: disable=protected-access
+                return self.formatter._style.format(_ESJsonLogFmt(**log_line.to_dict()))
+            except Exception:  # noqa pylint: disable=broad-except

Review comment:
       For now, I can only foresee `KeyError` here, i.e. if `log_format` has a key that is not present in the ES logs:
   https://github.com/apache/airflow/blob/69d6d0239f470ac75e23160bac63408350c1835a/airflow/config_templates/default_airflow.cfg#L289
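A minimal sketch of how a missing key surfaces, using only the standard `logging` module. `Record` is a hypothetical stand-in for the PR's `_ESJsonLogFmt`, and the field values are made up for illustration. As far as I can tell, newer Python versions wrap the `KeyError` in a `ValueError` inside the style object, so the sketch catches both rather than assuming one:

```python
import logging


class Record:
    """Hypothetical stand-in for _ESJsonLogFmt: exposes dict keys as attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


formatter = logging.Formatter("[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s")


def try_format(fields):
    """Return (formatted_text, None) on success or (None, exception) on failure."""
    try:
        # Same protected call as in the diff: the style object does the substitution.
        return formatter._style.format(Record(**fields)), None
    except (KeyError, ValueError) as err:
        return None, err


# Illustrative documents: one with every field the format needs, one without.
complete = {
    "asctime": "2020-12-24 23:49:06,000",
    "filename": "example.py",
    "lineno": 42,
    "levelname": "INFO",
    "message": "hello",
}
partial = {"message": "hello"}

ok_text, ok_err = try_format(complete)
bad_text, bad_err = try_format(partial)
```

With the complete document the call returns the formatted line; with the partial one it fails because `asctime` and the other fields are absent.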







[GitHub] [airflow] github-actions[bot] commented on pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13310:
URL: https://github.com/apache/airflow/pull/13310#issuecomment-751241028


   The PR is likely OK to be merged with just a subset of tests for the default Python and Database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] kaxil commented on a change in pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13310:
URL: https://github.com/apache/airflow/pull/13310#discussion_r548864459



##########
File path: tests/providers/elasticsearch/log/test_es_task_handler.py
##########
@@ -251,6 +251,31 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
+    def test_read_with_json_format(self):
+        ts = pendulum.now()
+        formatter = logging.Formatter('[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')
+        self.es_task_handler.formatter = formatter
+        self.es_task_handler.json_format = True

Review comment:
       I don't think that is needed, since we already set it back to False in `setUp` before any test using `self.es_task_handler` is run:
   
   https://github.com/apache/airflow/blob/69d6d0239f470ac75e23160bac63408350c1835a/tests/providers/elasticsearch/log/test_es_task_handler.py#L48-L55
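The point about `setUp` can be illustrated with a small standalone sketch. `Handler` and `HandlerTest` are hypothetical names, not the Airflow test classes; the sketch only assumes that `setUp` re-creates the handler with `json_format` at its default, which is what the comment above describes:

```python
import io
import unittest


class Handler:
    """Hypothetical handler whose json_format defaults to False."""

    def __init__(self):
        self.json_format = False


class HandlerTest(unittest.TestCase):
    def setUp(self):
        # unittest calls setUp before every test method, so each test
        # starts from a fresh handler regardless of what earlier tests did.
        self.handler = Handler()

    def test_a_enable_json_format(self):
        self.handler.json_format = True
        self.assertTrue(self.handler.json_format)

    def test_b_default_is_restored(self):
        # Passes even though another test set the flag to True.
        self.assertFalse(self.handler.json_format)


def run_suite():
    """Run both tests quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HandlerTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Because the state is rebuilt per test, no reset in `tearDown` is needed for this pattern.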







[GitHub] [airflow] kaxil merged pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #13310:
URL: https://github.com/apache/airflow/pull/13310


   





[GitHub] [airflow] turbaszek commented on a change in pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #13310:
URL: https://github.com/apache/airflow/pull/13310#discussion_r548771172



##########
File path: airflow/providers/elasticsearch/log/es_task_handler.py
##########
@@ -194,12 +194,26 @@ def _read(
         # to prevent it from showing in the UI.
         def concat_logs(lines):
             log_range = (len(lines) - 1) if lines[-1].message == self.end_of_log_mark.strip() else len(lines)
-            return '\n'.join([lines[i].message for i in range(log_range)])
+            return '\n'.join([self._format_msg(lines[i]) for i in range(log_range)])
 
         message = [(host, concat_logs(hosted_log)) for host, hosted_log in logs_by_host]
 
         return message, metadata
 
+    def _format_msg(self, log_line):
+        """Format ES Record to match settings.LOG_FORMAT when used with json_format"""
+        # Using formatter._style.format makes it future proof i.e.
+        # if we change the formatter style from '%' to '{' or '$', this will still work
+        if self.json_format:
+            try:
+                # pylint: disable=protected-access
+                return self.formatter._style.format(_ESJsonLogFmt(**log_line.to_dict()))
+            except Exception:  # noqa pylint: disable=broad-except

Review comment:
       What is the reason for the exception?







[GitHub] [airflow] kaxil commented on a change in pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13310:
URL: https://github.com/apache/airflow/pull/13310#discussion_r548774348



##########
File path: airflow/providers/elasticsearch/log/es_task_handler.py
##########
@@ -194,12 +194,26 @@ def _read(
         # to prevent it from showing in the UI.
         def concat_logs(lines):
             log_range = (len(lines) - 1) if lines[-1].message == self.end_of_log_mark.strip() else len(lines)
-            return '\n'.join([lines[i].message for i in range(log_range)])
+            return '\n'.join([self._format_msg(lines[i]) for i in range(log_range)])
 
         message = [(host, concat_logs(hosted_log)) for host, hosted_log in logs_by_host]
 
         return message, metadata
 
+    def _format_msg(self, log_line):
+        """Format ES Record to match settings.LOG_FORMAT when used with json_format"""
+        # Using formatter._style.format makes it future proof i.e.
+        # if we change the formatter style from '%' to '{' or '$', this will still work
+        if self.json_format:
+            try:
+                # pylint: disable=protected-access
+                return self.formatter._style.format(_ESJsonLogFmt(**log_line.to_dict()))
+            except Exception:  # noqa pylint: disable=broad-except

Review comment:
       But I kept a broad except as a safeguard against any unforeseen exception; it keeps the same behavior as now, i.e. returning `log_line.message`.
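A rough sketch of that fallback, assuming the rest of `_format_msg` (not shown in this thread) simply returns the raw message. `Record` and `format_msg` are hypothetical names, and the formats below are shortened for illustration:

```python
import logging


class Record:
    """Hypothetical stand-in for _ESJsonLogFmt: exposes dict keys as attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def format_msg(formatter, fields, json_format=True):
    """Format an ES document with the formatter's style, else return the raw message."""
    if json_format:
        try:
            # Going through the style object works for '%', '{' and '$' formats alike.
            return formatter._style.format(Record(**fields))
        except Exception:  # broad on purpose, mirroring the safeguard in the diff
            pass
    # Fallback keeps the previous behaviour: show only the message text.
    return fields["message"]


percent_formatter = logging.Formatter("%(levelname)s - %(message)s")
brace_formatter = logging.Formatter("{levelname} - {message}", style="{")
```

Calling `format_msg(percent_formatter, {"message": "hello"})` falls back to the raw message because `levelname` is missing, while a document carrying both fields is rendered with either formatter style.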







[GitHub] [airflow] XD-DENG commented on a change in pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #13310:
URL: https://github.com/apache/airflow/pull/13310#discussion_r548864585



##########
File path: tests/providers/elasticsearch/log/test_es_task_handler.py
##########
@@ -251,6 +251,31 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
+    def test_read_with_json_format(self):
+        ts = pendulum.now()
+        formatter = logging.Formatter('[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')
+        self.es_task_handler.formatter = formatter
+        self.es_task_handler.json_format = True

Review comment:
       Yep, you are right. I asked a silly question.







[GitHub] [airflow] XD-DENG commented on a change in pull request #13310: Respect LogFormat when using ES logging with Json Format

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #13310:
URL: https://github.com/apache/airflow/pull/13310#discussion_r548849514



##########
File path: tests/providers/elasticsearch/log/test_es_task_handler.py
##########
@@ -251,6 +251,31 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
+    def test_read_with_json_format(self):
+        ts = pendulum.now()
+        formatter = logging.Formatter('[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')
+        self.es_task_handler.formatter = formatter
+        self.es_task_handler.json_format = True

Review comment:
       Maybe an invalid question: `self.es_task_handler` is also used in other test cases, and the default/initial value of `json_format` is `False` here. So should it be changed back to `False` at the end (or in `tearDown`) to avoid a potential effect on other test cases (especially since the order of test case execution is not guaranteed, if I'm not wrong)?
   
   The same question applies to `test_set_context_w_json_format_and_write_stdout` above.



