You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/21 08:47:52 UTC

[GitHub] [airflow] millin opened a new pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

millin opened a new pull request #15469:
URL: https://github.com/apache/airflow/pull/15469


   Сurrent implementation of sorting is not correct, because it unreasonable sorts `grouped_logs` by first message text:
   ```python
   result = sorted(grouped_logs.items(), key=lambda kv: getattr(kv[1][0], 'message', '_'))
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#issuecomment-824001388


   [The Workflow run](https://github.com/apache/airflow/actions/runs/770538212) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#issuecomment-887109406


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] millin commented on a change in pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
millin commented on a change in pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#discussion_r618497591



##########
File path: airflow/providers/elasticsearch/log/es_task_handler.py
##########
@@ -129,10 +129,14 @@ def _group_logs_by_host(logs):
             key = getattr(log, 'host', 'default_host')
             grouped_logs[key].append(log)
 
-        # return items sorted by timestamp.
-        result = sorted(grouped_logs.items(), key=lambda kv: getattr(kv[1][0], 'message', '_'))
+        # return items sorted by asctime.
+        def sorter(log):
+            return getattr(log, 'asctime', '_')
 
-        return result
+        for host_logs in grouped_logs.values():
+            host_logs.sort(key=sorter)

Review comment:
       I'm not very familiar with ElasticSearch, I'll try to research




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#discussion_r618891133



##########
File path: tests/providers/elasticsearch/log/test_es_task_handler.py
##########
@@ -151,6 +156,23 @@ def test_read_with_none_metadata(self):
         assert '1' == metadatas[0]['offset']
         assert timezone.parse(metadatas[0]['last_log_timestamp']) < pendulum.now()
 
+    def test_read_asctime_order(self):
+        another_test_message = 'another message'
+        outdated_body = {
+            'asctime': '2020-12-24 19:25:00,862',
+            'message': another_test_message,
+            'log_id': self.LOG_ID,
+            'offset': 1,
+        }
+        self.es.index(index=self.index_name, doc_type=self.doc_type, body=outdated_body, id=1)
+        logs, metadatas = self.es_task_handler.read(self.ti, 1)
+        assert 1 == len(logs)

Review comment:
       I think it’d be useful to include multiple logs here though, to make sure we’re sorting the right thing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] closed pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #15469:
URL: https://github.com/apache/airflow/pull/15469


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] millin commented on pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
millin commented on pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#issuecomment-824837346


   @uranusjr Review pls


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#discussion_r617450392



##########
File path: airflow/providers/elasticsearch/log/es_task_handler.py
##########
@@ -129,10 +129,11 @@ def _group_logs_by_host(logs):
             key = getattr(log, 'host', 'default_host')
             grouped_logs[key].append(log)
 
-        # return items sorted by timestamp.
-        result = sorted(grouped_logs.items(), key=lambda kv: getattr(kv[1][0], 'message', '_'))
+        # return items sorted by asctime.
+        for host in grouped_logs:
+            grouped_logs[host].sort(key=lambda log: getattr(log, 'asctime', '_'))

Review comment:
       ```suggestion
           def sorter(log):
               return getattr(log, 'asctime', '_')
   
           for logs in grouped_logs.values():
               logs.sort(key=sorter)
   ```
   
   Repeated lambda definition inside a loop is slow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#discussion_r618477359



##########
File path: airflow/providers/elasticsearch/log/es_task_handler.py
##########
@@ -129,10 +129,14 @@ def _group_logs_by_host(logs):
             key = getattr(log, 'host', 'default_host')
             grouped_logs[key].append(log)
 
-        # return items sorted by timestamp.
-        result = sorted(grouped_logs.items(), key=lambda kv: getattr(kv[1][0], 'message', '_'))
+        # return items sorted by asctime.
+        def sorter(log):
+            return getattr(log, 'asctime', '_')
 
-        return result
+        for host_logs in grouped_logs.values():
+            host_logs.sort(key=sorter)

Review comment:
       Is it not possible to make ElasticSearch sort things for us?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] millin commented on a change in pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
millin commented on a change in pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#discussion_r618497001



##########
File path: tests/providers/elasticsearch/log/test_es_task_handler.py
##########
@@ -151,6 +156,23 @@ def test_read_with_none_metadata(self):
         assert '1' == metadatas[0]['offset']
         assert timezone.parse(metadatas[0]['last_log_timestamp']) < pendulum.now()
 
+    def test_read_asctime_order(self):
+        another_test_message = 'another message'
+        outdated_body = {
+            'asctime': '2020-12-24 19:25:00,862',
+            'message': another_test_message,
+            'log_id': self.LOG_ID,
+            'offset': 1,
+        }
+        self.es.index(index=self.index_name, doc_type=self.doc_type, body=outdated_body, id=1)
+        logs, metadatas = self.es_task_handler.read(self.ti, 1)
+        assert 1 == len(logs)

Review comment:
       Single log entry contains several lines connected by line break.
   But before this fix they may come mixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #15469: Fix incorrect order of messages in logs when using Elasticsearch

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15469:
URL: https://github.com/apache/airflow/pull/15469#discussion_r618476894



##########
File path: tests/providers/elasticsearch/log/test_es_task_handler.py
##########
@@ -151,6 +156,23 @@ def test_read_with_none_metadata(self):
         assert '1' == metadatas[0]['offset']
         assert timezone.parse(metadatas[0]['last_log_timestamp']) < pendulum.now()
 
+    def test_read_asctime_order(self):
+        another_test_message = 'another message'
+        outdated_body = {
+            'asctime': '2020-12-24 19:25:00,862',
+            'message': another_test_message,
+            'log_id': self.LOG_ID,
+            'offset': 1,
+        }
+        self.es.index(index=self.index_name, doc_type=self.doc_type, body=outdated_body, id=1)
+        logs, metadatas = self.es_task_handler.read(self.ti, 1)
+        assert 1 == len(logs)

Review comment:
       If there is only a single log entry comming back, we aren't testing the sorting continues to work.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org