Posted to commits@airflow.apache.org by "uranusjr (via GitHub)" <gi...@apache.org> on 2023/06/08 06:42:03 UTC

[GitHub] [airflow] uranusjr commented on a diff in pull request #31786: rewrite method used in ecs to fetch less logs

uranusjr commented on code in PR #31786:
URL: https://github.com/apache/airflow/pull/31786#discussion_r1222520479


##########
airflow/providers/amazon/aws/hooks/ecs.py:
##########
@@ -168,7 +168,7 @@ def __init__(
         self.log_group = log_group
         self.log_stream_name = log_stream_name
 
-        self.hook = AwsLogsHook(aws_conn_id=aws_conn_id, region_name=region_name)
+        self.logs_hook = AwsLogsHook(aws_conn_id=aws_conn_id, region_name=region_name)

Review Comment:
   Is this a readability change? Just want to be sure I’m not missing anything.



##########
airflow/providers/amazon/aws/hooks/logs.py:
##########
@@ -103,7 +104,12 @@ def get_log_events(
                 skip -= event_count
                 events = []
 
-            yield from events
+            if not start_from_head:
+                # if we are not reading from head, it doesn't make sense to return events in "normal" order
+                # while hiding the subsequent calls, bc 1-9 queried by batches of 3 would return 789 456 123
+                yield from reversed(events)

Review Comment:
   > the method was never used with start_from_head=False so far
   
   You can’t know that, since the hook is public API and may be used by any user 🙂
   
   But I agree this is better behaviour and it’s worth bumping the major version in any case.
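
To illustrate the ordering problem described in the code comment above, here is a minimal sketch. It assumes a hypothetical `fetch_from_tail` helper that stands in for the paginated calls (it is not part of `AwsLogsHook`): each batch is chronological internally, but batches arrive newest first.

```python
from typing import Iterator, List


def fetch_from_tail(items: List[int], batch_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for paginated reads that start at the end of a stream."""
    end = len(items)
    while end > 0:
        start = max(0, end - batch_size)
        # Each batch keeps chronological order; batches themselves come newest first.
        yield items[start:end]
        end = start


def events_as_is(items: List[int], batch_size: int) -> Iterator[int]:
    # Yielding each batch unchanged mixes the two orderings.
    for batch in fetch_from_tail(items, batch_size):
        yield from batch


def events_reversed(items: List[int], batch_size: int) -> Iterator[int]:
    # Reversing each batch gives one consistent newest-to-oldest sequence.
    for batch in fetch_from_tail(items, batch_size):
        yield from reversed(batch)


logs = list(range(1, 10))
mixed = list(events_as_is(logs, 3))
newest_first = list(events_reversed(logs, 3))
print(mixed)
print(newest_first)
```

With per-batch reversal the caller sees a single monotonic order, so taking the first N items really does give the N most recent events.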



##########
airflow/providers/amazon/aws/hooks/ecs.py:
##########
@@ -198,7 +197,12 @@ def _event_to_str(self, event: dict) -> str:
         return f"[{formatted_event_dt}] {message}"
 
     def get_last_log_messages(self, number_messages) -> list:
-        return [log["message"] for log in deque(self._get_log_events(), maxlen=number_messages)]
+        last_logs_iterator = self.logs_hook.get_log_events(
+            self.log_group, self.log_stream_name, start_from_head=False
+        )
+        truncated = list(itertools.islice(last_logs_iterator, number_messages))
+        # need to reverse the order to put the logs back in order after getting them from the end
+        return [log["message"] for log in reversed(truncated)]

Review Comment:
   ```suggestion
           truncated = itertools.islice(last_logs_iterator, number_messages)
           messages = [log["message"] for log in truncated]
           # Reverse in place to put the logs back in chronological order after reading
           # them from the end; list.reverse() returns None, so return the list separately.
           messages.reverse()
           return messages
   ```
   
   I wonder if this would be slightly more efficient (one fewer list allocation). The gain is probably marginal either way.
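
A standalone sketch of the single-list approach, assuming a hypothetical `last_messages` helper and a fake newest-first event iterator in place of the real hook. One detail worth keeping in mind: `list.reverse()` reverses in place and returns `None`, so the list has to be returned on its own line.

```python
import itertools
from typing import Iterable, List


def last_messages(newest_first_events: Iterable[dict], number_messages: int) -> List[str]:
    """Hypothetical helper: take the newest N events, return messages oldest first."""
    # islice is lazy, so no intermediate list of events is built here.
    truncated = itertools.islice(newest_first_events, number_messages)
    messages = [event["message"] for event in truncated]
    # In-place reversal; the return value of reverse() is None and must not be returned.
    messages.reverse()
    return messages


# Fake event stream ordered newest first: "line 9", "line 8", ..., "line 1".
events = ({"message": f"line {i}"} for i in range(9, 0, -1))
result = last_messages(events, 3)
print(result)
```

Only one list is allocated (the comprehension result), compared with two when the slice is materialized first and a second list is built from `reversed(...)`.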


