You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "bbovenzi (via GitHub)" <gi...@apache.org> on 2023/02/06 15:28:17 UTC

[GitHub] [airflow] bbovenzi opened a new pull request, #29390: Fix grid logs for large logs

bbovenzi opened a new pull request, #29390:
URL: https://github.com/apache/airflow/pull/29390

   If a task instance an very large log file, the UI would have difficulty rendering it. This PR fixes this on the UI side but truncating lines in the logs to less than 1 million characters and including a warning to the user.
   
   <img width="875" alt="Screenshot 2023-02-06 at 10 00 04 AM" src="https://user-images.githubusercontent.com/4600967/217012926-47b71c4b-3a0b-48b4-8aa7-81dfa7832149.png">
   
   
   Example DAG:
   ```import re
   import time
   import pendulum
   
   from airflow import DAG
   from airflow.decorators import task
   
   with DAG(
       dag_id="bad_dag",
       start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
       schedule=None,
   ):
   
       @task()
       def one():
           print("Hello DAG is starting")
           file_contents = """
           2022-08-29T07:19:55.933Z [LogLevel = Information] XXXXXX.SetGameState.SetGameStateFunction: Entering handler  
           [AwsRequestId = c4e55d4c-718e-41bc-857e-15cbdf07751b, Region = us-east-1, AmazonTraceId = 
           Root=1-630c689b-713bc72374fa8913774ea82e;Parent=00dea2e84ddcf28d;Sampled=0, ExecutionEnv = 
           AWS_Lambda_dotnetcore3.1, FunctionName = int-set-game-state, FunctionVersion = $LATEST, MemoryLimitInMB = 1024]
           2022-08-29T07:19:56.119Z [LogLevel = Information] XXXXXX.Serverless.Shared.Repositories.Dynamo.LowLevelDynamoClient: 
           Sending request "(attribute_not_exists(#N0)) OR (#N1 = :v0)" [{"#N0":"PK","#N1":"ActiveDeviceId"}] 
           [{":v0":{"BOOL":false,"IsBOOLSet":false,"BS":[],"L":[],"IsLSet":false,"M":{},"IsMSet":false,"NS":[],"NULL":false,
           "S":"cde4646da573124b61d843d2afe48b74","SS":[]}}]  
           [AwsRequestId = c4e55d4c-718e-41bc-857e-15cbdf07751b, Region = us-east-1, AmazonTraceId = 
           Root=1-630c689b-713bc72374fa8913774ea82e;Parent=00dea2e84ddcf28d;Sampled=0, ExecutionEnv = 
           AWS_Lambda_dotnetcore3.1, FunctionName = int-set-game-state, FunctionVersion = $LATEST, MemoryLimitInMB = 1024]
           2022-08-29T07:19:56.200Z [LogLevel = Information] XXXXXX.SetGameState.SetGameStateFunction: Operation complete  
           [AwsRequestId = c4e55d4c-718e-41bc-857e-15cbdf07751b, Region = us-east-1, AmazonTraceId = 
           Root=1-630c689b-713bc72374fa8913774ea82e;Parent=00dea2e84ddcf28d;Sampled=0, ExecutionEnv = 
           AWS_Lambda_dotnetcore3.1, FunctionName = int-set-game-state, FunctionVersion = $LATEST, MemoryLimitInMB = 1024]
           2022-08-29T07:19:56.202Z END RequestId: c4e55d4c-718e-41bc-857e-15cbdf07751b
           """ * 5000
           re_pattern = r"(\d{4}-\d{2}-\d{2}.*?)\s+?\n\s+?(\[.*?\])"
           all_records = re.findall(re_pattern, file_contents, flags=re.S)
           print(f"All records {all_records}")
           print(all_records)
           print("All Done!")
           return
   
       @task()
       def two():
           time.sleep(5)
           print("two")
   
       one() >> two()
   ```
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi merged pull request #29390: Fix grid logs for large logs

Posted by "bbovenzi (via GitHub)" <gi...@apache.org>.
bbovenzi merged PR #29390:
URL: https://github.com/apache/airflow/pull/29390


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pierrejeambrun commented on pull request #29390: Fix grid logs for large logs

Posted by "pierrejeambrun (via GitHub)" <gi...@apache.org>.
pierrejeambrun commented on PR #29390:
URL: https://github.com/apache/airflow/pull/29390#issuecomment-1419565493

   Yep I agree, we can remove it for now, and add it back when we actually have support for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on pull request #29390: Fix grid logs for large logs

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on PR #29390:
URL: https://github.com/apache/airflow/pull/29390#issuecomment-1419300812

   mmm we had a PR that added the "Full logs" checkbox as far as I remember the PR show part (tail) of the logs to avoid loading all of it and allow the user the option to show the full log so I thought we no longer have this problem?
   
   I find the message confusing with the box:
   ![Screenshot 2023-02-06 at 17 46 22](https://user-images.githubusercontent.com/45845474/217017888-a7bfa68a-5dd2-4dc6-aaa6-259247f6c5a0.png)
   
   If we know we can't load the log in the UI we should disable the box.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pierrejeambrun commented on pull request #29390: Fix grid logs for large logs

Posted by "pierrejeambrun (via GitHub)" <gi...@apache.org>.
pierrejeambrun commented on PR #29390:
URL: https://github.com/apache/airflow/pull/29390#issuecomment-1419556849

   Thanks Brent for this fix.
   
   'Full Logs' impacts only how we request logs to the rest API. Most of the time, without requesting full logs, reading by chunk/tail will be enough to avoid most of UI issues due to large file. Unfortunatly if a log chunk contains a really really long line we can still have a huge chunk, Brent sample shows 1 line of log that is 5Mb of text by itself and this can will cause rendering + performance issues.
   
   I agree that the wording between these two elements are a bit confusing.
   
   > Note: Looking at the get_task_log implementation, we use streaming capabilities, with `read_log_stream` but aggregate everything before returning the payload. I must be missing something, but I am wondering if this 'Full Logs' really makes sense for us here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pierrejeambrun commented on a diff in pull request #29390: Fix grid logs for large logs

Posted by "pierrejeambrun (via GitHub)" <gi...@apache.org>.
pierrejeambrun commented on code in PR #29390:
URL: https://github.com/apache/airflow/pull/29390#discussion_r1097790430


##########
airflow/www/static/js/dag/details/taskInstance/Logs/utils.ts:
##########
@@ -47,11 +47,15 @@ export const parseLogs = (
     return {};
   }
   let lines;
+
+  let warning;
+
   try {
     lines = data.split('\n');
   } catch (err) {
     console.error(err);
-    return {};
+    warning = 'Unable to show logs. There was an error parsing logs.';

Review Comment:
   We might remove the `console.error` just above. eslint is complaining about it and now that the error is properly handled it's not required anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on pull request #29390: Fix grid logs for large logs

Posted by "bbovenzi (via GitHub)" <gi...@apache.org>.
bbovenzi commented on PR #29390:
URL: https://github.com/apache/airflow/pull/29390#issuecomment-1419563805

   There is a `log_pos` and `offset` metadata in `read_log_stream` but none of that is exposed to the REST API. We should probably fix that. But in the meantime, the UI should have some guardrails. Also, I don't think the Full Logs buttons really does anything for the UI, so I'll remove it entirely.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org