Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/15 10:18:06 UTC

[GitHub] [airflow-ci-infra] ashb opened a new pull request #8: Upload job output logs to Cloudwatch too

ashb opened a new pull request #8:
URL: https://github.com/apache/airflow-ci-infra/pull/8


   We have some cases where logs aren't being uploaded to GitHub, which
   makes debugging failures hard.
   
   This is a problem with GitHub's hosted runners too, but for self-hosted
   runners we can at least do something about it.
   
   Relates to apache/airflow#14782


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow-ci-infra] ashb commented on a change in pull request #8: Upload job output logs to Cloudwatch too

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8:
URL: https://github.com/apache/airflow-ci-infra/pull/8#discussion_r594209236



##########
File path: cloud-init.yml
##########
@@ -242,23 +251,50 @@ write_files:
             timeout_ms = 250
 
       [transforms.grok-runner-logs]
-        type = "grok_parser"
+        type = "remap"

Review comment:
       This is because Vector now strongly encourages using the remap transform instead:
   
   > This transform has been deprecated in favor of the remap transform, which enables you to use Vector Remap Language (VRL for short) to create transform logic of any degree of complexity
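
   For readers unfamiliar with VRL, the remap-based replacement in this pull request parses each runner log line, merges the parsed fields into the event on success, and records the error otherwise. Below is a minimal Python sketch of that behaviour, not the Vector implementation. The regex is only a rough stand-in for the grok pattern, and the function name and error string are hypothetical.

```python
import re

# Rough stand-in for the grok pattern used in the diff. The real grok
# TIMESTAMP_ISO8601, LOGLEVEL and NOTSPACE patterns accept more inputs;
# the timestamp shape here is an assumption based on the old "%F %TZ" setting.
RUNNER_LINE = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}Z) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+)\] (?P<message>.*)",
    re.DOTALL,  # assumed to approximate the (?m) flag so messages can span lines
)


def remap_runner_event(event: dict) -> dict:
    """Merge parsed fields on success; otherwise keep the event and set 'err'."""
    match = RUNNER_LINE.search(str(event.get("message", "")))
    if match is None:
        # Hypothetical error text; Vector's actual error message will differ.
        return {**event, "err": "unable to parse message"}
    return {**event, **match.groupdict()}
```

   As in the VRL version, a line that fails to parse is kept rather than dropped, so parse problems stay visible downstream.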







[GitHub] [airflow-ci-infra] ashb commented on a change in pull request #8: Upload job output logs to Cloudwatch too

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8:
URL: https://github.com/apache/airflow-ci-infra/pull/8#discussion_r594209880



##########
File path: cloud-init.yml
##########
@@ -242,23 +251,50 @@ write_files:
             timeout_ms = 250
 
       [transforms.grok-runner-logs]
-        type = "grok_parser"
+        type = "remap"
         inputs=["runner-logs"]
-        pattern = "(?m)\\[%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{NOTSPACE:logger}\\] %{GREEDYDATA:message}"
-        types.timestamp = "timestamp|%F %TZ"
+        source = '''
+          structured, err = parse_grok(.message, "(?m)\\[%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{NOTSPACE:logger}\\] %{GREEDYDATA:message}")
+
+          if err != null {
+            .err = err
+          } else {
+            . = merge(., structured)
+          }
+        '''
+      [transforms.filter-runner-logs]
+        type = "filter"
+        inputs = ['grok-runner-logs']
+        condition.type = "remap"
+        condition.source = '''
+          if .logger == "JobServerQueue" {
+            !match!(.message, r'Try to append \d+ batches web console lines for record')
+          } else if .logger == "HostContext" {
+            !starts_with!(.message, "Well known directory")

Review comment:
       Drop lots of repeated log messages that aren't useful to us now that we have job logs directly in CloudWatch.
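
   The filter condition can be read as a predicate that returns true for events to keep. A minimal Python sketch of the same logic follows, assuming events are plain dicts with "logger" and "message" keys; the function name is hypothetical, and the final "keep everything else" branch comes from the full diff rather than the truncated hunk above.

```python
import re

# Same pattern as the VRL regex literal in the diff; matched anywhere in the message.
BATCH_NOISE = re.compile(r"Try to append \d+ batches web console lines for record")


def keep_runner_event(event: dict) -> bool:
    """Return True when the event should pass the filter, False to drop it."""
    logger = event.get("logger")
    message = str(event.get("message", ""))
    if logger == "JobServerQueue":
        # Drop the repeated "append batches" progress lines.
        return BATCH_NOISE.search(message) is None
    if logger == "HostContext":
        # Drop the repeated "Well known directory" lines.
        return not message.startswith("Well known directory")
    # Every other logger is kept unchanged.
    return True
```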







[GitHub] [airflow-ci-infra] ashb commented on a change in pull request #8: Upload job output logs to Cloudwatch too

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8:
URL: https://github.com/apache/airflow-ci-infra/pull/8#discussion_r594209516



##########
File path: cloud-init.yml
##########
@@ -242,23 +251,50 @@ write_files:
             timeout_ms = 250
 
       [transforms.grok-runner-logs]
-        type = "grok_parser"
+        type = "remap"
         inputs=["runner-logs"]
-        pattern = "(?m)\\[%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{NOTSPACE:logger}\\] %{GREEDYDATA:message}"
-        types.timestamp = "timestamp|%F %TZ"
+        source = '''
+          structured, err = parse_grok(.message, "(?m)\\[%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{NOTSPACE:logger}\\] %{GREEDYDATA:message}")
+
+          if err != null {
+            .err = err
+          } else {
+            . = merge(., structured)
+          }
+        '''
+      [transforms.filter-runner-logs]
+        type = "filter"
+        inputs = ['grok-runner-logs']
+        condition.type = "remap"
+        condition.source = '''
+          if .logger == "JobServerQueue" {
+            !match!(.message, r'Try to append \d+ batches web console lines for record')
+          } else if .logger == "HostContext" {
+            !starts_with!(.message, "Well known directory")
+          } else {
+            true
+          }
+        '''
+
+      [sources.job-logs]
+        type = "file"
+        include = ["/home/runner/actions-runner/_diag/pages/*.log"]
 
-      [transforms.without_systemd_fields]
-        type = "remove_fields"
-        inputs = ["logs"]
-        fields = ["_CAP_EFFECTIVE", "_SYSTEMD_SLICE", "_SYSTEMD_CGROUP",
-          "_SYSTEMD_INVOCATION_ID", "_SELINUX_CONTEXT", "_COMM", "_BOOT_ID",
-          "_MACHINE_ID", "_STREAM_ID", "_PID", "_GID", "_UID","_TRANSPORT",
-          "__MONOTONIC_TIMESTAMP", "SYSLOG_IDENTIFIER", "PRIORITY",
-          "source_type"]
+      [transforms.grok-job-logs]
+        type = "remap"
+        inputs = ["job-logs"]
+        source = '''
+          structured, err = parse_grok(.message, "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}")
+
+          if err == null {
+            . = merge(., structured)
+            .type = "job-output"

Review comment:
       This adds a type = "job-output" field to the log messages, which we can then filter on to get just the job output.
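
   To illustrate how the tag might be used, here is a minimal Python sketch that tags parsed job log lines and then selects them by that field. It is an illustration only: the regex is a rough stand-in for the grok pattern, the function names are hypothetical, and leaving unparsed lines untouched is an assumption, since the hunk above shows only the success branch.

```python
import re

# Rough stand-in for "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}".
JOB_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?) "
    r"(?P<message>.*)"
)


def tag_job_event(event: dict) -> dict:
    """Merge parsed fields and add type='job-output' when the line parses."""
    match = JOB_LINE.search(str(event.get("message", "")))
    if match is None:
        # Assumption: lines that do not parse are passed through untagged.
        return dict(event)
    return {**event, **match.groupdict(), "type": "job-output"}


def select_job_output(events: list) -> list:
    """Keep only the events carrying the job-output tag."""
    return [e for e in events if e.get("type") == "job-output"]
```

   The same idea applies when querying the stored logs: any filter on the type field separates job output from the runner's own diagnostic messages.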







[GitHub] [airflow-ci-infra] ashb merged pull request #8: Upload job output logs to Cloudwatch too

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #8:
URL: https://github.com/apache/airflow-ci-infra/pull/8


   

