Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/19 14:46:24 UTC

[GitHub] [airflow] eyalzek opened a new issue #10406: log_id field is missing from log lines (ES remote logging)

eyalzek opened a new issue #10406:
URL: https://github.com/apache/airflow/issues/10406


   **Apache Airflow version**:
   apache/airflow:1.10.11
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   v1.16.11-gke.5
   
   **Environment**:
   GKE
   
   
   **What happened**:
   Webserver doesn't fetch logs for tasks from elasticsearch
   
   **What you expected to happen**:
   task logs will be displayed in the webserver UI
   
   It seems like the webserver is trying to query task logs by the `log_id` field:
   https://github.com/apache/airflow/blob/1.10.11/airflow/utils/log/es_task_handler.py#L175
   
   This field is missing from all log lines (which are written to stdout) when using the KubernetesExecutor. Example log line:
   `{"asctime": null, "filename": "standard_task_runner.py", "lineno": 77, "levelname": "INFO", "message": "Running: ['airflow', 'run', 'hello_world', 'hello_task_3', '2020-08-19T14:26:07.226064+00:00', '--job_id', '158', '--pool', 'default_pool', '--raw', '-sd', '/opt/airflow/dags/repo/dags/hello_world.py', '--cfg_path', '/tmp/tmpt7lafkaf']", "dag_id": "hello_world", "task_id": "hello_task_3", "execution_date": "2020_08_19T14_26_07_226064", "try_number": "1"}`
   
   
   **How to reproduce it**:
   This is the relevant configuration we have. The scheduler and webserver run separately, and tasks run using the KubernetesExecutor (all in the same cluster/namespace):
   ```
   AIRFLOW__CORE__LOGGING_LEVEL: INFO
   AIRFLOW__CORE__REMOTE_LOGGING: "True"
   AIRFLOW__ELASTICSEARCH__HOST: http://elasticsearch.logging:9200
   AIRFLOW__ELASTICSEARCH__JSON_FORMAT: "True"
   AIRFLOW__ELASTICSEARCH__WRITE_STDOUT: "True"
   ```
   
   we are using fluentd (https://github.com/fluent/fluentd-kubernetes-daemonset) to forward log lines to elasticsearch, all task logs are written to stdout + elasticsearch as expected.
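
To make the mismatch concrete, here is a minimal Python sketch (not Airflow's actual code) of how a `log_id` could be derived from the fields that are present in the example log line. It assumes the `log_id` follows the pattern `{dag_id}-{task_id}-{execution_date}-{try_number}`, which is the same shape the fluentd workaround later in this thread builds; `LOG_ID_TEMPLATE` and `build_log_id` are hypothetical names, and the `message` value is abbreviated.

```python
import json

# Assumption: log_id is composed as dag_id-task_id-execution_date-try_number.
# LOG_ID_TEMPLATE and build_log_id are illustrative names, not Airflow APIs.
LOG_ID_TEMPLATE = "{dag_id}-{task_id}-{execution_date}-{try_number}"


def build_log_id(record: dict) -> str:
    """Compose a log_id from the fields already present in a JSON log line."""
    return LOG_ID_TEMPLATE.format(
        dag_id=record["dag_id"],
        task_id=record["task_id"],
        execution_date=record["execution_date"],
        try_number=record["try_number"],
    )


# The example log line from the report, with the message abbreviated.
raw_line = (
    '{"asctime": null, "filename": "standard_task_runner.py", "lineno": 77, '
    '"levelname": "INFO", "message": "Running: [...]", '
    '"dag_id": "hello_world", "task_id": "hello_task_3", '
    '"execution_date": "2020_08_19T14_26_07_226064", "try_number": "1"}'
)

record = json.loads(raw_line)
print("log_id" in record)    # False: the field the webserver queries by is absent
print(build_log_id(record))  # hello_world-hello_task_3-2020_08_19T14_26_07_226064-1
```

All four components are already in the record, so the missing piece is only the composed `log_id` key itself.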


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eyalzek commented on issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
eyalzek commented on issue #10406:
URL: https://github.com/apache/airflow/issues/10406#issuecomment-879908419


   We switched away from EFK to stackdriver logging a while ago so I can't really say. This sounds like you might need to configure `multiline` parsing on the fluentd side though





[GitHub] [airflow] eyalzek edited a comment on issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
eyalzek edited a comment on issue #10406:
URL: https://github.com/apache/airflow/issues/10406#issuecomment-676734935


   @potiuk I'll be happy to if you could point me in the right direction. Where should this feature be implemented? In the logger? Opened a PR with an attempt, will try to test this tomorrow.





[GitHub] [airflow] ldacey commented on issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
ldacey commented on issue #10406:
URL: https://github.com/apache/airflow/issues/10406#issuecomment-879878872


   @eyalzek hi - I got this to work for the most part. My logs are split quite heavily though: each message is a separate log event, which means one log might consist of 10-50 rows and is hard to read. Does that happen for you as well?





[GitHub] [airflow] potiuk commented on issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #10406:
URL: https://github.com/apache/airflow/issues/10406#issuecomment-676522892


   Would you maybe like to work on a fix? It does not seem like a complex thing to do and we are happy to help new contributors :)





[GitHub] [airflow] eyalzek commented on issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
eyalzek commented on issue #10406:
URL: https://github.com/apache/airflow/issues/10406#issuecomment-676734935


   @potiuk I'll be happy to if you could point me in the right direction. Where should this feature be implemented? In the logger?





[GitHub] [airflow] potiuk closed issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #10406:
URL: https://github.com/apache/airflow/issues/10406


   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #10406:
URL: https://github.com/apache/airflow/issues/10406#issuecomment-676471737


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] eyalzek commented on issue #10406: log_id field is missing from log lines (ES remote logging)

Posted by GitBox <gi...@apache.org>.
eyalzek commented on issue #10406:
URL: https://github.com/apache/airflow/issues/10406#issuecomment-678109841


   For posterity, for anyone deploying to kubernetes and using EFK for logging (specifically with https://github.com/fluent/fluentd-kubernetes-daemonset), this is the fluentd configuration we're using at the moment for getting `log_id` & `offset` into worker log lines:
   
   ```
   <filter kubernetes.var.log.containers.**>
     @type parser
     <parse>
       @type json
     </parse>
     emit_invalid_record_to_error false
     key_name log
     replace_invalid_sequence true
     reserve_data true
     reserve_time true
     remove_key_name_field true
   </filter>
   
   <filter var.log.containers.**>
     @type record_modifier
     prepare_value time = Time.now; @offset = time.to_i * (10 ** 9) + time.nsec
     remove_keys _dummy_
     <record>
       _dummy_ ${if record.has_key?('task_log'); record['log_id'] = "#{record['kubernetes']['labels']['dag_id']}-#{record['kubernetes']['labels']['task_id']}-#{record['kubernetes']['labels']['execution_date'].gsub(/_plus.+/, '').gsub(/[-\.]/, '_')}-#{record['kubernetes']['labels']['try_number']}"; record['offset'] = @offset; end; nil}
     </record>
   </filter>
   ```
   
   in conjunction with the following airflow configuration:
   ```
   AIRFLOW__CORE__REMOTE_LOGGING: "True"
   AIRFLOW__ELASTICSEARCH__HOST: http://elasticsearch:9200
   AIRFLOW__ELASTICSEARCH__WRITE_STDOUT: "True"
   AIRFLOW__ELASTICSEARCH__JSON_FIELDS: asctime, filename, lineno, levelname, message, task_log # task_log is used to tell task logs apart from airflow logs in fluentd
   AIRFLOW__ELASTICSEARCH__JSON_FORMAT: "True"
   ```
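
For readers less familiar with the embedded Ruby in the second filter, the following Python sketch mirrors what that expression appears to do: for records carrying the `task_log` key, it builds `log_id` from the pod labels and attaches a nanosecond `offset`. The function names are hypothetical, and the label value used for `execution_date` (with `_plus_` standing in for the `+` of the UTC offset) is an assumption about how the pod labels look, inferred from the `gsub` calls above. The sketch takes a timestamp per record; when the plugin actually evaluates `prepare_value` is not something this example settles.

```python
import re
import time


def log_id_from_labels(labels: dict) -> str:
    """Mirror the Ruby expression: strip the '_plus...' suffix, then map '-' and '.' to '_'."""
    execution_date = re.sub(r"_plus.+", "", labels["execution_date"])
    execution_date = re.sub(r"[-.]", "_", execution_date)
    return "{}-{}-{}-{}".format(
        labels["dag_id"], labels["task_id"], execution_date, labels["try_number"]
    )


def enrich(record: dict) -> dict:
    """Add log_id and offset only to task log records (those with a 'task_log' key)."""
    if "task_log" in record:
        record["log_id"] = log_id_from_labels(record["kubernetes"]["labels"])
        # Nanoseconds since the epoch, like time.to_i * (10 ** 9) + time.nsec in Ruby.
        record["offset"] = time.time_ns()
    return record


# Hypothetical record shape after the JSON parser filter has run.
example = {
    "message": "Running: [...]",
    "task_log": None,
    "kubernetes": {
        "labels": {
            "dag_id": "hello_world",
            "task_id": "hello_task_3",
            "execution_date": "2020-08-19T14_26_07.226064_plus_00_00",  # assumed label format
            "try_number": "1",
        }
    },
}

enriched = enrich(example)
print(enriched["log_id"])  # hello_world-hello_task_3-2020_08_19T14_26_07_226064-1
```

Records without the `task_log` key (for example scheduler or webserver logs) pass through unchanged, which is why `task_log` is added to `JSON_FIELDS` in the Airflow configuration.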

