Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/20 21:52:59 UTC

[GitHub] [airflow] NiklasBeierl commented on issue #15613: Airflow2.0.2 --- TypeError: unhashable type: 'AttrDict' while trying to read logs from elasticsearch

NiklasBeierl commented on issue #15613:
URL: https://github.com/apache/airflow/issues/15613#issuecomment-1104487752

   The problem is the combination of the `add_host_metadata` processor, some infuriating default behavior of filebeat, and `AIRFLOW__ELASTICSEARCH__JSON_FORMAT: "True"`. 
   
   With that setting enabled, the logs from Airflow are JSON and may already contain a `host` field, which is needed to correctly display the logs (see [here](https://github.com/apache/airflow/blob/495a5a9e557591c1a5cc820f105e8f0c4f29a4ee/airflow/providers/elasticsearch/log/es_task_handler.py#L153)).
   
   As the [documentation of add_host_metadata](https://www.elastic.co/guide/en/beats/filebeat/current/add-host-metadata.html) says, by default it will simply overwrite the `host` field in the document.
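   
   For illustration, here is a minimal Python sketch of why that breaks things. It is not the actual `es_task_handler` code, and `group_by_host` is a made-up helper, but it shows the same failure mode: the handler groups log documents by their `host` value, so that value must be hashable.
   ```python
   # Minimal sketch, NOT the real Airflow handler: group log documents by host.
   from collections import defaultdict

   def group_by_host(docs):
       grouped = defaultdict(list)
       for doc in docs:
           grouped[doc.get("host")].append(doc)  # dict keys must be hashable
       return grouped

   # A plain string host (or a missing one, i.e. None) works fine:
   group_by_host([{"host": "worker-1", "message": "ok"}])

   # But a dict-like host, as written by add_host_metadata, blows up with
   # "TypeError: unhashable type" - the same failure mode as with AttrDict:
   group_by_host([{"host": {"name": "worker-1"}, "message": "boom"}])
   ```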
   
   I am not seeing the "host" field actually getting written by Airflow when using the Celery executor. I guess it's only there when using Dask? So in this case the workaround is quite simple: rename the "host" field to something else:
   ```yml
   # In filebeat.yml
   processors:
     - add_host_metadata: # This (over)writes the "host" field 
     - rename: 
         fields:
           - from: "host" # Still want this info, but I can't use the "host" field for it
             to: "host_meta"  
   ```
   After this rename there is no `host` field in any of the records, and the Airflow code can deal with a `null` just fine since it's hashable.
   
   # The longer story: 
   If you actually do want to preserve the original value of the "host" field (I am guessing Airflow puts a string there), it gets a bit more complicated. Originally I wanted to preserve the `host` field in my proposed solution; it would have looked like this:
   ```yml
   # WARNING, THIS WILL NOT WORK!
   processors:
     - rename: # The next proc will overwrite the host field, but it's needed by the AF webserver ...
         fields:
           - from: "host" # ... so lets just store it somewhere else
             to: "airflow_log_host" # With some AF executors there is no host-field and this will just be a NoOp
     - add_host_metadata: # This writes to the "host" field 
     - rename: 
         fail_on_error: false # This is needed to move host to host_meta even if airflow_log_host doesn't exist
         fields:
           - from: "host" # Still want this info, but I can't use the "host" field for it
             to: "host_meta"  
           - from: "airflow_log_host" # Move back the original value to the host field
             to: "host"
   ```
   But it turns out that BEFORE applying any processors, filebeat will already overwrite `host` with an object that looks like this:
   `{ "name": "<hostname>" }`, and [apparently this behavior cannot be turned off](https://discuss.elastic.co/t/how-to-disable-add-host-metadata-processor/162254).
   In order to work around this, you need to use something like this: 
   ```yml
   filebeat.inputs:
     - type: log
       paths:
         - .... .json
       # JSON expansion is done in processors, DO NOT TURN IT ON HERE!
       # json.keys_under_root: true
       # json.overwrite_keys: true
       # json.add_error_key: true
       # json.expand_keys: true
   
   # .... 
   
   processors:
     - drop_fields: # First get rid of the "built in" host field
         fields: 
           - host
     - decode_json_fields: # Expand our JSON log to the root
         fields: 
           - message # This holds a line of JSON as string
         process_array: true
         target: "" # Store at the root
         overwrite_keys: true # message attribute will be overwritten with the message from airflow
     - rename: # The next proc will overwrite the host field which is needed by the AF webserver ...
         fields:
           - from: "host" # ... so lets just store it somewhere else
             to: "airflow_log_host" # With some AF executors there is no host-field and this will just be a NoOp
     - add_host_metadata: # This writes to the "host" field 
     - rename: 
         fail_on_error: false # This is needed to move host to host_meta even if airflow_log_host doesn't exist
         fields:
           - from: "host" # Still want this info, but I can't use the "host" field for it
             to: "host_meta"  
           - from: "airflow_log_host" # Move back the original value to the host field
             to: "host"
   
   ```
   
   
   
   

