Posted to dev@flume.apache.org by "Matt Wise (JIRA)" <ji...@apache.org> on 2015/08/19 23:40:45 UTC

[jira] [Created] (FLUME-2768) New ElasticSearch "structured" log behavior is wrong, and dangerous.

Matt Wise created FLUME-2768:
--------------------------------

             Summary: New ElasticSearch "structured" log behavior is wrong, and dangerous.
                 Key: FLUME-2768
                 URL: https://issues.apache.org/jira/browse/FLUME-2768
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v1.6.0
            Reporter: Matt Wise


The new behavior introduced in Flume 1.6.0 to _automatically_ treat all JSON log messages as structured data (https://issues.apache.org/jira/browse/FLUME-2649, later fixed in https://issues.apache.org/jira/browse/FLUME-2126) is really dangerous, underdocumented, and not controllable by a configuration switch.

*ElasticSearch Schema Change for the @message field*
The change that was made is pretty dangerous -- it assumes that if you're passing _any_ JSON data, you must be _only_ passing JSON data... why? Because as soon as you pass in {{@message}} as an {{Object}}, ElasticSearch will refuse any future data for the {{@message}} field that comes in {{String}} format. As soon as this happens, _your log events get dropped on the floor_.
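
To make the failure mode concrete, here is a toy model of the conflict described above. It is only a sketch: {{ToyIndex}} is a hypothetical stand-in that remembers whether each field was first seen as an object or as a scalar, and it is not ElasticSearch's actual mapping logic or client API.

```python
class ToyIndex:
    """Hypothetical stand-in for an index with dynamic mapping.

    The first document to use a field fixes whether that field is an
    object or a scalar; later documents that disagree are dropped.
    """

    def __init__(self):
        self.mapping = {}   # field name -> "object" or "scalar"
        self.dropped = []   # documents rejected because of a conflict

    def index(self, doc):
        kinds = {
            field: "object" if isinstance(value, dict) else "scalar"
            for field, value in doc.items()
        }
        # Reject the whole document if any field disagrees with the mapping.
        for field, kind in kinds.items():
            if self.mapping.get(field, kind) != kind:
                self.dropped.append(doc)
                return False
        self.mapping.update(kinds)
        return True


idx = ToyIndex()
# A JSON log line arrives first, so @message is recorded as an object.
first = idx.index({"@message": {"user": "alice", "status": 200}})
# A plain-text log line from another app now conflicts and is dropped.
second = idx.index({"@message": "plain text log line"})
```

In this model the first call succeeds and the second is rejected, which mirrors the situation where one JSON-logging application silently breaks indexing for every plain-text application writing to the same index.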

*Assumes stable field-names and types*
Similar to the first issue, but more likely to bite you later on ... this change assumes that your field names are stable and always contain the same type of data. That is, if you pass in {{"duration": "5 seconds"}} then a field in ElasticSearch named {{duration}} will be created with the {{"string"}} type. Now imagine another app writes a log message with {{"duration": 5.0}} -- you're stuck, ElasticSearch cannot index that data and drops it on the floor because it violates the schema.

*Finally ... it's an undocumented behavior change*
This is the real big one here -- this change is not documented anywhere other than the commit messages. Also, _you can't turn it off!_ At the very least this new behavior should be _optional_, controlled by a configuration switch, and _disabled by default_.

*Lastly ... a fix?*
I plan to release the ElasticSearchLogStashStructuredEventSerializer that we use here at Nextdoor that handles all of the above issues silently. It never touches the {{@message}} field and it automatically handles all structured log data by dynamically renaming fields to include {{__<field type>}} in their name. 
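
As a rough illustration of the renaming idea (not the actual Nextdoor serializer, which is not shown here), the sketch below keeps {{@message}} as the raw string and appends {{__<field type>}} to each structured field name. The function names, the {{@fields}} container, and the type labels are all assumptions made for this example.

```python
import json


def type_suffix(value):
    """Return a type label for a parsed JSON value (labels are assumed)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"  # simplification: element types are not inspected
    return "null"


def rename_fields(record):
    """Recursively rename keys to '<name>__<type>' so types never collide."""
    renamed = {}
    for key, value in record.items():
        if isinstance(value, dict):
            value = rename_fields(value)
        renamed[f"{key}__{type_suffix(value)}"] = value
    return renamed


def serialize(raw_body):
    """Build a document: @message stays a string, JSON objects add @fields."""
    doc = {"@message": raw_body}
    try:
        parsed = json.loads(raw_body)
    except ValueError:
        return doc  # not JSON: index the plain string only
    if isinstance(parsed, dict):
        doc["@fields"] = rename_fields(parsed)
    return doc


a = serialize('{"duration": "5 seconds"}')
b = serialize('{"duration": 5.0}')
c = serialize("plain text log line")
```

With this scheme the two {{duration}} examples from above land in separate fields, {{duration__string}} and {{duration__double}}, so neither one can violate the other's type, and a plain-text event still indexes because {{@message}} is always a string.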



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)