Posted to dev@flume.apache.org by "Adam Gent (JIRA)" <ji...@apache.org> on 2015/08/24 23:16:45 UTC

[jira] [Comment Edited] (FLUME-2768) New ElasticSearch "structured" log behavior is wrong, and dangerous.

    [ https://issues.apache.org/jira/browse/FLUME-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710070#comment-14710070 ] 

Adam Gent edited comment on FLUME-2768 at 8/24/15 9:16 PM:
-----------------------------------------------------------

I agree that this behavior is annoying and probably wrong, but it has been present since at least 1.5. The Elasticsearch sink has not changed much from 1.5->~1.7.

I believe it is not Flume that has changed; rather, the latest Elasticsearch client has become extremely aggressive about guessing field content. In the past it would almost always pick String, but now it tries to guess. In fact, as FLUME-2769 shows, the exception handling is now incorrect: if you pass in a {{@message}} that looks sort of like JSON but is not, Flume will choke.

It is terribly annoying, but I'm not sure the Flume code is really at fault. They appear to be using the client correctly.
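
To illustrate how a content guess can misfire, here is a minimal sketch. It is a hypothetical stand-in, not the Elasticsearch client's actual detection code; the class name {{BodyGuess}} and the first-character rule are assumptions made only for illustration.

```java
// Hypothetical stand-in for a content-type guess.
// Assumption: this is NOT the Elasticsearch client's real logic.
final class BodyGuess {

    // Guess "JSON" when the first non-whitespace character opens an object.
    static boolean looksLikeJson(String body) {
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (!Character.isWhitespace(c)) {
                return c == '{';
            }
        }
        // Empty or all-whitespace bodies are treated as plain text.
        return false;
    }
}
```

Under this rule a body such as "{user} logged in" is guessed to be JSON even though it is not, so any serializer that parses on a positive guess would need to catch the parse failure and fall back to writing the body as a plain string.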


was (Author: agentgt):
I agree that this behavior is annoying and probably wrong, but it has been present since at least 1.5. The Elasticsearch sink has not changed much from 1.5->~1.7.

I believe it is not Flume that has changed; rather, the latest Elasticsearch client has become extremely aggressive about guessing field content. In the past it would almost always pick String, but now it tries to guess. In fact, as FLUME-2769 shows, the exception handling is now incorrect: if you pass in a message that looks like JSON but is not, Flume will choke.

It is terribly annoying, but I'm not sure the Flume code is really at fault. They appear to be using the client correctly.

> New ElasticSearch "structured" log behavior is wrong, and dangerous.
> --------------------------------------------------------------------
>
>                 Key: FLUME-2768
>                 URL: https://issues.apache.org/jira/browse/FLUME-2768
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.6.0
>            Reporter: Matt Wise
>
> The new behavior introduced in Flume 1.6.0 to _automatically_ treat all JSON log messages as structured data (https://issues.apache.org/jira/browse/FLUME-2649, later fixed in https://issues.apache.org/jira/browse/FLUME-2126) is really dangerous, under-documented, and not controllable by a configuration switch.
> *ElasticSearch Schema Change for the @message field*
> The change that was made is pretty dangerous -- it assumes that if you're passing _any_ JSON data, you must be _only_ passing JSON data... why? Because as soon as you pass in {{@message}} as an {{Object}}, ElasticSearch will refuse any future data for the {{@message}} field that arrives in {{String}} format. As soon as this happens, _your log events get dropped on the floor_.
> *Assumes stable field-names and types*
> Similar to the first issue, but more likely to bite you later on ... this change assumes that your field names are stable and always contain the same type of data. That is, if you pass in {{"duration": "5 seconds"}} then a field in ElasticSearch named {{duration}} will be created with the {{"string"}} type. Now imagine another app writes a log message with {{"duration": 5.0}} -- you're stuck, ElasticSearch cannot index that data and drops it on the floor because it violates the schema.
> *Finally ... it's an undocumented behavior change*
> This is the real big one here -- this change is not documented anywhere other than the commit messages. Also, _you can't turn it off!_ At the very least this new behavior should be _optional_, controlled by a configuration switch, and _disabled by default_.
> *Lastly ... a fix?*
> I plan to release the ElasticSearchLogStashStructuredEventSerializer that we use here at Nextdoor that handles all of the above issues silently. It never touches the {{@message}} field and it automatically handles all structured log data by dynamically renaming fields to include {{__<field type>}} in their name. 
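
The renaming idea described in the issue can be sketched roughly as follows. This is not the Nextdoor serializer, which is not shown here; the class {{TypedFieldRenamer}}, its type labels, and the choice to leave nested objects unflattened are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT the serializer mentioned in the issue:
// appends "__<field type>" to every top-level field name so that two
// values of different types never compete for the same field.
final class TypedFieldRenamer {

    // Maps a Java value to a short type label (labels are assumptions).
    static String typeLabel(Object value) {
        if (value == null) return "null";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Number) return "double";
        if (value instanceof Map) return "object";
        return "string"; // strings and anything unrecognized
    }

    // Returns a copy of the fields with type-suffixed names.
    // Nested maps are labelled "object" but not renamed recursively.
    static Map<String, Object> rename(Map<String, ?> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : fields.entrySet()) {
            Object value = e.getValue();
            String label = typeLabel(value);
            // Values labelled "string" are stored in their string form.
            Object stored = "string".equals(label) ? String.valueOf(value) : value;
            out.put(e.getKey() + "__" + label, stored);
        }
        return out;
    }
}
```

With this sketch, {{"duration": "5 seconds"}} becomes {{duration__string}} and {{"duration": 5.0}} becomes {{duration__double}}, so the two events land in separate fields instead of the second one violating the first one's type.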


