Posted to issues@metron.apache.org by "Jon Zeolla (JIRA)" <ji...@apache.org> on 2016/10/26 02:45:59 UTC

[jira] [Commented] (METRON-517) Update elasticsearch bro templates for uri

    [ https://issues.apache.org/jira/browse/METRON-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607182#comment-15607182 ] 

Jon Zeolla commented on METRON-517:
-----------------------------------

Personally, I'm a fan of options #3 and #4.  #4 would be a sensible choice if we are assuming that Metron environments are not limited by computing resources (within reason).

Because oversized URIs are relatively rare, I also don't think #3 is a bad choice for resource-constrained environments. However, I'm concerned that:
1. Attackers could abuse this setting by simply appending a long, irrelevant query string to their URI, pushing it over the limit so that it is never indexed.
2. Metron users could have a confusing experience, because some URIs would be searchable and others would not.
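
To illustrate the first concern, here is a rough Python sketch of the evasion. It only mimics the ignore_above behaviour (strings longer than the limit are not indexed); the limit of 8191, the example URI, and the helper names are all hypothetical and are not taken from the Metron templates.

```python
# Assumed ignore_above value (32766 // 4); purely illustrative.
IGNORE_ABOVE = 8191


def is_indexed(uri: str, ignore_above: int = IGNORE_ABOVE) -> bool:
    """Mimic ignore_above: strings longer than the limit (in characters) are skipped."""
    return len(uri) <= ignore_above


def pad_uri(uri: str, target_len: int) -> str:
    """Append an irrelevant query parameter so the URI reaches target_len characters."""
    sep = "&" if "?" in uri else "?"
    prefix = f"{uri}{sep}pad="
    # No padding is added if the URI is already at or beyond the target length.
    return prefix + "x" * max(0, target_len - len(prefix))


# Hypothetical request an attacker would prefer analysts not be able to search for.
malicious = "/admin/upload.php?cmd=whoami"
padded = pad_uri(malicious, IGNORE_ABOVE + 1)

print(is_indexed(malicious))  # True: short URI is searchable
print(is_indexed(padded))     # False: one character over the limit, never indexed
```

The padded request still reaches the same endpoint, but a search on the uri field would no longer find it.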

It is also worth mentioning that this limit is imposed by Lucene, so it affects both Solr and Elasticsearch.
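
For reference, option #3 might look roughly like the mapping fragment generated below. This is a sketch, not the actual Metron template: the "string" / "not_analyzed" syntax assumes a pre-5.x Elasticsearch, and the value 8191 is an assumption based on the Elasticsearch documentation's note that ignore_above counts characters while Lucene counts bytes (a UTF-8 character can take up to 4 bytes).

```python
import json

# Per-term byte limit reported in the Lucene error message.
LUCENE_MAX_TERM_BYTES = 32766
# Worst-case size of a single UTF-8 encoded character.
MAX_UTF8_BYTES_PER_CHAR = 4

# ignore_above is a character count, so divide by the worst case to stay under the byte limit.
ignore_above = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR

# Hypothetical fragment for the uri field; the surrounding template is omitted.
uri_mapping = {
    "uri": {
        "type": "string",
        "index": "not_analyzed",
        "ignore_above": ignore_above,
    }
}

print(json.dumps(uri_mapping, indent=2))
```

If URIs are known to be plain ASCII, a larger value would also be safe, at the cost of relying on that assumption.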

> Update elasticsearch bro templates for uri
> ------------------------------------------
>
>                 Key: METRON-517
>                 URL: https://issues.apache.org/jira/browse/METRON-517
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Jon Zeolla
>            Assignee: Jon Zeolla
>             Fix For: 0.2.2BETA
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The bro uri field in [HTTP::Info](https://www.bro.org/sphinx/scripts/base/protocols/http/main.bro.html#type-HTTP::Info) can exceed the Lucene-imposed limit of 32766 bytes per term (non-analyzed fields are treated as a single term).  The resolution options that I've been able to find are:
> 1. Set index to "[no](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html)", which will not add that field to the index, making it not queryable.
> 2. Change the type to [binary](https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html), which will not store it by default.
> 3. Use "[ignore_above](https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html)" to set a limit, above which strings are not indexed.
> 4. Set the field as "analyzed".  
> Here is an example error message:
> ```
> [4]: index [bro_index_2016.10.25.21], type [bro_doc], id [AVf-iCuooLg3mHEm2PpH], message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="uri" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[<redacted>]...', original message: bytes can be at most 32766 in length; got 38623]
> ```


