Posted to issues@metron.apache.org by "Casey Stella (JIRA)" <ji...@apache.org> on 2016/11/02 19:02:58 UTC

[jira] [Updated] (METRON-517) Update elasticsearch bro templates for uri

     [ https://issues.apache.org/jira/browse/METRON-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Casey Stella updated METRON-517:
--------------------------------
    Fix Version/s:     (was: 0.2.2BETA)

> Update elasticsearch bro templates for uri
> ------------------------------------------
>
>                 Key: METRON-517
>                 URL: https://issues.apache.org/jira/browse/METRON-517
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Jon Zeolla
>            Assignee: Jon Zeolla
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The bro uri field in [HTTP::Info](https://www.bro.org/sphinx/scripts/base/protocols/http/main.bro.html#type-HTTP::Info) can exceed the Lucene-imposed limit of 32766 bytes per term. Non-analyzed fields are treated as a single term, and we currently set uri as not_analyzed here: https://github.com/apache/incubator-metron/blob/master/metron-deployment/roles/metron_elasticsearch_templates/files/es_templates/bro_index.template. The resolution options that I've been able to find are:
> 1. Set index to "[no](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-index.html)", which will not add that field to the index, making it not queryable.
> 2. Change the type to [binary](https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html), which is not stored by default and is not searchable.
> 3. Use "[ignore_above](https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html)" to set a limit, above which strings are not indexed.
> 4. Set the field as "analyzed".  
> Here is an example error message:
> ```
> [4]: index [bro_index_2016.10.25.21], type [bro_doc], id [AVf-iCuooLg3mHEm2PpH], message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="uri" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[<redacted>]...', original message: bytes can be at most 32766 in length; got 38623]
> ```
> Relevant Lucene documentation:  https://lucene.apache.org/core/6_2_1/core/constant-values.html#org.apache.lucene.index.IndexWriter.MAX_TERM_LENGTH
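
To make option 3 concrete, here is a minimal Python sketch of the byte-length check behind the error above and of a possible "ignore_above" setting for the uri field. The helper names (is_immense_term, safe_ignore_above, uri_mapping) are hypothetical, and the mapping layout assumes a string field with "index": "not_analyzed" as described in the issue; it is not copied from the actual bro_index.template. Because ignore_above counts characters while Lucene counts bytes, the sketch conservatively assumes up to 4 bytes per character.

```python
import json

# Lucene's IndexWriter.MAX_TERM_LENGTH: largest UTF-8 encoded term, in bytes.
MAX_TERM_LENGTH = 32766


def is_immense_term(value: str) -> bool:
    """Return True if the UTF-8 encoding of value exceeds the per-term limit."""
    return len(value.encode("utf-8")) > MAX_TERM_LENGTH


def safe_ignore_above(max_bytes_per_char: int = 4) -> int:
    """Character count that always fits in one term.

    Assumption: at most max_bytes_per_char bytes per character (4 is a
    conservative upper bound for UTF-8).
    """
    return MAX_TERM_LENGTH // max_bytes_per_char


def uri_mapping(ignore_above: int) -> dict:
    """Hypothetical mapping fragment for the uri field (illustration only)."""
    return {
        "uri": {
            "type": "string",
            "index": "not_analyzed",
            "ignore_above": ignore_above,
        }
    }


if __name__ == "__main__":
    # Print the fragment that would be merged into the template by hand.
    print(json.dumps(uri_mapping(safe_ignore_above()), indent=2))
```

With this approach, documents whose uri is longer than the limit would still be indexed, but the uri value itself would be skipped for that field rather than rejecting the whole document.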



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)