Posted to issues@nifi.apache.org by "Pierre Villard (Jira)" <ji...@apache.org> on 2024/02/28 19:06:00 UTC

[jira] [Updated] (NIFI-12850) Failure to index Provenance Events with large filename attribute

     [ https://issues.apache.org/jira/browse/NIFI-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Villard updated NIFI-12850:
----------------------------------
    Summary: Failure to index Provenance Events with large filename attribute  (was: Failure to index Provenance Events with large attributes)

> Failure to index Provenance Events with large filename attribute
> ----------------------------------------------------------------
>
>                 Key: NIFI-12850
>                 URL: https://issues.apache.org/jira/browse/NIFI-12850
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Pierre Villard
>            Assignee: Pierre Villard
>            Priority: Major
>
> {code:java}
> ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
> java.lang.IllegalArgumentException: Document contains at least one immense term in field="filename" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
>     at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444)
>     at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
>     at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 74483
>     at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
>     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.
> {code}
> Looking at the code, filename appears to be the only attribute that can be set to an arbitrary value and is not currently protected against overly large values.
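
One possible guard, sketched here only as an illustration and not as the change actually made for this issue, is to cap the value at the 32766-byte UTF-8 limit quoted in the error before it is handed to the indexer. The class name TermLengthGuard and the method truncateUtf8 are hypothetical and do not come from the NiFi code base.

```java
// Hypothetical helper, not part of NiFi: caps a string so that its UTF-8
// encoding fits within a byte budget, never splitting a code point.
final class TermLengthGuard {

    // Limit taken from the error message above ("max length 32766").
    static final int MAX_TERM_BYTES = 32766;

    private TermLengthGuard() {
    }

    // Returns the longest prefix of value whose UTF-8 encoding is at most maxBytes long.
    static String truncateUtf8(final String value, final int maxBytes) {
        if (value == null) {
            return null;
        }
        int bytes = 0;
        int index = 0;
        while (index < value.length()) {
            final int codePoint = value.codePointAt(index);
            // UTF-8 width of this code point: 1 to 4 bytes.
            final int width = codePoint < 0x80 ? 1
                    : codePoint < 0x800 ? 2
                    : codePoint < 0x10000 ? 3 : 4;
            if (bytes + width > maxBytes) {
                break;
            }
            bytes += width;
            index += Character.charCount(codePoint);
        }
        return value.substring(0, index);
    }
}
```

With this sketch, a 74483-byte ASCII filename such as the one in the log would be cut to its first 32766 bytes, while shorter values pass through unchanged. Whether truncating, skipping the field, or some other approach is appropriate is a design decision for the actual fix.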


