You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2015/10/30 19:09:27 UTC
[jira] [Commented] (LUCENE-6866) TestCheckIndex.testChecksumsOnlyVerbose() failure: Document contains at least one immense term

    [ https://issues.apache.org/jira/browse/LUCENE-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982989#comment-14982989 ] 

ASF subversion and git services commented on LUCENE-6866:
---------------------------------------------------------

Commit 1711531 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1711531 ]

LUCENE-6866: limit max token length

> TestCheckIndex.testChecksumsOnlyVerbose() failure: Document contains at least one immense term
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6866
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6866
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Steve Rowe
>
> My Jenkins found the following seed that reproduces for me 100% on branch_5x, both Java7&8, and on trunk:
> {noformat}
>    [junit4] Suite: org.apache.lucene.index.TestCheckIndex
>    [junit4]   2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCheckIndex -Dtests.method=testChecksumsOnlyVerbose -Dtests.seed=1B39BC3F6E1634F -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=hu -Dtests.timezone=America/Havana -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   0.22s | TestCheckIndex.testChecksumsOnlyVerbose <<<
>    [junit4]    > Throwable #1: java.lang.IllegalArgumentException: Document contains at least one immense term in field="body" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[125, 125, 123, 123, 123, 123, 123, 115, 117, 98, 115, 116, 99, 124, 125, 125, 125, 123, 123, 123, 49, 125, 125, 125, 124, 123, 123, 123, 112, 49]...', original message: bytes can be at most 32766 in length; got 94384
>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([1B39BC3F6E1634F:C5FF0BD55AE404AD]:0)
>    [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:726)
>    [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:347)
>    [junit4]    >        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
>    [junit4]    >        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1461)
>    [junit4]    >        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1240)
>    [junit4]    >        at org.apache.lucene.index.TestCheckIndex.testChecksumsOnlyVerbose(TestCheckIndex.java:156)
>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
>    [junit4]    >        Suppressed: java.lang.IllegalStateException: close() called in wrong state: INCREMENT
>    [junit4]    >                at org.apache.lucene.analysis.MockTokenizer.fail(MockTokenizer.java:126)
>    [junit4]    >                at org.apache.lucene.analysis.MockTokenizer.close(MockTokenizer.java:293)
>    [junit4]    >                at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:58)
>    [junit4]    >                at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:742)
>    [junit4]    >                ... 42 more
>    [junit4]    > Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 94384
>    [junit4]    >        at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
>    [junit4]    >        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:150)
>    [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:716)
>    [junit4]    >        ... 42 more
>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene54): {}, docValues:{}, sim=ClassicSimilarity, locale=hu, timezone=America/Havana
>    [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=412272632,total=514850816
>    [junit4]   2> NOTE: All tests run in this JVM: [TestCheckIndex]
>    [junit4] Completed [1/1] in 0.45s, 1 test, 1 error <<< FAILURES!
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org