You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Robert Tarrall (JIRA)" <ji...@apache.org> on 2015/09/09 05:50:45 UTC

[jira] [Created] (LUCENE-6788) Mishandling of Integer.MIN_VALUE in FuzzySet leads to AssertionError

Robert Tarrall created LUCENE-6788:
--------------------------------------

             Summary: Mishandling of Integer.MIN_VALUE in FuzzySet leads to AssertionError
                 Key: LUCENE-6788
                 URL: https://issues.apache.org/jira/browse/LUCENE-6788
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.10.4, Trunk
            Reporter: Robert Tarrall


Reindexing some data in the DataStax Enterprise Search product (which uses Solr) led to these stack traces:

ERROR [Lucene Merge Thread #13430] 2015-09-08 11:14:36,582 CassandraDaemon.java (line 258) Exception in thread Thread[Lucene Merge Thread #13430,6,main]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.AssertionError
        at org.apache.lucene.codecs.bloom.FuzzySet.mayContainValue(FuzzySet.java:216)
        at org.apache.lucene.codecs.bloom.FuzzySet.contains(FuzzySet.java:165)
        at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat$BloomFilteredFieldsProducer$BloomFilteredTermsEnum.seekExact(BloomFilteringPostingsFormat.java:351)
        at org.apache.lucene.index.BufferedUpdatesStream.applyTermDeletes(BufferedUpdatesStream.java:414)
        at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:283)
        at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3838)
        at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3799)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3651)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

In tracking down the cause of the stack trace, I noticed this:
https://github.com/apache/lucene-solr/blob/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java#L164

It is possible for the Murmur2 hash to return Integer.MIN_VALUE (e.g. when hashing "WeH44wlbCK").  Multiplying Integer.MIN_VALUE by -1 returns Integer.MIN_VALUE again, so the "positiveHash >= 0" assertion at line 217 fails.

We could special-case Integer.MIN_VALUE, map it to 42 or some other magic number... since the same "* -1" logic appears on line 236 perhaps it should be part of the hash function?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org