Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/09/09 10:29:45 UTC

[jira] [Commented] (LUCENE-6788) Mishandling of Integer.MIN_VALUE in FuzzySet leads to AssertionError

    [ https://issues.apache.org/jira/browse/LUCENE-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736443#comment-14736443 ] 

Adrien Grand commented on LUCENE-6788:
--------------------------------------

Hmm, actually it looks to me like having a positive value is not necessary: the only thing we do with the result of the hash is AND it with the bloom size, which would work fine with a negative number too.
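A minimal sketch of the point above. It assumes the bloom size acts as a non-negative bit mask of the form 2^k - 1, so ANDing with it always clears the sign bit. The class and method names (BloomMaskDemo, positiveHash, bitIndex) are hypothetical and are not Lucene's API.

```java
// Hypothetical sketch: names are illustrative, not taken from FuzzySet.
public class BloomMaskDemo {

    // Old approach: flip the sign of negative hashes before masking.
    // Breaks for Integer.MIN_VALUE, because negating it overflows back to MIN_VALUE.
    static int positiveHash(int hash) {
        return hash < 0 ? hash * -1 : hash;
    }

    // Proposed approach: skip the sign flip and AND the raw hash with the mask.
    // Assumes bloomMask is non-negative (e.g. 2^k - 1), so the result is never negative.
    static int bitIndex(int hash, int bloomMask) {
        return hash & bloomMask;
    }

    public static void main(String[] args) {
        int bloomMask = (1 << 10) - 1; // assumed mask for a 1024-bit set
        int hash = Integer.MIN_VALUE;

        System.out.println(positiveHash(hash));        // -2147483648: still negative
        System.out.println(bitIndex(hash, bloomMask)); // 0: a valid index
        System.out.println(bitIndex(-1, bloomMask));   // 1023: also valid
    }
}
```

Because the mask has its sign bit clear, every input, including Integer.MIN_VALUE, lands in the range [0, bloomMask], which is why the sign flip can be dropped.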

> Mishandling of Integer.MIN_VALUE in FuzzySet leads to AssertionError
> --------------------------------------------------------------------
>
>                 Key: LUCENE-6788
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6788
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.10.4, Trunk
>            Reporter: Robert Tarrall
>
> Reindexing some data in the DataStax Enterprise Search product (which uses Solr) led to these stack traces:
> ERROR [Lucene Merge Thread #13430] 2015-09-08 11:14:36,582 CassandraDaemon.java (line 258) Exception in thread Thread[Lucene Merge Thread #13430,6,main]
> org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError
>         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
> Caused by: java.lang.AssertionError
>         at org.apache.lucene.codecs.bloom.FuzzySet.mayContainValue(FuzzySet.java:216)
>         at org.apache.lucene.codecs.bloom.FuzzySet.contains(FuzzySet.java:165)
>         at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat$BloomFilteredFieldsProducer$BloomFilteredTermsEnum.seekExact(BloomFilteringPostingsFormat.java:351)
>         at org.apache.lucene.index.BufferedUpdatesStream.applyTermDeletes(BufferedUpdatesStream.java:414)
>         at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:283)
>         at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3838)
>         at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3799)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3651)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> In tracking down the cause of the stack trace, I noticed this:
> https://github.com/apache/lucene-solr/blob/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java#L164
> It is possible for the Murmur2 hash to return Integer.MIN_VALUE (e.g. when hashing "WeH44wlbCK").  Multiplying Integer.MIN_VALUE by -1 returns Integer.MIN_VALUE again, so the "positiveHash >= 0" assertion at line 217 fails.
> We could special-case Integer.MIN_VALUE and map it to 42 or some other magic number. Since the same "* -1" logic also appears on line 236, perhaps it should be part of the hash function?
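A minimal sketch of the reporter's suggestion, assuming the sign handling is folded into a single helper that the hash step calls, so every caller receives a non-negative value. The names PositiveHashDemo and toPositive are hypothetical, and 42 is simply the arbitrary "magic number" mentioned in the report.

```java
// Hypothetical sketch of the reporter's idea; names are illustrative, not Lucene's API.
public class PositiveHashDemo {

    // Special-cases Integer.MIN_VALUE, whose negation overflows back to itself.
    // Every other negative hash is negated as before.
    static int toPositive(int hash) {
        if (hash == Integer.MIN_VALUE) {
            return 42; // arbitrary replacement value, as suggested in the report
        }
        return hash < 0 ? -hash : hash;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE * -1);        // -2147483648: overflow
        System.out.println(toPositive(Integer.MIN_VALUE)); // 42
        System.out.println(toPositive(-7));                // 7
    }
}
```

Note that Math.abs has the same pitfall: Math.abs(Integer.MIN_VALUE) also returns Integer.MIN_VALUE, so it cannot replace the special case.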



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)