Posted to java-user@lucene.apache.org by Nicolás Lichtmaier <ni...@wolfram.com.INVALID> on 2021/02/10 20:30:33 UTC

Reproducible crash matching phrases

I've been able to reproduce a crash we are seeing in our product with 
newer Lucene versions.

I'm attaching a small Java program that reproduces this. It might look 
odd: it is what remains after removing every custom thing we apply to 
the query while still seeing the bug.

This is the crash I see with this code (with assertions disabled it 
crashes in a different place):

Exception in thread "main" java.lang.AssertionError
     at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum.nextPosition(Lucene84PostingsReader.java:940)
     at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:57)
     at org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:46)
     at org.apache.lucene.search.SloppyPhraseMatcher.initSimple(SloppyPhraseMatcher.java:368)
     at org.apache.lucene.search.SloppyPhraseMatcher.initPhrasePositions(SloppyPhraseMatcher.java:356)
     at org.apache.lucene.search.SloppyPhraseMatcher.reset(SloppyPhraseMatcher.java:153)
     at org.apache.lucene.search.PhraseScorer$1.matches(PhraseScorer.java:49)
     at org.apache.lucene.search.DoubleValuesSource$WeightDoubleValuesSource$1.advanceExact(DoubleValuesSource.java:631)
     at org.apache.lucene.queries.function.FunctionScoreQuery$QueryBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:343)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.search.DoubleValues$1.advanceExact(DoubleValues.java:53)
     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.advanceExact(FunctionScoreQuery.java:270)
     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:228)
     at org.apache.lucene.search.DoubleValuesSource$2.doubleValue(DoubleValuesSource.java:344)
     at org.apache.lucene.search.DoubleValues$1.doubleValue(DoubleValues.java:48)
     at org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource$1.doubleValue(FunctionScoreQuery.java:265)
     at org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight$1.score(FunctionScoreQuery.java:229)
     at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:76)
     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:276)
     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:232)
     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:661)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
     at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
     at com.wolfram.textsearch.LuceneCrash.main(LuceneCrash.java:48)
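
Since the trace above is an AssertionError, it only appears when the JVM 
runs with assertions enabled (-ea). When comparing runs, it may help to 
print which mode is active. A minimal sketch follows; the class name 
AssertionStatus is hypothetical and is not part of the attached reproducer.

```java
// Hypothetical helper, not part of the original LuceneCrash attachment.
final class AssertionStatus {

    /** Returns true when this class runs with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert only executes when assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Run with "java -ea AssertionStatus" this prints "assertions enabled: true"; 
without -ea it prints "assertions enabled: false".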

Interestingly, the bug does not happen if the index is created on a 
ByteBuffersDirectory.

I hope this is useful!

Thanks!


Re: Reproducible crash matching phrases

Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
Great! Thanks!

Yes, I realized that this is actually directory-type independent.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Reproducible crash matching phrases

Posted by Chris Hostetter <ho...@fucit.org>.
: I'm attaching an updated file as well with these changes.
: 
: This happens in Lucene 8.8.0 (and probably since 8.4.0).

Ok -- cool ... with the updated code I was able to reproduce on branch_8x, 
and with 8.8 & 8.7 (but not 8.4) -- I've distilled your patch into a test 
case and attached it to a new Jira...

https://issues.apache.org/jira/browse/LUCENE-9762

FYI: with this updated code the error *DOES* reproduce for me regardless 
of Directory type -- I suspect your original comment about it not failing 
if you used ByteBuffersDirectory was because that would have been a 
"clean" index every time, and the old code was only failing with your 
existing index on disk.

Let's see if the folks with the low-level expertise can figure out what's 
going wrong here.


-Hoss
http://www.lucidworks.com/



Re: Reproducible crash matching phrases

Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
The crash happens if you instead add these two fields:

             doc.add(new TextField("ExampleText", "periodic function", Field.Store.NO));
             doc.add(new TextField("ExampleText", "plot of the original function", Field.Store.NO));

I'm attaching an updated file as well with these changes.

This happens in Lucene 8.8.0 (and probably since 8.4.0).

Thanks!


Re: Reproducible crash matching phrases

Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
This happens on Lucene 8.8. I deleted the index and now I don't see the 
problem. =( I'll post an updated version of the code shortly.

Thanks!




Re: Reproducible crash matching phrases

Posted by Chris Hostetter <ho...@fucit.org>.
: I've been able to reproduce a crash we are seeing in our product with newer
: Lucene versions.

Can you be specific?  What exact versions of Lucene are you using that 
reproduce this failure?  If you know of other "older" versions where you 
can't reproduce the problem, that info would also be helpful...


I tried running your test code against the current branch_8x and was 
unable to trigger any sort of failure.  I also tried using 8.4.1 based on 
the stack trace indicating that you must be using a version of Lucene no 
older than 8.4 given the codec in use -- and was also unable to reproduce 
any sort of problem.

Also note that as written your LuceneCrash code leaves an index on disk 
which is re-used the next time the code is run: does the problem reproduce 
for you if you manually "rm -r /tmp/xxx" and run it again, or is the 
problem specific to having some "cruft" documents left in the index from 
previous runs?  Can you zip up the contents of /tmp/xxx on your machine 
and attach it to a new Jira?
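
The clean-slate check described above (the "rm -r" step) could also be done 
from inside the reproducer, so that every run starts from an empty index 
directory. A minimal sketch using only java.nio.file follows; 
IndexDirCleaner is a hypothetical helper and not part of the original 
LuceneCrash code, and /tmp/xxx is simply the path mentioned in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; the original LuceneCrash source is not shown in this thread.
final class IndexDirCleaner {

    /** Recursively deletes dir, like "rm -r"; does nothing if dir does not exist. */
    static void deleteRecursively(Path dir) throws IOException {
        if (Files.notExists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the index location mentioned in this thread.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "/tmp/xxx");
        deleteRecursively(indexDir);
        // ... open the directory and build the index here ...
    }
}
```

Calling deleteRecursively before opening the index removes any documents 
left over from earlier runs, which is the situation an in-memory directory 
gets for free.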


: Interestingly, the bug does not happen if the index is created on a
: ByteBuffersDirectory.

That makes it seem like the bug might be filesystem specific -- what impl 
does the FSDirectory.open() call in your code return?



-Hoss
http://www.lucidworks.com/
