Posted to java-user@lucene.apache.org by Nicolás Lichtmaier <ni...@wolfram.com.INVALID> on 2019/03/07 23:33:13 UTC

Re: FlattenGraphFilter assertion error

After a lot of time... Here's a small example that triggers that assertion.

    Builder builder = CustomAnalyzer.builder();
    builder.withTokenizer(StandardTokenizerFactory.class);
    builder.addTokenFilter(WordDelimiterGraphFilterFactory.class,
            "camelCase", "1", "preserveOriginal", "1");
    builder.addTokenFilter(StopFilterFactory.class);
    builder.addTokenFilter(FlattenGraphFilterFactory.class);
    Analyzer analyzer = builder.build();

    TokenStream ts = analyzer.tokenStream("*", new StringReader("x7in"));
    ts.reset();
    while (ts.incrementToken())
        ;

This gives:

Exception in thread "main" java.lang.AssertionError: 2
    at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
    at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
    at com.wolfram.textsearch.AnalyzerError.main(AnalyzerError.java:32)

It seems to be the interaction between WordDelimiterGraphFilter and stop
word removal that triggers the assertion when flattening.
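
One way to picture the suspected interaction is as a graph: each token is an arc that leaves the node given by the running sum of position increments and arrives positionLength nodes later. The sketch below is a toy model in plain Java, not Lucene code. The class TokenGraphSketch, its helpers, and the token values for "x7in" are assumptions made for illustration (in particular, that the stop filter drops "in" after WordDelimiterGraphFilter has split the word). It only shows how a removed token can leave a node that nothing departs from; it does not claim to be the confirmed cause of the assertion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a token graph (hypothetical helper, not part of Lucene). */
final class TokenGraphSketch {

    /** A token described by the attributes a TokenStream would expose. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** The same token seen as an arc between two numbered nodes. */
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return term + ": " + from + " -> " + to;
        }
    }

    /** Turns position increments and lengths into arcs. */
    static List<Arc> toArcs(List<Token> tokens) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1; // the first increment of 1 moves this to node 0
        for (Token t : tokens) {
            pos += t.posInc;
            arcs.add(new Arc(t.term, pos, pos + t.posLen));
        }
        return arcs;
    }

    /** Nodes that an arc arrives at but none leaves, excluding the last node. */
    static List<Integer> deadEnds(List<Arc> arcs) {
        TreeSet<Integer> arrivals = new TreeSet<>();
        TreeSet<Integer> departures = new TreeSet<>();
        int last = 0;
        for (Arc a : arcs) {
            departures.add(a.from);
            arrivals.add(a.to);
            last = Math.max(last, a.to);
        }
        arrivals.removeAll(departures);
        arrivals.remove(Integer.valueOf(last));
        return new ArrayList<>(arrivals);
    }

    public static void main(String[] args) {
        // Assumed tokens for "x7in" once the stop word "in" has been removed.
        List<Token> tokens = Arrays.asList(
                new Token("x7in", 1, 3),
                new Token("x", 0, 1),
                new Token("7", 1, 1));
        List<Arc> arcs = toArcs(tokens);
        for (Arc a : arcs) {
            System.out.println(a);
        }
        System.out.println("dead-end nodes: " + deadEnds(arcs));
    }
}
```

With the assumed tokens, node 2 is reported as a dead end, because the arc for "in" that would have connected it to node 3 is gone. Printing the real PositionIncrementAttribute and PositionLengthAttribute values from the analyzer, with the flatten filter left out, would show whether the actual stream has this shape.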


On 12/10/17 at 19:18, Michael McCandless wrote:
> Hmm, that's not good!  Clearly there is a bug somewhere.
>
> Are you able to isolate a small example, e.g. text input and synonyms 
> you fed to SynonymGraphFilter, to show this assertion trip?
>
> Are you using any custom analysis components before the 
> FlattenGraphFilter?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Oct 10, 2017 at 11:24 AM, Nicolás Lichtmaier
> <nicolasl@wolfram.com> wrote:
>
>     Hi!
>
>     I was getting an exception in FlattenGraphFilter and, as I saw
>     there were assertion statements nearby, I reran everything with
>     assertions enabled. And I see it crashes here
>     (FlattenGraphFilter.java:174)
>
>
>     At this point all of inputNode's fields are -1 (except nextOut,
>     which is 0), and outputFrom's value is 395.
>
>     The code is pretty complex, so before trying to understand it I
>     thought maybe someone could tell what's happening just from seeing
>     this. Maybe not. =)
>
>     I'll keep the debugging session open for a while in case some more
>     variables could be useful to debug this.
>
>     Thanks!
>
>
>
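
The original report quoted above mentions rerunning with assertions enabled. In Java, assert statements are skipped unless the JVM is started with -ea (or -enableassertions), so it can help to confirm that the flag actually took effect before trusting a run. A minimal sketch in plain Java follows; the class name AssertionStatus is hypothetical and nothing in it is specific to Lucene.

```java
/** Reports whether this JVM runs the class with assertions enabled. */
final class AssertionStatus {

    /** Returns true only when assertions are enabled for this class. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert executes only under -ea.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        // The class object exposes the same setting through the standard API.
        System.out.println("desired status: "
                + AssertionStatus.class.desiredAssertionStatus());
    }
}
```

Running it once with java -ea AssertionStatus and once without the flag shows the difference. Assertions can also be limited to one package tree, for example -ea:org.apache.lucene.analysis... (the trailing three dots include subpackages).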

Re: FlattenGraphFilter assertion error

Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
I've created a Jira issue for this here: 
https://issues.apache.org/jira/browse/LUCENE-8723

On 8/3/19 at 00:08, Nicolás Lichtmaier wrote:
>
> Oops, sorry... in that code there's a "camelCase" parameter that is 
> not implemented in normal Lucene. That is an option I've added for 
> better camel case support, but the bug happens without that option as 
> well.
>

Re: FlattenGraphFilter assertion error

Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
Yes, of course. It's here: https://issues.apache.org/jira/browse/LUCENE-8723

Thanks.

On 12/3/19 at 12:59, Michael McCandless wrote:
> Hello Nicolás,
>
> Can you please open an issue for this?  Thanks.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: FlattenGraphFilter assertion error

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hello Nicolás,

Can you please open an issue for this?  Thanks.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Mar 7, 2019 at 10:08 PM Nicolás Lichtmaier <ni...@wolfram.com>
wrote:

> Oops, sorry... in that code there's a "camelCase" parameter that is not
> implemented in normal Lucene. That is an option I've added for better camel
> case support, but the bug happens without that option as well.

Re: FlattenGraphFilter assertion error

Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
Oops, sorry... that code uses a "camelCase" parameter that is not
implemented in stock Lucene. It is an option I added for better
camel case support, but the bug happens without that option as well.
