Posted to java-user@lucene.apache.org by Nicolás Lichtmaier <ni...@wolfram.com.INVALID> on 2019/03/07 23:33:13 UTC
Re: FlattenGraphFilter assertion error
After a lot of time... Here's a small example that triggers that assertion.
Builder builder = CustomAnalyzer.builder();
builder.withTokenizer(StandardTokenizerFactory.class);
builder.addTokenFilter(WordDelimiterGraphFilterFactory.class,
    "camelCase", "1", "preserveOriginal", "1");
builder.addTokenFilter(StopFilterFactory.class);
builder.addTokenFilter(FlattenGraphFilterFactory.class);
Analyzer analyzer = builder.build();
TokenStream ts = analyzer.tokenStream("*", new StringReader("x7in"));
ts.reset();
while (ts.incrementToken())
    ;
This gives:
Exception in thread "main" java.lang.AssertionError: 2
    at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
    at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
    at com.wolfram.textsearch.AnalyzerError.main(AnalyzerError.java:32)
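The failure above is a java.lang.AssertionError, and Java only evaluates assert statements when the JVM is started with -ea (or -enableassertions), so a run without that flag will not show this trace. A minimal sketch for checking the setting from inside a program follows; the class name AssertionStatus is made up for illustration and is not part of Lucene.

```java
/** Reports whether the JVM was started with assertions enabled for this class. */
final class AssertionStatus {

    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert only executes when assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Running it once with "java -ea AssertionStatus" and once without the flag shows the two states.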
It seems to be the interaction between WordDelimiterGraphFilter and stop word
removal that triggers the assertion when flattening.
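One possible way to picture that interaction is sketched below. It is a simplified model, not Lucene code: the class TokenGraphSketch, its helpers, and the position increment and position length values listed for "x7in" are all assumptions made for illustration. It also assumes that "in" is dropped by the default stop word set. Under those assumptions, removing "in" leaves a node that an arc enters but no arc leaves, which may be the kind of graph the flattening step does not expect. This is a guess, not a diagnosis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Simplified model of a token graph. These are not Lucene's classes. */
final class TokenGraphSketch {

    /** A token plus the two position attributes that define its arc. */
    static final class Tok {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions the token spans

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** ASSUMED output of the word delimiter step for "x7in" with preserveOriginal. */
    static List<Tok> assumedTokensForX7in() {
        return Arrays.asList(
            new Tok("x7in", 1, 3), // original token, arc 0 -> 3
            new Tok("x", 0, 1),    // arc 0 -> 1
            new Tok("7", 1, 1),    // arc 1 -> 2
            new Tok("in", 1, 1));  // arc 2 -> 3
    }

    /** Removes a term and carries its increment to the next kept token, if there is one. */
    static List<Tok> dropTerm(List<Tok> tokens, String stopWord) {
        List<Tok> kept = new ArrayList<>();
        int skipped = 0;
        for (Tok t : tokens) {
            if (t.term.equals(stopWord)) {
                skipped += t.posInc;
            } else {
                kept.add(new Tok(t.term, t.posInc + skipped, t.posLen));
                skipped = 0;
            }
        }
        return kept;
    }

    /** Nodes where an arc ends but none starts, ignoring the last node of the graph. */
    static List<Integer> danglingNodes(List<Tok> tokens) {
        TreeSet<Integer> from = new TreeSet<>();
        TreeSet<Integer> to = new TreeSet<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            from.add(pos);
            to.add(pos + t.posLen);
        }
        List<Integer> dangling = new ArrayList<>();
        for (int node : to) {
            if (node != to.last() && !from.contains(node)) {
                dangling.add(node);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        List<Tok> full = assumedTokensForX7in();
        System.out.println(danglingNodes(full));                 // prints []
        System.out.println(danglingNodes(dropTerm(full, "in"))); // prints [2]
    }
}
```

In this model the token "7" still ends at node 2 after filtering, but nothing starts there, while the preserved original token spans all the way to node 3.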
On 12/10/17 at 19:18, Michael McCandless wrote:
> Hmm, that's not good! Clearly there is a bug somewhere.
>
> Are you able to isolate a small example, e.g. text input and synonyms
> you fed to SynonymGraphFilter, to show this assertion trip?
>
> Are you using any custom analysis components before the
> FlattenGraphFilter?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Oct 10, 2017 at 11:24 AM, Nicolás Lichtmaier
> <nicolasl@wolfram.com> wrote:
>
> Hi!
>
> I was getting an exception in FlattenGraphFilter and, as I saw
> there were assertion statements nearby, I reran everything with
> assertions enabled. And I see it crashes here
> (FlattenGraphFilter.java:174)
>
>
> At this point all of inputNode's fields are -1 (except nextOut,
> which is 0), and outputFrom's value is 395.
>
> The code is pretty complex, so before trying to understand it I
> thought maybe someone might know what's happening just from seeing
> this, maybe not. =)
>
> I'll keep the debugging session open for a while in case some more
> variables could be useful to debug this.
>
> Thanks!
>
>
>
Re: FlattenGraphFilter assertion error
Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
I've created a Jira issue for this here:
https://issues.apache.org/jira/browse/LUCENE-8723
Re: FlattenGraphFilter assertion error
Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
Yes, of course. It's here: https://issues.apache.org/jira/browse/LUCENE-8723
Thanks.
On 12/3/19 at 12:59, Michael McCandless wrote:
> Hello Nicolás,
>
> Can you please open an issue for this? Thanks.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
Re: FlattenGraphFilter assertion error
Posted by Michael McCandless <lu...@mikemccandless.com>.
Hello Nicolás,
Can you please open an issue for this? Thanks.
Mike McCandless
http://blog.mikemccandless.com
Re: FlattenGraphFilter assertion error
Posted by Nicolás Lichtmaier <ni...@wolfram.com.INVALID>.
Oops, sorry... in that code there's a "camelCase" parameter that is not
implemented in normal Lucene. That is an option I've added for better
camel case support, but the bug happens without that option as well.