Posted to solr-user@lucene.apache.org by Matt Pearce <ma...@flax.co.uk> on 2015/12/03 17:40:05 UTC

Spellcheck error

Hi,

We're using Solr 5.3.1, and we're getting a 
StringIndexOutOfBoundsException from the SpellCheckCollator. I've done 
some investigation, and it looks like the problem is that the corrected 
string is shorter than the original query.

For example, when the search term is "theatre" and the suggested 
correction is "there", the error is thrown when the original query is 
replaced with the shorter correction.

This is the stack trace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
     at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
     at java.lang.StringBuilder.replace(StringBuilder.java:262)
     at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
     at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
     at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:237)
     at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:202)
     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
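
A note on reading the trace: on the Java 7/8 runtimes that Solr 5.x typically runs on, the number in this message appears to be the start index passed to StringBuilder.replace. A replacement that is shorter than the original is fine by itself; the call only fails when an index is out of range. The following is a minimal sketch of that behaviour. ReplaceBoundsDemo and replaceSpan are made-up names for illustration, not Solr code, and the exact message text differs on newer Java versions.

```java
// Illustrative only: class and method names are hypothetical, not Solr code.
final class ReplaceBoundsDemo {

    /** Replaces the span [start, end) of query with replacement. */
    static String replaceSpan(String query, int start, int end, String replacement) {
        StringBuilder sb = new StringBuilder(query);
        // Throws StringIndexOutOfBoundsException if start is negative.
        sb.replace(start, end, replacement);
        return sb.toString();
    }

    public static void main(String[] args) {
        // A shorter replacement works when the offsets are valid.
        System.out.println(replaceSpan("theatre", 0, 7, "there"));

        try {
            // A negative start index is what triggers the exception.
            replaceSpan("there", -2, 5, "there");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

So the length difference between "theatre" and "there" (7 - 5 = 2) probably matters only indirectly, through whatever computes the start index.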

The error looks very similar to those described in 
https://issues.apache.org/jira/browse/SOLR-4489, 
https://issues.apache.org/jira/browse/SOLR-3608 and 
https://issues.apache.org/jira/browse/SOLR-2509, most of which are closed.

Any suggestions would be appreciated, or should I open a JIRA ticket?

Thanks,

Matt

-- 
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk


Re: Spellcheck error

Posted by Matt Pearce <ma...@flax.co.uk>.
Hi James,

Thanks for responding.

The query we were testing looks like this:
http://localhost:8983/solr/testdata/select?q=theatre&spellcheck.q=theatre

I did some further investigation after discovering that omitting the 
spellcheck.q parameter stops the error appearing, and it looks like 
synonym expansion is playing a part in the problem. The spellcheck field 
is essentially the same as text_general in the example schema, with the 
addition of HTMLStripCharFilterFactory ahead of the 
StandardTokenizerFactory at index time:

     <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <charFilter class="solr.HTMLStripCharFilterFactory"/>
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
     </fieldType>

With synonyms enabled, spellcheck.q=theatre is being expanded to seven 
tokens - theatre (3 times), theater, playhouse, studio and workshop. If 
I disable synonyms in the query analyser, "theatre" is used on its own, 
and the error doesn't happen (this is the same behaviour as when I omit 
spellcheck.q).
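
One possible mechanism, sketched in plain Java rather than taken from the Solr source: suppose the collator applies corrections in order and keeps a running offset to account for earlier length changes. Synonym-expanded tokens typically share the start and end offsets of the original term, so a second correction for the same span would start at 0 + (-2) = -2. The Correction class and collate method below are hypothetical stand-ins, and this is only a guess at the cause.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a collator; not the actual SpellCheckCollator code.
final class OffsetDriftDemo {

    /** A correction for the token occupying [start, end) in the original query. */
    static final class Correction {
        final int start;
        final int end;
        final String text;

        Correction(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Applies corrections in order, shifting offsets by the length change so far. */
    static String collate(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0; // net length change from earlier replacements
        for (Correction c : corrections) {
            sb.replace(c.start + offset, c.end + offset, c.text);
            offset += c.text.length() - (c.end - c.start);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One token, one correction: fine, even though "there" is shorter.
        System.out.println(collate("theatre",
                Arrays.asList(new Correction(0, 7, "there"))));

        try {
            // Two tokens sharing offsets 0-7: the second replace starts at -2.
            collate("theatre", Arrays.asList(
                    new Correction(0, 7, "there"),
                    new Correction(0, 7, "there")));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

If this is what happens, it would explain why a single "theatre" token does not fail while the expanded token list does.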

So, it looks like the quick solution is to disable synonyms in the query 
analyser for that field. I'll do some further investigation tomorrow to 
see if I can figure out why the synonym expansion triggers the problem 
while neither "theatre" nor "theater" on their own do (I can't imagine 
the other three variants are going to make "there" appear as a spelling 
correction).

Cheers,

Matt

On 03/12/15 18:53, Dyer, James wrote:
> Matt,
>
> Can you give some information about how your spellcheck field is analyzed, and whether you're using a custom query converter?  Also, try placing the bare terms you want checked in spellcheck.q (e.g., if your query is q=+movie +theatre, then spellcheck.q=movie theatre).  Does it work in this case?  Also, could you give the exact query you're using?
>
> This is the very same bug as in the 3 tickets you mention.  We clearly haven't solved all of the possible ways this bug can be triggered.  But we cannot fix this unless we can come up with a unit test that reliably reproduces it.  At the very least, we should handle these problems better than throwing SIOOB like this.
>
> Long term, there is probably a better design we could come up with for how terms are identified within queries and how collations are generated.
>
> James Dyer
> Ingram Content Group

-- 
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk


RE: Spellcheck error

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Matt,

Can you give some information about how your spellcheck field is analyzed, and whether you're using a custom query converter?  Also, try placing the bare terms you want checked in spellcheck.q (e.g., if your query is q=+movie +theatre, then spellcheck.q=movie theatre).  Does it work in this case?  Also, could you give the exact query you're using?
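
To illustrate the bare-terms suggestion, here is a deliberately simplistic sketch that derives a spellcheck.q value from q by stripping leading + and - operators. BareTermsDemo and bareTerms are hypothetical names, not part of Solr, and the helper ignores field prefixes, quoted phrases, boosts and other query syntax.

```java
// Hypothetical helper for illustration; not part of Solr.
final class BareTermsDemo {

    /** Strips leading '+' and '-' operators from whitespace-separated terms. */
    static String bareTerms(String q) {
        StringBuilder out = new StringBuilder();
        for (String term : q.trim().split("\\s+")) {
            String bare = term.replaceFirst("^[+-]+", "");
            if (bare.isEmpty()) {
                continue; // nothing left after removing operators
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(bare);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // q=+movie +theatre gives spellcheck.q=movie theatre
        System.out.println(bareTerms("+movie +theatre"));
    }
}
```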

This is the very same bug as in the 3 tickets you mention.  We clearly haven't solved all of the possible ways this bug can be triggered.  But we cannot fix this unless we can come up with a unit test that reliably reproduces it.  At the very least, we should handle these problems better than throwing SIOOB like this.
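
As a rough idea of what "handling it better" could look like, here is a sketch in plain Java that skips any correction whose span is invalid or overlaps one already replaced, instead of letting StringBuilder.replace throw. It assumes corrections arrive in ascending offset order. SafeCollationDemo, Correction and collateSafely are hypothetical names, and this is not how SpellCheckCollator is actually written.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual SpellCheckCollator implementation.
final class SafeCollationDemo {

    /** A correction for the token occupying [start, end) in the original query. */
    static final class Correction {
        final int start;
        final int end;
        final String text;

        Correction(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /**
     * Applies corrections in ascending offset order, skipping any whose span
     * is invalid or overlaps a span that has already been replaced.
     */
    static String collateSafely(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0;  // net length change from replacements so far
        int lastEnd = 0; // end of the last replaced span, in original coordinates
        for (Correction c : corrections) {
            boolean invalid = c.start < 0 || c.start > c.end || c.end > query.length();
            if (invalid || c.start < lastEnd) {
                continue; // skip instead of throwing
            }
            sb.replace(c.start + offset, c.end + offset, c.text);
            offset += c.text.length() - (c.end - c.start);
            lastEnd = c.end;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two corrections for the same span (6-13): only the first is applied.
        System.out.println(collateSafely("movie theatre", Arrays.asList(
                new Correction(6, 13, "there"),
                new Correction(6, 13, "theater"))));
    }
}
```

Skipping silently keeps the request alive but can hide an analysis problem, so a real fix would probably also want to log the skipped token.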

Long term, there is probably a better design we could come up with for how terms are identified within queries and how collations are generated.

James Dyer
Ingram Content Group

