Posted to solr-user@lucene.apache.org by Robin Wojciki <ro...@gmail.com> on 2009/12/04 19:02:29 UTC

Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

I am running a search in Solr 1.4 and I am getting the
StringIndexOutOfBoundsException pasted below. The spell check field
uses HTMLStripCharFilterFactory. However, the search works fine if I
do not use the HTMLStripCharFilterFactory.

If I set a breakpoint at SpellCheckComponent.java:248, the value of
the variable "best" is as shown in the screenshot:
http://yfrog.com/j5solrdebuginspectp

At the end of the first iteration, offset = 5 - (24 - 0) = -19.
This negative offset causes the index out of bounds exception.
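
To illustrate the arithmetic above, here is a minimal sketch. It is not
the Solr source. It assumes the collation loop replaces each token's span
in a StringBuilder and carries a running offset, which is what the stack
trace and the debugger values suggest. The class CollationOffsetDemo, the
Span type, and the sample queries and spans are hypothetical; only the
numbers 5, 24 and 0 are taken from the debugging session.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CollationOffsetDemo {

    /** Hypothetical stand-in for a token: only its character offsets. */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Replaces each span of the query with its suggestion, in map order.
     * The running offset compensates for earlier replacements that
     * changed the length of the string (assumed behaviour, see lead-in).
     */
    static String collate(String query, Map<Span, String> best) {
        StringBuilder collation = new StringBuilder(query);
        int offset = 0;
        for (Map.Entry<Span, String> entry : best.entrySet()) {
            Span tok = entry.getKey();
            String suggestion = entry.getValue();
            // Throws StringIndexOutOfBoundsException if the start index is negative.
            collation.replace(tok.start + offset, tok.end + offset, suggestion);
            offset += suggestion.length() - (tok.end - tok.start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        // Offsets that match the query text: the running offset stays small.
        Map<Span, String> ok = new LinkedHashMap<>();
        ok.put(new Span(0, 4), "hello");
        ok.put(new Span(5, 9), "world");
        System.out.println(collate("helo wrld", ok)); // hello world

        // A span of 0..24 with a 5-character suggestion gives
        // offset = 5 - (24 - 0) = -19, so the next replace starts at -19.
        Map<Span, String> bad = new LinkedHashMap<>();
        bad.put(new Span(0, 24), "hello");
        bad.put(new Span(0, 5), "world");
        try {
            collate("hellp", bad);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

If the real loop behaves like this sketch, any token whose end offset
lies far beyond the query text would push the running offset negative
and make the next replace() call fail in the same way.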

The spell check field is defined as:

        <fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                        words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                        ignoreCase="true" expand="true"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>



Stack Trace:
=========
String index out of range: -19

java.lang.StringIndexOutOfBoundsException: String index out of range: -19
	at java.lang.AbstractStringBuilder.replace(Unknown Source)
	at java.lang.StringBuilder.replace(Unknown Source)
	at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

Posted by Robin Wojciki <ro...@gmail.com>.
Logged a ticket for Solr: https://issues.apache.org/jira/browse/SOLR-1630

Thanks,
Robin

On Mon, Dec 7, 2009 at 9:36 PM, Robin Wojciki <ro...@gmail.com> wrote:
> Thanks for taking the time! I will log a ticket.

Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

Posted by Robin Wojciki <ro...@gmail.com>.
Koji,

In the sample I sent, the exception occurs only when the
HTMLStripCharFilter is present.

However, your test case seems to capture the essence of the problem.
Sorry if I sent you on a wild goose chase.

Thanks for taking the time! I will log a ticket.
Robin

On Mon, Dec 7, 2009 at 5:09 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
> Can you check it and open a JIRA issue for Solr?

Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

Robin,

I reproduced the problem with your sample data, but it is also
reproducible without HTMLStripCharFilter. I commented out the HTML
strippers in schema.xml and rebuilt the index with the following data:

<add>
  <doc>
    <field name="id">debug-1</field>
    <field name="description">hello world WGKEKW AWEHGSE</field>
  </doc>
</add>

The exception still occurred.

Can you check it and open a JIRA issue for Solr?

Thank you!

Koji

-- 
http://www.rondhuit.com/en/


Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

Posted by Robin Wojciki <ro...@gmail.com>.
Koji, I was able to create a minimal reproduction.

The attached zip has solr.xml, solrconf.xml and Main.java. I was able to
reproduce the issue by replacing the conf files in
apache-solr-1.4.0/example/solr/conf and running the class Main. Could
you please confirm whether this reproduction is enough?

Also, please let me know if I should log the ticket with Lucene or Solr.

Thanks,
Robin

On Sat, Dec 5, 2009 at 8:49 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
> I couldn't reproduce it with simple test data.

Re: Solr 1.4: StringIndexOutOfBoundsException in SpellCheckComponent with HTMLStripCharFilterFactory

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
I couldn't reproduce it with simple test data.
Can you open a JIRA issue and attach a test case that reproduces
the problem, including the spellchecker definition in solrconfig.xml?

Koji

-- 
http://www.rondhuit.com/en/