Posted to solr-dev@lucene.apache.org by "Anders Melchiorsen (JIRA)" <ji...@apache.org> on 2009/09/02 17:47:32 UTC

[jira] Created: (SOLR-1404) Random failures with highlighting

Random failures with highlighting
---------------------------------

                 Key: SOLR-1404
                 URL: https://issues.apache.org/jira/browse/SOLR-1404
             Project: Solr
          Issue Type: Bug
          Components: Analysis, highlighter
    Affects Versions: 1.4
            Reporter: Anders Melchiorsen


With a recent Solr nightly, we started getting errors when highlighting.

I have not been able to reduce our real setup to a minimal one that is failing, but the same error seems to pop up with the configuration below. Note that the QUERY will mostly fail, but it will work sometimes. Notably, after running "java -jar start.jar", the QUERY will work the first time, but then start failing for a while. Seems that something is not being reset properly.

The example uses the deprecated HTMLStripWhitespaceTokenizerFactory but the problem apparently also exists with other tokenizers; I was just unable to create a minimal example with other configurations.


SCHEMA

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="example" version="1.2">

  <types>
    <fieldType name="string" class="solr.StrField" />

    <fieldtype name="testtype" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory" />
      </analyzer>
    </fieldtype>
 </types>

 <fields>
   <field name="id" type="string" indexed="true" stored="false" />
   <field name="test" type="testtype" indexed="false" stored="true" />
 </fields>

 <uniqueKey>id</uniqueKey>

</schema>

INDEX

URL=http://localhost:8983/solr/update

curl $URL --data-binary '<add><doc><field name="id">1</field><field name="test">test</field></doc></add>' -H 'Content-type:text/xml; charset=utf-8'
curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'

QUERY

curl 'http://localhost:8983/solr/select/?hl.fl=test&hl=true&q=id:1'

ERROR

org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4

org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:328)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:321)
	... 23 more
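
For readers unfamiliar with the message, the check behind it can be sketched as below. This is a minimal, self-contained illustration, not the Lucene source: the class `OffsetCheckSketch`, the exception `InvalidOffsetsException`, and the helper `describe` are hypothetical names, and the offsets 4..8 are an assumed example of what a tokenizer reporting stale offsets might produce for the stored value "test".

```java
public class OffsetCheckSketch {
    /** Hypothetical stand-in for the exception named in the trace. */
    public static class InvalidOffsetsException extends Exception {
        public InvalidOffsetsException(String message) {
            super(message);
        }
    }

    /** Rejects a token whose offsets point past the end of the stored text. */
    public static void checkToken(String term, int startOffset, int endOffset, String text)
            throws InvalidOffsetsException {
        if (startOffset > text.length() || endOffset > text.length()) {
            throw new InvalidOffsetsException(
                "Token " + term + " exceeds length of provided text sized " + text.length());
        }
    }

    /** Returns the error message for a token, or null when the offsets fit. */
    public static String describe(String term, int startOffset, int endOffset, String text) {
        try {
            checkToken(term, startOffset, endOffset, text);
            return null;
        } catch (InvalidOffsetsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Offsets 0..4 fit the four-character stored value "test".
        System.out.println(describe("test", 0, 4, "test"));
        // Assumed stale offsets 4..8: the token text is fine, but its position is not.
        System.out.println(describe("test", 4, 8, "test"));
    }
}
```

The point of the sketch is that the token itself ("test") can be correct while its reported position lies outside the four-character field value, which is consistent with the query succeeding once and failing afterwards.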


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1404) Random failures with highlighting

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753779#action_12753779 ] 

Jason Rutherglen commented on SOLR-1404:
----------------------------------------

It will be good to get this fixed. I have experienced problems in analysis because of this bug and reverted to HTMLStripReader.

> Random failures with highlighting
> ---------------------------------
>
>                 Key: SOLR-1404
>                 URL: https://issues.apache.org/jira/browse/SOLR-1404
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis, highlighter
>    Affects Versions: 1.4
>            Reporter: Anders Melchiorsen
>             Fix For: 1.4
>
>         Attachments: SOLR-1404.patch
>
>


[jira] Commented: (SOLR-1404) Random failures with highlighting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754162#action_12754162 ] 

Uwe Schindler commented on SOLR-1404:
-------------------------------------

bq. Will LUCENE-1906 fix it (in an alternate way)?

It should fix it. Lucene Tokenizers no longer have separate methods for CharStream; they are simply handled as Readers. The trap of overriding the wrong method should be gone now. The offset correction is now done conditionally, only if the Reader is a CharStream subclass.
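
The conditional offset correction described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the Lucene code: `OffsetCorrectingReader` is a hypothetical stand-in for a CharStream, and the fixed `shift` is a simplification of whatever mapping a real stripping reader would keep.

```java
import java.io.Reader;
import java.io.StringReader;

public class ConditionalOffsetSketch {
    /** Hypothetical stand-in for a CharStream: a Reader that can map offsets back to the original text. */
    public static class OffsetCorrectingReader extends StringReader {
        private final int shift;

        public OffsetCorrectingReader(String filteredText, int shift) {
            super(filteredText);
            this.shift = shift;
        }

        /** Maps an offset in the filtered text to an offset in the original text. */
        public int correctOffset(int offset) {
            return offset + shift;
        }
    }

    /** Applies the correction only when the Reader supports it; plain Readers pass through. */
    public static int correctOffset(Reader input, int offset) {
        if (input instanceof OffsetCorrectingReader) {
            return ((OffsetCorrectingReader) input).correctOffset(offset);
        }
        return offset;
    }

    public static void main(String[] args) {
        // Assumed example: "<b>test</b>" with the leading tag stripped, so filtered offset 0
        // corresponds to original offset 3.
        Reader stripped = new OffsetCorrectingReader("test", 3);
        Reader plain = new StringReader("test");
        System.out.println(correctOffset(stripped, 0));
        System.out.println(correctOffset(plain, 0));
    }
}
```

With a single code path that takes a Reader, there is only one method for a subclass to override, which is the design choice the comment refers to.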



[jira] Commented: (SOLR-1404) Random failures with highlighting

Posted by "Igor Motov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753794#action_12753794 ] 

Igor Motov commented on SOLR-1404:
----------------------------------

{quote}
Will LUCENE-1906 fix it (in an alternate way)?
{quote}
I guess it depends on how they decide to resolve it. But it looks like the same issue, that's for sure.



[jira] Commented: (SOLR-1404) Random failures with highlighting

Posted by "Anders Melchiorsen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752044#action_12752044 ] 

Anders Melchiorsen commented on SOLR-1404:
------------------------------------------

Hi Igor, thanks for the patch.

It does seem to work for me. I will leave it for others to decide whether it is the best fix. If the issue is not fixed at a lower layer, note that the HTMLStripStandardTokenizerFactory seems to have a similar problem.

I reported that this problem exists with other tokenizers as well, including the HTMLStripCharFilterFactory+WhitespaceTokenizerFactory combo that you recommend. Today, however, I cannot reproduce that behaviour. As I have been reporting several issues, I find it likely that I have been confused by having multiple configurations running at the same time.




[jira] Resolved: (SOLR-1404) Random failures with highlighting

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-1404.
--------------------------------

    Resolution: Fixed

This has been fixed with the update to Lucene 2.9 RC4.



[jira] Commented: (SOLR-1404) Random failures with highlighting

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753686#action_12753686 ] 

Koji Sekiguchi commented on SOLR-1404:
--------------------------------------

bq. A better fix, perhaps, would be implementing reset(CharStream input) in CharTokenizer in Lucene. 

Will LUCENE-1906 fix it (in an alternate way)?



[jira] Commented: (SOLR-1404) Random failures with highlighting

Posted by "Igor Motov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751797#action_12751797 ] 

Igor Motov commented on SOLR-1404:
----------------------------------

First of all, HTMLStripWhitespaceTokenizerFactory is deprecated, so it might be better to replace it with HTMLStripCharFilterFactory followed by WhitespaceTokenizerFactory:

{code:xml}
<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory" />
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
</analyzer>
{code}

Anyway, there seems to be a bug in resetting a token stream created by the HTMLStripWhitespaceTokenizerFactory. That's why the test works the first time, when the token stream is created, and fails the next time, when it's reused. The problem might have been introduced in revision 802286 (see [SOLR-1343|http://issues.apache.org/jira/browse/SOLR-1343]), when HTMLStripReader, which was a Reader, became HTMLStripCharFilter, which is a CharStream. As a result, the super.reset call in the following code changed from resolving to reset(Reader input) to resolving to reset(CharStream input):

{code}
public class HTMLStripWhitespaceTokenizerFactory extends BaseTokenizerFactory {
  public Tokenizer create(Reader input) {
    return new WhitespaceTokenizer(new HTMLStripReader(input)) {
      @Override
      public void reset(Reader input) throws IOException {
        super.reset(new HTMLStripReader(input));
      }
    };
  }
}
{code}

WhitespaceTokenizer inherits from CharTokenizer. But CharTokenizer implements only reset(Reader input) and doesn't reset the stream on reset(CharStream input) which is now called. The simplest fix is to explicitly call super.reset(Reader input). A better fix, perhaps, would be implementing reset(CharStream input) in CharTokenizer in Lucene. 
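
The overload issue described above can be reproduced in isolation. The following is a minimal sketch, assuming simplified stand-in classes: CharStreamLike, StripFilterLike, TokenizerLike and CharTokenizerLike are hypothetical names that only mimic the relationships between CharStream, HTMLStripCharFilter, Tokenizer and CharTokenizer, and none of this is the actual Lucene or Solr code or the attached patch. It shows how the compiler picks the more specific reset overload, which leaves the offset stale on reuse, and how a cast to Reader selects the overload that clears it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Every class here is a simplified, hypothetical stand-in, not Lucene or Solr code.
class ResetOverloadSketch {

  /** Stand-in for CharStream: a more specific kind of Reader. */
  abstract static class CharStreamLike extends Reader {}

  /** Stand-in for HTMLStripCharFilter; it only delegates and strips nothing. */
  static class StripFilterLike extends CharStreamLike {
    private final Reader in;
    StripFilterLike(Reader in) { this.in = in; }
    @Override public int read(char[] buf, int off, int len) throws IOException {
      return in.read(buf, off, len);
    }
    @Override public void close() throws IOException { in.close(); }
  }

  /** Stand-in for Tokenizer: both overloads only swap the input. */
  static class TokenizerLike {
    protected Reader input;
    TokenizerLike(Reader input) { this.input = input; }
    public void reset(Reader input) throws IOException { this.input = input; }
    public void reset(CharStreamLike input) throws IOException { this.input = input; }
  }

  /** Stand-in for CharTokenizer: only reset(Reader) also clears the offset. */
  static class CharTokenizerLike extends TokenizerLike {
    protected int offset = 0;
    CharTokenizerLike(Reader input) { super(input); }
    @Override public void reset(Reader input) throws IOException {
      super.reset(input);
      offset = 0;
    }
    /** Returns the end offset of the next whitespace-delimited token, or -1. */
    int nextTokenEnd() throws IOException {
      int start = -1;
      int c;
      while ((c = input.read()) != -1) {
        if (Character.isWhitespace(c)) {
          offset++;
          if (start >= 0) return offset - 1; // token ended before this whitespace
        } else {
          if (start < 0) start = offset;
          offset++;
        }
      }
      return start >= 0 ? offset : -1;
    }
  }

  /** Mirrors the failing pattern: the argument's static type is CharStreamLike. */
  static CharTokenizerLike buggy(Reader in) {
    return new CharTokenizerLike(new StripFilterLike(in)) {
      @Override public void reset(Reader input) throws IOException {
        // Resolves to reset(CharStreamLike), so the offset is NOT cleared.
        super.reset(new StripFilterLike(input));
      }
    };
  }

  /** Same wrapper, but the cast forces the reset(Reader) overload. */
  static CharTokenizerLike fixed(Reader in) {
    return new CharTokenizerLike(new StripFilterLike(in)) {
      @Override public void reset(Reader input) throws IOException {
        super.reset((Reader) new StripFilterLike(input));
      }
    };
  }

  /** Tokenizes "test", reuses the tokenizer on "test", returns the second end offset. */
  private static int endOffsetAfterReuse(CharTokenizerLike tok) {
    try {
      tok.nextTokenEnd();
      tok.reset(new StringReader("test"));
      return tok.nextTokenEnd();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static int buggyEndOffset() { return endOffsetAfterReuse(buggy(new StringReader("test"))); }
  static int fixedEndOffset() { return endOffsetAfterReuse(fixed(new StringReader("test"))); }
}
```

Under these assumptions, buggyEndOffset() reports an end offset of 8 for a 4-character text, the same kind of mismatch as the InvalidTokenOffsetsException reported in this issue, while fixedEndOffset() reports 4.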




[jira] Updated: (SOLR-1404) Random failures with highlighting

Posted by "Igor Motov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Motov updated SOLR-1404:
-----------------------------

    Attachment: SOLR-1404.patch



[jira] Updated: (SOLR-1404) Random failures with highlighting

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-1404:
------------------------------

    Fix Version/s: 1.4
