Posted to solr-user@lucene.apache.org by Marian Steinbach <ma...@gmail.com> on 2011/12/05 11:12:31 UTC

search.highlight.InvalidTokenOffsetsException in Solr 3.5

I get an InvalidTokenOffsetsException in some searches when highlighting
is activated. It seems to depend on the result documents involved.

In previous versions of Solr I haven't experienced this kind of error.
Any ideas?

Here is the complete exception stack:

Problem accessing /solr/select. Reason:

    org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token verwaltung exceeds length of provided text sized 3228

org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token verwaltung exceeds length of provided text sized 3228
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token verwaltung exceeds length of provided text sized 3228
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
	... 24 more

These are my highlighting parameters, which seem to have no effect on
the exception:

	<str name="hl">true</str>
	<str name="hl.fl">body,text</str>
	<int name="hl.snippets">3</int>
	<int name="hl.maxAnalyzedChars">20000</int>
	<str name="hl.mergeContiguous">true</str>
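
One way to narrow down which field triggers the exception is to highlight one field at a time. A minimal sketch, assuming the parameters above are set as request-handler defaults (the thread does not say where they are set); only hl.fl changes between the two variants:

```xml
<!-- Variant 1: highlight only the "body" field -->
<str name="hl">true</str>
<str name="hl.fl">body</str>

<!-- Variant 2: highlight only the "text" field -->
<str name="hl">true</str>
<str name="hl.fl">text</str>
```

If only one variant still throws, the analysis chain of that field's type is the place to look.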

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> But - the wiki page has a footnote that says "a tokenizer must be defined
> for the field, but it doesn't need to be indexed". The body field has the
> type "dcx_text" which has a tokenizer.
>
> Is the documentation wrong here or am I misunderstanding something?

Ah, I never read that note (I was just looking at the table).

I think you are right, I can generate a snippet from the following field:

<field name="body" type="dcx_text" stored="true" indexed="false" multiValued="true"/>



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
Ah, ok - thank you for looking at it.

But - the wiki page has a footnote that says "a tokenizer must be defined
for the field, but it doesn't need to be indexed". The body field has the
type "dcx_text" which has a tokenizer.

Is the documentation wrong here or am I misunderstanding something? 


--
View this message in context: http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3793706.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> Ahmet, this is a good find. Can we still open a JIRA issue so that a
> more useful exception is thrown here?

Robert, I created SOLR-3193 and created a test using Andrew's files.

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Mar 2, 2012 at 9:41 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>
>> Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
>> still there.
>>
>> I am just about to leave for a vacation. I'll try to open a JIRA issue this
>> evening.
>
> Andrew, thanks for providing the files. I also reproduced it.
>
> But the cause of the exception is that you are trying to highlight on a field (body) that is not indexed.
>
> To enable highlighting you need both indexed="true" and stored="true".
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
> I changed the definition of the body field from indexed="false" to indexed="true" and it is working now.
>
> But for the record (with indexed="false"), it is weird that it produces a snippet in the first request and then fails in the second request.
>
>

Ahmet, this is a good find. Can we still open a JIRA issue so that a
more useful exception is thrown here?


-- 
lucidimagination.com

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
> still there.
>
> I am just about to leave for a vacation. I'll try to open a JIRA issue this
> evening.

Andrew, thanks for providing the files. I also reproduced it.

But the cause of the exception is that you are trying to highlight on a field (body) that is not indexed.

To enable highlighting you need both indexed="true" and stored="true".
http://wiki.apache.org/solr/FieldOptionsByUseCase

I changed the definition of the body field from indexed="false" to indexed="true" and it is working now.

But for the record (with indexed="false"), it is weird that it produces a snippet in the first request and then fails in the second request.
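
In schema.xml terms, the change described above amounts to flipping one attribute. A sketch based on the body field definition quoted elsewhere in this thread; the dcx_text type is Andrew's own and is not shown here, and the two lines are before/after alternatives, not meant to coexist in one schema:

```xml
<!-- Before: stored but not indexed; highlighting failed on every second request -->
<field name="body" type="dcx_text" stored="true" indexed="false" multiValued="true"/>

<!-- After: also indexed; the exception was no longer reproduced -->
<field name="body" type="dcx_text" stored="true" indexed="true" multiValued="true"/>
```

Changing the indexed attribute only takes effect for documents indexed after the change, so existing documents need to be reindexed.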



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
still there.

I am just about to leave for a vacation. I'll try to open a JIRA issue this
evening.



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
I posted the files here: http://www.mediafire.com/?z43a5qyfvz4zxp1



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> I think it is not a good idea to post the Solr <add/> XML here - it is very
> long (text extract of a newspaper page) and may not reproduce verbatim
> (whitespace etc.) if I paste it here.
>
> iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
> config) via email?

I have seen people use http://pastebin.com/ for this purpose before. Can you provide your full search URL too?

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Mar 2, 2012 at 7:37 AM, andrew <an...@digicol.de> wrote:
> I was able to create a test case.
>
> We are querying ranges of documents. When I tried to isolate the document
> that causes trouble, I found that a query for that single document fails on
> exactly every second request (a query for a range of documents that includes
> it fails every time). I could also reproduce the exception with only that
> single document in the index.
>
> I think it is not a good idea to post the Solr <add/> XML here - it is very
> long (text extract of a newspaper page) and may not reproduce verbatim
> (whitespace etc.) if I paste it here.
>
> iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
> config) via email?
>

You can also open a jira issue
(https://issues.apache.org/jira/browse/SOLR), and upload everything as
attachments.

I would also be very interested if you can test a nightly 3.6 build
(https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/)

There have been *numerous* offsets bugs fixed in 3.6 in a variety of
tokenizers/tokenfilters besides the HTMLStripCharFilter:
https://issues.apache.org/jira/browse/LUCENE-3642
https://issues.apache.org/jira/browse/SOLR-2891
https://issues.apache.org/jira/browse/LUCENE-3717

-- 
lucidimagination.com

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
I was able to create a test case.

We are querying ranges of documents. When I tried to isolate the document
that causes trouble, I found that a query for that single document fails on
exactly every second request (a query for a range of documents that includes
it fails every time). I could also reproduce the exception with only that
single document in the index.

I think it is not a good idea to post the Solr <add/> XML here - it is very
long (text extract of a newspaper page) and may not reproduce verbatim
(whitespace etc.) if I paste it here. 

iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
config) via email?


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/03/02 6:05), Ahmet Arslan wrote:
>> I have the same problem. This happens only for some documents in the index.
>
> Andrew, can you provide a document string and a query pair? I will try to reproduce the exception. Then we can create a test case that fails. Others can look into it.

+1. Please do it!

koji
-- 
Query Log Visualizer for Apache Solr
http://soleami.com/

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> I have the same problem. This happens only for some documents in the index.

Andrew, can you provide a document string and a query pair? I will try to reproduce the exception. Then we can create a test case that fails. Others can look into it.


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
I have the same problem. This happens only for some documents in the index.

As with sharadgaur, the problem ceased when I removed
ReversedWildcardFilterFactory from my analysis chain;
HTMLStripCharFilterFactory was there both before and after.

I am running branch-3.6 r1238628. As far as I can tell, this already has the
fixes from LUCENE-2208 / LUCENE-3690.



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> Not sure whether that question was directed at me, but I am not using
> HTMLStripCharFilter but some other pattern replacements which modify
> character positions, probably in the same manner as HTMLStripCharFilter
> does.

I thought that the cause of the problem was https://issues.apache.org/jira/browse/LUCENE-2208

What is your field definition? Can you provide your document and query pair that causes this exception?

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Marian Steinbach <ma...@sendung.de>.
On 28 February 2012 at 21:14, Ahmet Arslan <io...@yahoo.com> wrote:

>
> Are you using HTMLStripCharFilter? If so, this could be:
> https://issues.apache.org/jira/browse/LUCENE-3690
>


Not sure whether that question was directed at me, but I am not
using HTMLStripCharFilter but some other pattern replacements which modify
character positions, probably in the same manner as HTMLStripCharFilter
does.

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by sharadgaur <sh...@gmail.com>.
I was using fieldType text_general_rev

    <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
                maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>




But since I changed to fieldType text_general, everything has been running fine;
I no longer get the InvalidTokenOffsetsException.




    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
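
If leading-wildcard support is still needed, one possible arrangement is to keep the reversed tokens in a separate search-only field and highlight a field whose analysis chain omits ReversedWildcardFilterFactory. This is only a sketch: the field names content and content_rev are hypothetical, and it has not been tested against the failing documents.

```xml
<!-- Highlighted field: stored, analyzed without ReversedWildcardFilterFactory -->
<field name="content" type="text_general" indexed="true" stored="true"/>

<!-- Search-only field: keeps the reversed tokens, not stored, never highlighted -->
<field name="content_rev" type="text_general_rev" indexed="true" stored="false"/>

<!-- Feed the same source text into both fields at index time -->
<copyField source="content" dest="content_rev"/>
```

Highlighting would then be requested only on the first field, for example with hl.fl=content.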






Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> Unfortunately I don't have any news on that. I disabled highlighting on the
> text field (sadly).
>
> Have you tracked down which field causes the problem? Can you tell which
> filters you are applying to the corresponding field type?

Are you using HTMLStripCharFilter? If so, this could be:
https://issues.apache.org/jira/browse/LUCENE-3690

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Marian Steinbach <ma...@sendung.de>.
Unfortunately I don't have any news on that. I disabled highlighting on the
text field (sadly).

Have you tracked down which field causes the problem? Can you tell which
filters you are applying to the corresponding field type?

Marian

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by sharadgaur <sh...@gmail.com>.
I am also facing the same problem. Do you have any update on it? I am using
Solr 3.5 and getting the same error:

Feb 28, 2012 1:40:44 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token to exceeds length of provided text sized 11503
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:497)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token to exceeds length of provided text sized 11503
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
        ... 20 more

