Posted to solr-user@lucene.apache.org by andrew <an...@digicol.de> on 2012/03/01 21:14:22 UTC

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

I have the same problem. This happens only for some documents in the index.

As with sharadgaur, the problem went away when I removed
ReversedWildcardFilterFactory from my analysis chain;
HTMLStripCharFilterFactory was there both before and after.

I am running branch-3.6 r1238628. As far as I can tell, this already has the
fixes from LUCENE-2208 / LUCENE-3690.
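When bisecting an analysis chain like this, it can help to list exactly which components each analyzer of a field type uses. The sketch below does that with Python's standard library only. The fieldType is a hypothetical stand-in loosely modeled on the chain described above (the real dcx_text definition is not in this post), and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fieldType; the actual "dcx_text" definition was not posted.
SCHEMA_SNIPPET = """
<schema>
  <types>
    <fieldType name="dcx_text" class="solr.TextField">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
"""

def analysis_chain(schema_xml, type_name):
    """Return {analyzer type: [component classes, in order]} for one fieldType."""
    root = ET.fromstring(schema_xml)
    chains = {}
    for ft in root.iter("fieldType"):
        if ft.get("name") != type_name:
            continue
        for analyzer in ft.findall("analyzer"):
            # An analyzer without a type attribute applies to index and query.
            kind = analyzer.get("type", "both")
            chains[kind] = [child.get("class", "") for child in analyzer]
    return chains

def uses_filter(chains, class_suffix):
    """True if any analyzer has a component whose class ends with class_suffix."""
    return any(cls.endswith(class_suffix)
               for comps in chains.values() for cls in comps)

chains = analysis_chain(SCHEMA_SNIPPET, "dcx_text")
for kind, comps in chains.items():
    print(kind, "->", ", ".join(comps))
print("ReversedWildcardFilterFactory present:",
      uses_filter(chains, "ReversedWildcardFilterFactory"))
```

Running it against two versions of a schema (with and without the filter) makes the difference between the failing and working setups explicit.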


--
View this message in context: http://lucene.472066.n3.nabble.com/search-highlight-InvalidTokenOffsetsException-in-Solr-3-5-tp3560997p3791598.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> But the wiki page has a footnote that says "a tokenizer must be defined
> for the field, but it doesn't need to be indexed". The body field has the
> type "dcx_text", which has a tokenizer.
> 
> Is the documentation wrong here or am I misunderstanding something?

Ah, I never read that note; I was only looking at the table.

I think you are right. I can generate snippets from the following field:

<field name="body" type="dcx_text" stored="true" indexed="false" multiValued="true"/>



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
Ah, ok - thank you for looking at it.

But the wiki page has a footnote that says "a tokenizer must be defined
for the field, but it doesn't need to be indexed". The body field has the
type "dcx_text", which has a tokenizer.

Is the documentation wrong here or am I misunderstanding something?



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> Ahmet, this is a good find. Can we still open a JIRA issue so that a
> more useful exception is thrown here?

Robert, I opened SOLR-3193 and wrote a test using Andrew's files.

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Mar 2, 2012 at 9:41 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>
>> Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
>> still there.
>>
>> I am just about to leave for a vacation. I'll try to open a JIRA issue this
>> evening.
>
> Andrew, thanks for providing the files. I have reproduced it as well.
>
> But the cause of the exception is that you are trying to highlight a field (body) that is not indexed.
>
> To enable highlighting you need both indexed="true" and stored="true".
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
> I changed the definition of the body field from indexed="false" to indexed="true" and it works now.
>
> But for the record (with indexed="false"), it is odd that it produces a snippet on the first request and then fails on the second.
>
>

Ahmet, this is a good find. Can we still open a JIRA issue so that a
more useful exception is thrown here?


-- 
lucidimagination.com

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
> still there.
> 
> I am just about to leave for a vacation. I'll try to open a JIRA issue this
> evening.

Andrew, thanks for providing the files. I have reproduced it as well.

But the cause of the exception is that you are trying to highlight a field (body) that is not indexed.

To enable highlighting you need both indexed="true" and stored="true".
http://wiki.apache.org/solr/FieldOptionsByUseCase

I changed the definition of the body field from indexed="false" to indexed="true" and it works now.

But for the record (with indexed="false"), it is odd that it produces a snippet on the first request and then fails on the second.
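A minimal sketch, assuming a plain schema.xml with explicit indexed/stored attributes on each field, that flags hl.fl fields which are not both indexed="true" and stored="true". The body field is copied from this thread; the title field and the function names are hypothetical. It does not resolve dynamic fields, and it treats a missing attribute as "true", which is an assumption: in a real schema the value may be inherited from the fieldType.

```python
import xml.etree.ElementTree as ET

SCHEMA_SNIPPET = """
<schema>
  <fields>
    <field name="body" type="dcx_text" stored="true" indexed="false" multiValued="true"/>
    <field name="title" type="dcx_text" stored="true" indexed="true"/>
  </fields>
</schema>
"""

def _flag(field, attr):
    # Assumption: a missing attribute counts as "true".
    return field.get(attr, "true").lower() == "true"

def unsafe_highlight_fields(schema_xml, hl_fields):
    """Return (name, reason) for hl.fl fields not both indexed and stored."""
    root = ET.fromstring(schema_xml)
    fields = {f.get("name"): f for f in root.iter("field")}
    problems = []
    for name in hl_fields:
        field = fields.get(name)
        if field is None:
            problems.append((name, "not defined"))
        elif not (_flag(field, "indexed") and _flag(field, "stored")):
            problems.append((name, "indexed=%s stored=%s"
                             % (field.get("indexed"), field.get("stored"))))
    return problems

print(unsafe_highlight_fields(SCHEMA_SNIPPET, ["body", "title"]))
```

With the field definition from this thread, only body is reported, which matches the diagnosis above.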



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
Robert, I just tried with 3.6-SNAPSHOT 1296203 from svn - the problem is
still there.

I am just about to leave for a vacation. I'll try to open a JIRA issue this
evening.



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
I posted the files here: http://www.mediafire.com/?z43a5qyfvz4zxp1



Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> I don't think it is a good idea to post the Solr <add/> XML here; it is very
> long (text extract of a newspaper page) and may not reproduce verbatim
> (whitespace etc.) if I paste it here.
> 
> iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
> config) via email?

I have seen people use http://pastebin.com/ for this purpose before. Can you provide your full search URL too?
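For reference, a highlighting request of the kind discussed here could be assembled as below. q, start, rows, hl and hl.fl are standard Solr request parameters; the base URL, the query string and the helper name are placeholders for illustration only.

```python
from urllib.parse import urlencode

def highlight_url(base, query, hl_fields, rows=10, start=0):
    """Build a Solr /select URL with highlighting enabled on hl_fields."""
    params = [
        ("q", query),
        ("start", start),   # first document of the requested range
        ("rows", rows),     # size of the requested range
        ("hl", "true"),
        ("hl.fl", ",".join(hl_fields)),
    ]
    return base.rstrip("/") + "/select?" + urlencode(params)

print(highlight_url("http://localhost:8983/solr", "body:zeitung", ["body"], rows=1))
```

Using start and rows makes it easy to switch between a single-document request and a range request when narrowing down a failing document.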

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Mar 2, 2012 at 7:37 AM, andrew <an...@digicol.de> wrote:
> I was able to create a test case.
>
> We are querying ranges of documents. When I tried to isolate the document
> that causes trouble, I found that for a single-document query it fails on
> exactly every second request (it fails consistently when requesting a
> range of documents that includes that document). I could also reproduce
> the exception with only that single document in the index.
>
> I don't think it is a good idea to post the Solr <add/> XML here; it is very
> long (text extract of a newspaper page) and may not reproduce verbatim
> (whitespace etc.) if I paste it here.
>
> iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
> config) via email?
>

You can also open a jira issue
(https://issues.apache.org/jira/browse/SOLR), and upload everything as
attachments.

I would also be very interested if you can test a nightly 3.6 build
(https://builds.apache.org/job/Solr-3.x/lastSuccessfulBuild/artifact/artifacts/)

There have been *numerous* offsets bugs fixed in 3.6 in a variety of
tokenizers/tokenfilters besides the HTMLStripCharFilter:
https://issues.apache.org/jira/browse/LUCENE-3642
https://issues.apache.org/jira/browse/SOLR-2891
https://issues.apache.org/jira/browse/LUCENE-3717
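As far as I understand the Lucene 3.x Highlighter, InvalidTokenOffsetsException is raised when a token's start offset is negative or its end offset exceeds the length of the text being highlighted, which is why offset bugs in tokenizers and filters surface this way. The sketch below applies that same check to (term, start, end) tuples, for example copied from the analysis page. The function name and sample tokens are hypothetical.

```python
def invalid_offsets(text, tokens):
    """Return the (term, start, end) tuples whose offsets do not fit in text.

    Checks start < 0 or end > len(text). Note: Java measures string length
    in UTF-16 code units and Python's len() counts code points, so results
    can differ for characters outside the Basic Multilingual Plane.
    """
    return [(term, start, end) for term, start, end in tokens
            if start < 0 or end > len(text)]

stored_text = "Hallo Welt"
tokens = [
    ("hallo", 0, 5),
    ("welt", 6, 10),
    ("welt", 6, 12),  # deliberately broken end offset for the demo
]
print(invalid_offsets(stored_text, tokens))
```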


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by andrew <an...@digicol.de>.
I was able to create a test case.

We are querying ranges of documents. When I tried to isolate the document
that causes trouble, I found that for a single-document query it fails on
exactly every second request (it fails consistently when requesting a
range of documents that includes that document). I could also reproduce
the exception with only that single document in the index.

I don't think it is a good idea to post the Solr <add/> XML here; it is very
long (text extract of a newspaper page) and may not reproduce verbatim
(whitespace etc.) if I paste it here.

iorixxx, koji - is it ok if I send the necessary artifacts (add XML, schema,
config) via email?
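The "every second request" pattern can be checked mechanically by sending the same request repeatedly and recording which attempts fail. A minimal sketch: run_query is a hypothetical callable (for instance, a function that sends the highlighting request and raises on an error response), and FakeSolr only simulates the reported behaviour so the harness can be tried without a server.

```python
from typing import Callable, List

def failure_pattern(run_query: Callable[[], None], attempts: int = 6) -> List[bool]:
    """Call run_query `attempts` times; True marks an attempt that raised."""
    pattern = []
    for _ in range(attempts):
        try:
            run_query()
            pattern.append(False)
        except Exception:
            pattern.append(True)
    return pattern

def alternates(pattern: List[bool]) -> bool:
    """True if successes and failures strictly alternate."""
    return len(pattern) > 1 and all(a != b for a, b in zip(pattern, pattern[1:]))

class FakeSolr:
    """Hypothetical stand-in that fails on every second call."""
    def __init__(self) -> None:
        self.calls = 0

    def query(self) -> None:
        self.calls += 1
        if self.calls % 2 == 0:
            raise RuntimeError("InvalidTokenOffsetsException")

fake = FakeSolr()
pattern = failure_pattern(fake.query, attempts=4)
print(pattern)
print(alternates(pattern))
```

Swapping FakeSolr.query for a real request function would show whether the alternation is reproducible against an actual index.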


Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/03/02 6:05), Ahmet Arslan wrote:
>> I have the same problem. This happens only for some documents in the index.
>
> Andrew, can you provide a document string and query pair? I will try to reproduce the exception. Then we can create a failing test case that others can look into.

+1. Please do it!

koji
-- 
Query Log Visualizer for Apache Solr
http://soleami.com/

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

Posted by Ahmet Arslan <io...@yahoo.com>.
> I have the same problem. This happens only for some documents in the index.

Andrew, can you provide a document string and query pair? I will try to reproduce the exception. Then we can create a failing test case that others can look into.