You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "David Bowen (JIRA)" <ji...@apache.org> on 2008/12/23 03:45:44 UTC

[jira] Created: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Highlighter throws StringIndexOutOfBoundsException
--------------------------------------------------

                 Key: LUCENE-1500
                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/highlighter
    Affects Versions: 2.4
         Environment: Found this running the example code in Solr (latest version).
            Reporter: David Bowen
         Attachments: patch.txt

Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):

<add><doc>
  <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
  <field name="name">Some Name</field>
  <field name="manu">Acme, Inc.</field>
  <field name="features">Description of the features, mentioning various things</field>
  <field name="features">Features also is multivalued</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc></add>

and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.

I have a patch.  I don't know if it is completely correct, but it avoids this exception.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "David Bowen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658740#action_12658740 ] 

David Bowen commented on LUCENE-1500:
-------------------------------------

That makes sense, thank-you Koji.

I think it may be worth incorporating this patch anyway, since it is a small change which makes the method more robust.  It just eliminates the possibility of running off the end of the text string.


> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677507#action_12677507 ] 

Mark Harwood commented on LUCENE-1500:
--------------------------------------

OK - choices are:

1) Throw a RuntimeException with a more useful diagnostic message
2) Throw a new checked Exception (introducing possible compile errors in existing client code)
3) Check for the error condition and ignore (as done in the current patch)

This feels to me like one of those "there's something seriously wrong with the codebase" problems rather than an invalid bit of data or user input which is external to the system so my personal preference is to lean towards 1). 



> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1500:
---------------------------------

    Attachment: Lucene-1500-NewException.patch

With updated Apache license header.

I'll commit soon if no objections

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676682#action_12676682 ] 

Koji Sekiguchi commented on LUCENE-1500:
----------------------------------------

Can you post your document and schema so that others can reproduce your problem?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676611#action_12676611 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

I think the defensive coding (the current patch) makes sense, to avoid out-of-bounds substring call.

But what is the root cause here?  It seems likely the analyzer chain is not producing the right (original) start/end offset in the tokens?  What analyzer chain are you using?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677535#action_12677535 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

I thought the bug was in the analyzer (producing out-of-bounds token offsets), not highlighter?

The original Solr bug (SOLR-925), which I think was the analyzer producing invalid offsets, is now fixed.  Peter are you using Solr, or are you seeing this in a different context?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677549#action_12677549 ] 

Hoss Man commented on LUCENE-1500:
----------------------------------

bq. The extent of my tracing suggests it's coming when the token stream is generated, which looks to be part of the lucene highlighter: org.apache.lucene.search.highlight.TokenSources

with my limited knowledge of solr highlighting, that really doesn't sound right.

If the field has TermVectors, then the TokenStream used comes from there, otherwise it coems from analyzing the stored field value -- either way the analyzer configured in Solr should be setting the offset values, correct? 

hence my question on the solr thread where this first came up...
http://www.nabble.com/Error-with-highlighter-and-UTF-8-chars--to22156161.html#a22207917

bq. so what does the analysis screen tell you about each token produced with that input text given your configuration?  in verbose mode it will show the start/end offsets for every token, so it should be fairly easy to identify where the bug is. 

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676426#action_12676426 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

Actually, looking at the Lucene source and the trace:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
	at java.lang.String.substring(String.java:1765)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       ...
{code}

I see now that getBestTextFragments() takes in a token stream - and each token in this steam already has start/end positions set.  So, this patch would mitigate the exception, but looks liek the real bug is in Solr.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676754#action_12676754 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

bq. So to be consistent, where else in Lucene might an "IncorrectTokenOffsetsException" be a possibility - IndexWriter.addDocument(..)?

Good question!

I don't think we can throw it from addDocument, because we cannot
assume/assert that the int startOffset & endOffset are in fact
character offsets into the String/Reader we had been given?

Technically, to Lucene the start/end offsets are somewhat opaque (I
think?).  It's not until you actually do something with them (eg, call
the highlighter) that you are saying "these are really supposed to be
character offsets into the specific text I just provided you".


> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676426#action_12676426 ] 

pwolanin edited comment on LUCENE-1500 at 2/24/09 2:15 PM:
----------------------------------------------------------------

Actually, looking at the Lucene source and the trace:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
	at java.lang.String.substring(String.java:1765)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       ...
{code}

I see now that getBestTextFragments() takes in a token stream - and each token in this steam already has start/end positions set.  So, this patch would mitigate the exception, but looks like the real bug is in Solr, or perhaps elsewhere in Lucene where the token stream is constructed.

      was (Author: pwolanin):
    Actually, looking at the Lucene source and the trace:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
	at java.lang.String.substring(String.java:1765)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
       ...
{code}

I see now that getBestTextFragments() takes in a token stream - and each token in this steam already has start/end positions set.  So, this patch would mitigate the exception, but looks liek the real bug is in Solr.
  
> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677968#action_12677968 ] 

Mark Harwood commented on LUCENE-1500:
--------------------------------------

Isn't your example predicated on being given an invalid Token with end<start?

What did you think of my suggestion to fix this problem at it's source - i.e. Token should never be in a state with end<start in the first place?

Acheiving this goal is complicated by the fact that offsets are not only set in the constructor - there are independent set methods for start and end offsets which can be called in any order.
One solution would be to deprecate Token.setStartOffset and Token.endOffset and replacing with a Token.setExtent(int startOffset, int endOffset) with the appropriate checks.





> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1500:
---------------------------------------

    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])
    Fix Version/s: 2.9
                   2.4.1

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677517#action_12677517 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

Well, this patch does not (obviously) solve the real bug.  Is it possible to combine #1 and #3, but possibly revert #3 later when we solve the real bug in the highlighter code?  

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677620#action_12677620 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

Ah, it occurs to me that we first saw this bug recently - and it seems likely it was only after starting to use :
{code}
 <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
{code}

for that field type.  I will investigate more and post a SOLR issue.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1500:
---------------------------------

    Attachment: Lucene-1500-NewException.patch

Added support for testing both Token start or end offset >text.length.

Added javadoc comments for new exception

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676449#action_12676449 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

Actually - the initial patch does not avoid the exception I'm seeing, since the start of the token is ok, but the end is beyond the string's end.  Here is a slightly enhanced version that checks both the start and end of the token.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677372#action_12677372 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

Mark, do you want/have time to take this one?  I'd like to resolve it before 2.4.1 if we can...

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658735#action_12658735 ] 

Koji Sekiguchi commented on LUCENE-1500:
----------------------------------------

I think the bug in Solr, not Lucene. Please see SOLR-925, a patch is there.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

I have run into this issue over the last couple days.  Also using Solr, but the error is triggered by content that has multi-byte characters (such as German).

It seems that somewhere lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched.

here's an example trying to highlight the string 'Drupaltalk' with "strong" tags
{code}
<p class="search-snippet">
 Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... 
 nter "Dru<strong>paltalk001</strong>" könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert.  
 "Dru<strong>paltalk002</strong>" ist dann der Talk vom Dienstag zum Thema "Drupal Al</p>
{code}
        


> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677956#action_12677956 ] 

Mark Harwood commented on LUCENE-1500:
--------------------------------------

My thoughts were that this exception solely traps inconsistencies with Tokens in relation to a particular provided chunk of text.

I think internal inconsistencies within a Token (e.g. endOffset <startOffset) should ideally be handled by Token (throwing something like an IllegalArgumentException in it's constructor).
I guess an open question there is can startOffset=endOffset in a Token? Either way, String.substring simply returns an empty string so I think that's probably OK in highlighter.


> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "David Bowen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Bowen updated LUCENE-1500:
--------------------------------

    Attachment: patch.txt

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680330#action_12680330 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

The InvalidTokenOffsetsException.java just needs a copyright header; otherwise patch looks good!

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677639#action_12677639 ] 

Koji Sekiguchi commented on LUCENE-1500:
----------------------------------------

Peter, thank you.

bq. In the thread you suggest that this API could be aded to lucene java?

I've proposed CharFilter feature for Lucene in LUCENE-1466 . If you like it, please vote for it. :)

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677524#action_12677524 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------


bq. This feels to me like one of those "there's something seriously wrong with the codebase" problems

My guess is it can happen quite easily and not be so obvious until the
"right" text goes through the analyzer.  If Solr's analysis (for
highlighting multi-valued fields) can hit this my guess is it's (wrong
offsets coming out of analyzer) not uncommon...

I'd still prefer #2, but I don't feel super strongly about it, so if
there's no consensus otherwise, it's fine with me to go forward
with #1.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1500:
------------------------------------------

    Assignee: Michael McCandless

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] 

pwolanin edited comment on LUCENE-1500 at 2/24/09 1:38 PM:
----------------------------------------------------------------

I have run into this issue over the last couple days.  Also using Solr, but the error is triggered by content that has multi-byte characters (such as German).

It seems that somewhere Lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched.

here's an example trying to highlight the string 'Drupaltalk' with "strong" tags
{code}
<p class="search-snippet">
 Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... 
 nter "Dru<strong>paltalk001</strong>" könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert.  
 "Dru<strong>paltalk002</strong>" ist dann der Talk vom Dienstag zum Thema "Drupal Al</p>
{code}
        
So the attached patch would probably avoid the exception (and is a good idea) but would not fix the bug I'm seeing.

      was (Author: pwolanin):
    I have run into this issue over the last couple days.  Also using Solr, but the error is triggered by content that has multi-byte characters (such as German).

It seems that somewhere lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched.

here's an example trying to highlight the string 'Drupaltalk' with "strong" tags
{code}
<p class="search-snippet">
 Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... 
 nter "Dru<strong>paltalk001</strong>" könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert.  
 "Dru<strong>paltalk002</strong>" ist dann der Talk vom Dienstag zum Thema "Drupal Al</p>
{code}
        
So the attached patch would probably avoid the exception (and is a good idea) but would not fix the bug I'm seeing.
  
> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677991#action_12677991 ] 

Mark Harwood commented on LUCENE-1500:
--------------------------------------

I struggle to see why endOffset<startOffset should ever be acceptable but also share your concerns about the disruption of changing the Token API to enforce this.

So, I'll add code to the patch to check for bad startOffsets too. If we had more "points of use" for Token offsets outside of highlighting I'd be more concerned, but things being the way they are this seems like the most pragmatic option.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677629#action_12677629 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

Koji - thanks - I was aware that not all worked with the mapping filter, but I was apparently misinformed since I was told that the "solr.HTMLStripWhitespaceTokenizerFactory" was also suitable for CharFilter.  Indeed your e-mail thread linked from SOLR-822 describes exactly the problem I have:

bq. As you can see, if you use CharFilter, Token offsets could be incorrect because CharFilters may convert 1 char to 2 chars or the other way around.

In the thread you suggest that this API could be aded to lucene java?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Lars Olson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704022#action_12704022 ] 

Lars Olson edited comment on LUCENE-1500 at 4/29/09 12:47 AM:
--------------------------------------------------------------

Hi! with the risk of being a noob, I just have to ask...
The patch Lucene-1500-NewException.patch, is it incorporated in the Lucene nightly builds as of 2009-04-27?

I am now running my code against the nightly builds of Lucene20090427, and those libs give me the new error "InvalidTokenOffsetsException" at the exact same spot as the 2.4.1 libs return a StringIndexOutOfBounds exception. Should it not solve the issue?

If not, could you maybe suggest a client coding which circumvents it?

When using the test-documents with search query "tika"
The resulting 9 documents return 5 occurences of the exception. 

PS: I'm not running Solr, just POJO with lucene and Tika




      was (Author: genernic):
    Hi! with the risk of being a noob, I just have to ask...
The patch Lucene-1500-NewException.patch, is it incorporated in the Lucene nightly builds as of 2009-04-27?

I am now running my code against the nightly builds of Lucene20090427, and those libs give me the new error "InvalidTokenOffsetsException" at the exact same spot as the 2.4.1 libs return a StringIndexOutOfBounds exception. Should it not solve the issue?

If not, could you maybe suggest a client coding which circumvents it?

When using the test-documents with search query "tika"
The resulting 9 documents return 5 occurences of the exception. 




  
> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1500:
---------------------------------

    Attachment:     (was: Lucene-1500-NewException.patch)

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676648#action_12676648 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

Yes - this patch is not a fix - but a work-around.

The root cause is clearly somewhere in the code generating the token stream - tokens seem to be getting positions in bytes rather than characters.

DefaultSolrHighlighter.java has this code:



{code}
import org.apache.lucene.search.highlight.TokenSources;

...

            // create TokenStream
            try {
              // attempt term vectors
              if( tots == null )
                tots = new TermOffsetsTokenStream( TokenSources.getTokenStream(searcher.getReader(), docId, fieldName) );
              tstream = tots.getMultiValuedTokenStream( docTexts[j].length() );
            }
            catch (IllegalArgumentException e) {
              // fall back to anaylzer
              tstream = new TokenOrderingFilter(schema.getAnalyzer().tokenStream(fieldName, new StringReader(docTexts[j])), 10);
            }
{code}



> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Assigned: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1500:
------------------------------------------

    Assignee: Mark Harwood  (was: Michael McCandless)

Handing this one off to you Mark!

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677531#action_12677531 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

The bug we are seeing now happens on pretty much every document that contains multi-byte characters, but only sometime was it going past the end of the full string and hitting the exception.   With the patch, the bug is still very evident, it jsut prevents the exception.  I's a serious flaw in the highlighter - maybe using some only non-utf-8 aware method to calculate string lengths?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677626#action_12677626 ] 

Koji Sekiguchi commented on LUCENE-1500:
----------------------------------------

bq. for that field type. I will investigate more and post a SOLR issue.

If you use charFilter, your Tokenizer must be a "CharStreamAware" Tokenizer. We have only two CharStreamAwareTokenizers - CharStreamAwareWhitespaceTokenizer and CharStreamAwareCJKTokenizer in Solr. Please consider using CharStreamAwareWhitespaceTokenizer in your case. To know what CharStreamAware Tokenizer is, see SOLR-822 .

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676714#action_12676714 ] 

Hoss Man commented on LUCENE-1500:
----------------------------------

bq. Perhaps we can turn this around and ask "under what conditions is it acceptable to provide a TokenStream with tokens whose offsets exceed the length of the text provided?". Not sure I see a justifiable case for supporting that as a legitimate scenario and I would prefer the reporting of an error in this case.

+ 1 to reporting an error ... but StringIndexOutOfounds is not a helpful error.  Code should certainly check the offsets before calling substring and throw a specific exception identifying which token has an invalid offset

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677540#action_12677540 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

I am using Solr, but with a single value field.  I'm using the current Solr build (includes the fix), so the bug I'm describing, which triggers the same exception as the prior Solr bug did, is still present and unrelated to SOLR-925.

The extent of my tracing suggests it's coming when the token stream is generated, which looks to be part of the lucene highlighter:  org.apache.lucene.search.highlight.TokenSources



> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated LUCENE-1500:
----------------------------------

    Attachment: LUCENE-1500.patch

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Lars Olson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704022#action_12704022 ] 

Lars Olson edited comment on LUCENE-1500 at 4/29/09 12:48 AM:
--------------------------------------------------------------

Hi! with the risk of being a noob, I just have to ask...
The patch Lucene-1500-NewException.patch, is it incorporated in the Lucene nightly builds as of 2009-04-27?

I am now running my code against the nightly builds of Lucene20090427, and those libs give me the new error "InvalidTokenOffsetsException" at the exact same spot as the 2.4.1 libs return a StringIndexOutOfBounds exception. Should it not solve the issue?

If not, could you maybe suggest a client coding which circumvents it?

When using the Tika test-documents with search query "tika"
The resulting 9 documents return 5 occurences of the exception. 

PS: I'm not running Solr, just POJO with lucene and Tika




      was (Author: genernic):
    Hi! with the risk of being a noob, I just have to ask...
The patch Lucene-1500-NewException.patch, is it incorporated in the Lucene nightly builds as of 2009-04-27?

I am now running my code against the nightly builds of Lucene20090427, and those libs give me the new error "InvalidTokenOffsetsException" at the exact same spot as the 2.4.1 libs return a StringIndexOutOfBounds exception. Should it not solve the issue?

If not, could you maybe suggest a client coding which circumvents it?

When using the test-documents with search query "tika"
The resulting 9 documents return 5 occurences of the exception. 

PS: I'm not running Solr, just POJO with lucene and Tika



  
> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Lars Olson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704022#action_12704022 ] 

Lars Olson commented on LUCENE-1500:
------------------------------------

Hi! with the risk of being a noob, I just have to ask...
The patch Lucene-1500-NewException.patch, is it incorporated in the Lucene nightly builds as of 2009-04-27?

I am now running my code against the nightly builds of Lucene20090427, and those libs give me the new error "InvalidTokenOffsetsException" at the exact same spot as the 2.4.1 libs return a StringIndexOutOfBounds exception. Should it not solve the issue?

If not, could you maybe suggest a client coding which circumvents it?

When using the test-documents with search query "tika"
The resulting 9 documents return 5 occurences of the exception. 





> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1500:
---------------------------------------

    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])
    Fix Version/s:     (was: 2.4.1)

I think given the addition of a checked exception to the Highlighter API, we should not back-port this to 2.4.1.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676633#action_12676633 ] 

Mark Harwood commented on LUCENE-1500:
--------------------------------------

Hmmm. I'm not so sure that this "defensive coding" patch is the right thing to do here. 

One could argue that it is obscuring an error condition further upstream (as you suggest, Mike - a dodgy analyzer). Commiting this will only make these errors harder to detect e.g. we'd get forum posts saying "why doesn't my term get highlighted?"

Perhaps we can turn this around and ask "under what conditions is it acceptable to provide a TokenStream with tokens whose offsets exceed the length of the text provided?". 
Not sure I see a justifiable case for supporting that as a legitimate scenario and I would prefer the reporting of an error in this case.




> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677948#action_12677948 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

Patch looks great, thanks Mark!

Do you also need to check that startOffset is within bounds?  EG if both start & end offset == length of input text, what does substring do?

If endOffset is < startOffset (seriously invalid token offsets) what happens?

Maybe just add javadoc to the @throws  explaining why this new exception would be thrown?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677963#action_12677963 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

The problem is, this hits a StringIndexOutOfBoundsException:

{code}
String s = "abcd";
s.substring(3, 2);
{code}

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1938)
{code}

I think highlighter should guard against this?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677551#action_12677551 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

But that class either uses TermVectors stored in the index, or re-analyzes using the analyzer passed to it, I think?  Are you using TermVectors?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680125#action_12680125 ] 

Mark Harwood commented on LUCENE-1500:
--------------------------------------

Will submit a new patch tonight.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677608#action_12677608 ] 

Hoss Man commented on LUCENE-1500:
----------------------------------

Peter: i tried some experiments with teh analyzer specified in the schema+fieldType you referenced, and i couldn't reproduce any problem.

I suggest you open a new Jira issue against Solr and attach some sort of reproducible test (either a junit test OR a patch against the example configs + an example doc to index and an example query that either triggers an error or highlights an incorrect substring) so people have a starting point for trying to figure where the incorrect offsets are coming from.

LUCENE-1500 can/should stay open, but should focus specifically on the issue of what Highlighter's behavior should be when dealing with offsets that go past the end of the string (there seems to be consensus that it should do something different then IndexOutOfBounds, it just doesn't seem to be clear what the new behavior should be)

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1500:
---------------------------------

    Attachment: Lucene-1500-NewException.patch

Attached a patch with new checked exception.
This will have a knock-on effect on all Highlighter client code (Solr?) as it introduces a new checked exception that must be handled.

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676745#action_12676745 ] 

Mark Harwood commented on LUCENE-1500:
--------------------------------------

So to be consistent, where else in Lucene might an "IncorrectTokenOffsetsException" be a possibility - IndexWriter.addDocument(..)?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676728#action_12676728 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------


The thing is, since it's an unchecked exception, you could be well
into production, serving searches, etc, when suddenly unexpectedly
this exception kills the entire search request.  It's brittle.

(vs highlighting the wrong parts but still providing search results to
the user -- graceful degradation).

So I'm still torn on "brittle" (what we have now) vs "graceful"
(current "workaround" patch) handling.  For exceptions that can strike
arbitrarily deep into production/searching, I usually lean towards
being "graceful".

Or... maybe we could introduce a new exception to call this case out
(IncorrectTokenOffsetsException or something?), but: we make it a
checked exception so at compile time you need to handle it?

(I still don't like that we only catch the error intermittantly... but there's
nothing we could really do about that, so javadocs should spell that
out).

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677561#action_12677561 ] 

Peter Wolanin commented on LUCENE-1500:
---------------------------------------

I'm still trying to get a handle on how these pieces fit together., so sorry if I've jumped to the wrong conclusion.  If the analyzer is where the offsets are calculated, then that sounds like the place to look.

The field does use term vectors.  The field uses this type from the Solr schema:

{code}
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
{code}

The full schema is
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.30&pathrev=DRUPAL-6--1

the field is
{code}
<field name="body" type="text" indexed="true" stored="true" termVectors="true"/>
{code}

in case it's relevant, the solrconfig is:
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.15&pathrev=DRUPAL-6--1

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.4.1, 2.9
>
>         Attachments: LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood resolved LUCENE-1500.
----------------------------------

    Resolution: Fixed
      Assignee: Mark Harwood  (was: Mark Harwood)

Committed in revision: 758460 

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] 

pwolanin edited comment on LUCENE-1500 at 2/24/09 1:37 PM:
----------------------------------------------------------------

I have run into this issue over the last couple days.  Also using Solr, but the error is triggered by content that has multi-byte characters (such as German).

It seems that somewhere lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched.

here's an example trying to highlight the string 'Drupaltalk' with "strong" tags
{code}
<p class="search-snippet">
 Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... 
 nter "Dru<strong>paltalk001</strong>" könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert.  
 "Dru<strong>paltalk002</strong>" ist dann der Talk vom Dienstag zum Thema "Drupal Al</p>
{code}
        
So the attached patch would probably avoid the exception (and is a good idea) but would not fix the bug I'm seeing.

      was (Author: pwolanin):
    I have run into this issue over the last couple days.  Also using Solr, but the error is triggered by content that has multi-byte characters (such as German).

It seems that somewhere lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched.

here's an example trying to highlight the string 'Drupaltalk' with "strong" tags
{code}
<p class="search-snippet">
 Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... 
 nter "Dru<strong>paltalk001</strong>" könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert.  
 "Dru<strong>paltalk002</strong>" ist dann der Talk vom Dienstag zum Thema "Drupal Al</p>
{code}
        

  
> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>         Attachments: patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677983#action_12677983 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

bq. What did you think of my suggestion to fix this problem at it's source - i.e. Token should never be in a state with end<start in the first place?

I'm wary of this, for the same reason as cited above.  I think the
start/end offsets are rather opaque to Lucene, and I would lean
towards asserting as little as possible about them during indexing.

(Not to mention the challenges, as you enumerated, of actually doing
this checking with the current API).

I think at point-of-use (highlighting) we should assert the offsets
are valid, for highlighting.


> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704047#action_12704047 ] 

Michael McCandless commented on LUCENE-1500:
--------------------------------------------

bq. Should it not solve the issue?

The change to highlighter (which is committed, on trunk) is simply to throw a more meaningful, checked exception (InvalidTokenOffsetsException) when the Token has out-of-bounds offsets.  On upgrading to this, this is an exception your code needs to handle.

But this issue doesn't fix the root causes that actually produce Tokens with incorrect offsets -- that's very app specific.  EG, the particular case that led to this issue was SOLR-925 (which is now fixed).

Are you actually hitting this exception at runtime, or noting that you need to handle it at compilation time?

> Highlighter throws StringIndexOutOfBoundsException
> --------------------------------------------------
>
>                 Key: LUCENE-1500
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1500
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>         Environment: Found this running the example code in Solr (latest version).
>            Reporter: David Bowen
>            Assignee: Mark Harwood
>             Fix For: 2.9
>
>         Attachments: Lucene-1500-NewException.patch, Lucene-1500-NewException.patch, LUCENE-1500.patch, patch.txt
>
>
> Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh):
> <add><doc>
>   <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field>
>   <field name="name">Some Name</field>
>   <field name="manu">Acme, Inc.</field>
>   <field name="features">Description of the features, mentioning various things</field>
>   <field name="features">Features also is multivalued</field>
>   <field name="popularity">6</field>
>   <field name="inStock">true</field>
> </doc></add>
> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception.
> I have a patch.  I don't know if it is completely correct, but it avoids this exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org