Posted to solr-dev@lucene.apache.org by "Brian Whitman (JIRA)" <ji...@apache.org> on 2008/04/28 21:59:55 UTC

[jira] Created: (SOLR-553) Highlighter does not match phrase queries correctly

Highlighter does not match phrase queries correctly
---------------------------------------------------

                 Key: SOLR-553
                 URL: https://issues.apache.org/jira/browse/SOLR-553
             Project: Solr
          Issue Type: Bug
          Components: highlighter
    Affects Versions: 1.2
         Environment: all
            Reporter: Brian Whitman


http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html

PhraseQueries like "A Long String" will return highlighting matches that only match "String" or "String Long" or any combination. We need them to return <span>A Long String</span> instead.

LUCENE-794 seems to be added to trunk now and corrects it from their end. 
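
To illustrate the difference, here is a toy sketch in plain Java. It uses no Lucene or Solr classes; the whitespace tokenization and all class and method names are assumptions made purely for illustration. The first method marks every token that appears anywhere in the phrase, which is roughly what the current behaviour looks like. The second marks tokens only where the whole phrase occurs in order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: not the Solr or Lucene highlighter.
final class PhraseHighlightSketch {

    // Term-based: wrap any token that equals one of the phrase's words.
    static String highlightTerms(String text, String phrase) {
        Set<String> terms = new HashSet<>(
                Arrays.asList(phrase.toLowerCase(Locale.ROOT).split("\\s+")));
        String[] tokens = text.split("\\s+");
        boolean[] hit = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            hit[i] = terms.contains(tokens[i].toLowerCase(Locale.ROOT));
        }
        return render(tokens, hit);
    }

    // Phrase-aware: wrap tokens only where the full phrase appears in order.
    static String highlightPhrase(String text, String phrase) {
        String[] wanted = phrase.toLowerCase(Locale.ROOT).split("\\s+");
        String[] tokens = text.split("\\s+");
        boolean[] hit = new boolean[tokens.length];
        for (int start = 0; start + wanted.length <= tokens.length; start++) {
            boolean match = true;
            for (int j = 0; j < wanted.length && match; j++) {
                match = tokens[start + j].toLowerCase(Locale.ROOT).equals(wanted[j]);
            }
            if (match) {
                Arrays.fill(hit, start, start + wanted.length, true);
            }
        }
        return render(tokens, hit);
    }

    // Re-join tokens with single spaces, wrapping each hit in <span> tags.
    private static String render(String[] tokens, boolean[] hit) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(hit[i] ? "<span>" + tokens[i] + "</span>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "String theory is a long story but a long string wins";
        System.out.println(highlightTerms(text, "A Long String"));
        System.out.println(highlightPhrase(text, "A Long String"));
    }
}
```

With that sample text, the term-based version also marks the stray "String", "a" and "long" near the start, while the phrase-aware version marks only the final "a long string". Note that even the phrase-aware version still wraps each token separately.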



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-553:
---------------------------------

    Attachment: SOLR-553-SC.patch

Otis,

Here's a patch that fixes the Spell Checker test that gets broken when you upgrade the Lucene jars.

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>             Fix For: 1.3
>
>         Attachments: highlighttest.xml, SOLR-553-SC.patch, Solr-553.patch, Solr-553.patch, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597680#action_12597680 ] 

Otis Gospodnetic commented on SOLR-553:
---------------------------------------

I think there is no pure vs. mixed situation any more.  If usePH=true we use SpanScorer; otherwise we use QueryScorer, or at least that's how I read the patch.


{code:DefaultSolrHighlighter.java:295-304|borderStyle=solid}
          if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER))) {
            // wrap CachingTokenFilter around TokenStream for reuse
            tstream = new CachingTokenFilter(tstream);
            
            // get highlighter
            highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream);
            
            // after highlighter initialization, reset tstream since construction of highlighter already used it
            tstream.reset();
          }
          else {
            // use "the old way"
            highlighter = getHighlighter(query, fieldName, req);
          }
{code}


> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>         Attachments: highlighttest.xml, Solr-553.patch, Solr-553.patch, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596914#action_12596914 ] 

Mike Klaas commented on SOLR-553:
---------------------------------

What do people think of making span highlighting the default behaviour if the query contains phrases?  It might be better for the default behaviour to be what people expect, even if the output technically differs from 1.2.

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>         Attachments: highlighttest.xml, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596793#action_12596793 ] 

Bojan Smid commented on SOLR-553:
---------------------------------

I made a fix and uploaded the patch. LUCENE-794 is now incorporated into the default Solr highlighter.

The old way of highlighting is retained and is used whenever requests to the Solr highlighter keep the same request parameters as before. The new functionality is invoked by adding another request parameter to the URL, hl.usePhraseHighlighter=true.

So, for URL:
http://localhost:8983/solr/select?q=features:%22ax%20bx%20cx%22&hl=on&hl.fl=features&hl.fragsize=20&hl.snippets=10

results will be the same as they were, but in case you want to use this fix (and have correct phrase highlighting), the URL would look like this:

http://localhost:8983/solr/select?q=features:%22ax%20bx%20cx%22&hl=on&hl.fl=features&hl.fragsize=20&hl.snippets=10&hl.usePhraseHighlighter=true

This patch needs the latest lucene-highlighter-*.jar and lucene-memory-*.jar from trunk (since the LUCENE-794 fix is committed there).
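
To make the two requests above concrete, here is a small sketch that assembles them programmatically. Only the parameter names and values come from the URLs above; the class and method names are hypothetical. Note that URLEncoder emits "+" for spaces rather than %20, which is equivalent inside a query string, and it also percent-encodes the colon.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Solr or of the patch.
final class HighlightUrlSketch {

    // Builds a select URL with the highlighting parameters used in the examples.
    static String buildUrl(String base, String query, String field,
                           boolean usePhraseHighlighter) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", query);
        params.put("hl", "on");
        params.put("hl.fl", field);
        params.put("hl.fragsize", "20");
        params.put("hl.snippets", "10");
        if (usePhraseHighlighter) {
            // The new parameter; leaving it out keeps the old behaviour.
            params.put("hl.usePhraseHighlighter", "true");
        }
        StringJoiner url = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.add(e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/select";
        String query = "features:\"ax bx cx\"";
        System.out.println(buildUrl(base, query, "features", false));
        System.out.println(buildUrl(base, query, "features", true));
    }
}
```

The sketch needs Java 10 or later for the URLEncoder.encode(String, Charset) overload.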

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>         Attachments: highlighttest.xml
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Whitman updated SOLR-553:
-------------------------------

    Description: 
http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html

Say we search for the band "I Love You But I've Chosen Darkness"
.../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E

The highlight returns a snippet that does have the name altogether:

Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :

But also returns unrelated snips from the same page:

Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"

A correct highlighter should not return snippets that do not match the phrase exactly.

LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.

Related: SOLR-575 


  was:
http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html

Say we search for the band "I Love You But I've Chosen Darkness"
.../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E

The highlight returns a snippet that does have the name altogether:

Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :

But also returns unrelated snips from the same page:

Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"

A correct highlighter should only return

Lights (Live) : <span>I Love You But I've Chosen Darkness</span>

And no snippets that do not match the phrase exactly.

LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.






[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598494#action_12598494 ] 

Mark Miller commented on SOLR-553:
----------------------------------

> Probably best to create a new ticket (if necessary) about the <span>ax</span> <span>bx</span> instead of <span>ax bx</span> problem. That highlights have incorrect matches is far worse. I'll adjust the problem description.

If I remember correctly, this was an ease-of-implementation issue. Part of it was fitting into the current Highlighter framework (individual tokens are scored and highlighted), and part of it was ease in general, I think. I am not sure it would be easy to alter.

It's very easy to do with the new Highlighter I have been working on, the LargeDocHighlighter. It breaks from the current API and makes this type of highlight markup quite easy. It may never see the light of day though: to do what I want, all parts of the query need to be located with the MemoryIndex, and the time this takes on non-position-sensitive query clauses is almost equal to the savings I get from not iterating through and scoring each token in a TokenStream. I do still have hopes I can pull something off, and it may end up being useful for something else.

For now though, highlighting each token separately seems a small inconvenience in exchange for retaining all the old Highlighter's tests, corner cases, and speed in non-position-sensitive scoring. That's not to say there isn't a way, if you take a look at the code.
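
For anyone who needs <span>ax bx</span> markup in the meantime, one possible workaround is to post-process the returned snippet on the client side. The following is only a minimal sketch: it assumes the tags are whatever was passed as hl.simple.pre and hl.simple.post and that adjacent highlighted tokens are separated by whitespace only. The class and method names are made up for illustration, and none of this is part of the patch or of the Highlighter itself.

```java
import java.util.regex.Pattern;

// Hypothetical client-side post-processing; not part of Solr or Lucene.
final class MergeSpansSketch {

    // Collapses "post + whitespace + pre" into just the whitespace, so that
    // neighbouring highlighted tokens end up inside a single pair of tags.
    static String mergeAdjacent(String snippet, String pre, String post) {
        Pattern gap = Pattern.compile(
                Pattern.quote(post) + "(\\s+)" + Pattern.quote(pre));
        return gap.matcher(snippet).replaceAll("$1");
    }

    // Convenience overload for the <span> tags used in this issue.
    static String mergeAdjacent(String snippet) {
        return mergeAdjacent(snippet, "<span>", "</span>");
    }

    public static void main(String[] args) {
        System.out.println(mergeAdjacent("<span>ax</span> <span>bx</span> cx"));
    }
}
```

One caveat with this approach: it merges any two neighbouring highlights, including ones that came from different query clauses, so it is a display convenience rather than a statement about which phrase matched.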



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596915#action_12596915 ] 

Brian Whitman commented on SOLR-553:
------------------------------------

+1 on making it the default if there is a PhraseQuery. The "old" way comes across as a bad bug if you're displaying the highlights for your search results.





[jira] Reopened: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reopened SOLR-553:
----------------------------------


Needs new Lucene jars, per the earlier comments.  Build is currently broken.

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>             Fix For: 1.3
>
>         Attachments: highlighttest.xml, Solr-553.patch, Solr-553.patch, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Work started: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on SOLR-553 started by Otis Gospodnetic.

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>         Attachments: highlighttest.xml, Solr-553.patch, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Whitman updated SOLR-553:
-------------------------------

    Description: 
http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html

Say we search for the band "I Love You But I've Chosen Darkness"
.../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E

The highlight returns a snippet that does have the name altogether:

Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :

But also returns unrelated snips from the same page:

Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"

A correct highlighter should only return

Lights (Live) : <span>I Love You But I've Chosen Darkness</span>

And no snippets that do not match the phrase exactly.

LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.



  was:
http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html

PhraseQueries like "A Long String" will return highlighting matches that only match "String" or "String Long" or any combination. We need them to return <span>A Long String</span> instead.

LUCENE-794 seems to be added to trunk now and corrects it from their end. 




> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should only return
> Lights (Live) : <span>I Love You But I've Chosen Darkness</span>
> And no snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599669#action_12599669 ] 

Mark Miller commented on SOLR-553:
----------------------------------

Just to point out, as I am not sure it's clear: the SpanScorer is just as fast as the old Scorer when there are no Phrases or Spans in the query. Mark H actually tested it as slightly faster, though that's a bit odd.

When there is a Span or Phrase, the non-Span/Phrase clauses of the Query are still highlighted the same way and at the same speed as with the original Scorer. It is just the Span/Phrase clauses that fire up a MemoryIndex and have getSpans called against it.

So you really only pay for the extra position-sensitive part where it is actually needed.



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596882#action_12596882 ] 

Brian Whitman commented on SOLR-553:
------------------------------------

Patch works for me on the highlighttest.xml. Thanks, Bojan!!





[jira] Resolved: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved SOLR-553.
----------------------------------

    Resolution: Fixed
      Assignee: Grant Ingersoll  (was: Otis Gospodnetic)

I committed the new JARs and fixed the SpellChecker test.

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Grant Ingersoll
>             Fix For: 1.3
>
>         Attachments: highlighttest.xml, SOLR-553-SC.patch, Solr-553.patch, Solr-553.patch, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bojan Smid updated SOLR-553:
----------------------------

    Attachment: Solr-553.patch

Added unit test for this fix to the patch.



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596924#action_12596924 ] 

Otis Gospodnetic commented on SOLR-553:
---------------------------------------

+1 for making it the default - it makes more sense than the old HL that highlighted other matching tokens
that were not a part of the given phrase.




[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599825#action_12599825 ] 

Otis Gospodnetic commented on SOLR-553:
---------------------------------------

If I understood Mark correctly, he is saying we can just have usePhraseHighlighter=true
be the default and it won't hurt performance.  Should we do that, and allow one to get the
old behaviour with usePhraseHighlighter=false if they really prefer the old highlighting?




[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596402#action_12596402 ] 

Brian Whitman commented on SOLR-553:
------------------------------------

It is probably best to create a new ticket (if necessary) for the <span>ax</span> <span>bx</span> versus <span>ax bx</span> problem. Highlights that contain incorrect matches are far worse. I'll adjust the problem description.
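To make the <span>ax</span> <span>bx</span> versus <span>ax bx</span> distinction concrete, here is a minimal client-side sketch that collapses adjacent highlight tags into one. It is only an illustration of the cosmetic problem, not part of any patch on this issue; the function name merge_adjacent is hypothetical, and it assumes the tags are separated by nothing but whitespace.

```python
import re


def merge_adjacent(snippet, pre="<span>", post="</span>"):
    """Collapse highlight tags that are separated only by whitespace.

    A closing tag followed by optional whitespace and an opening tag
    is replaced by the whitespace alone, so neighbouring highlighted
    tokens end up inside a single tag.
    """
    pattern = re.escape(post) + r"(\s*)" + re.escape(pre)
    return re.sub(pattern, r"\1", snippet)


print(merge_adjacent("<span>ax</span> <span>bx</span> cx"))
```

Note that this would also merge two separate matches that happen to sit next to each other, so it cannot replace a fix in the highlighter itself.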



> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>         Attachments: highlighttest.xml
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should only return
> Lights (Live) : <span>I Love You But I've Chosen Darkness</span>
> And no snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bojan Smid updated SOLR-553:
----------------------------

    Attachment: Solr-553.patch

Patch for Solr-553 (uses Lucene-794 highlighting fix)



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-553:
----------------------------------

    Comment: was deleted



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596350#action_12596350 ] 

Bojan Smid commented on SOLR-553:
---------------------------------

I am playing around with integrating LUCENE-794 into Solr. I see two options:

1) Add the LUCENE-794 code to the current implementation in DefaultSolrHighlighter, where the client provides a request parameter (say, useSpanScorer) to use the new functionality. Without the parameter, the client gets the old functionality.

or

2) Provide LUCENE-794 highlighting in a new SolrHighlighter, for instance in a class named PhraseQuerySolrHighlighter.

I would appreciate any comments on this.

Also, since I have already tested some of this code, I noticed that we still wouldn't get the exact behavior from the description. For instance, in the text  ax bx cx dx ax bx

for the phrase query "ax bx cx"

the result is: <span>ax</span><span>bx</span><span>cx</span> dx ax bx

This means that we got a fix for part of the problem (words from unrelated snippets are no longer highlighted), but we still wouldn't get the whole phrase highlighted inside a single tag.
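The difference between the old and the phrase-aware behaviour can be sketched with a toy example. This is not the LUCENE-794 SpanScorer code; it assumes whitespace tokenization and exact token equality, and the function names highlight_terms and highlight_phrase are made up for illustration.

```python
def highlight_terms(tokens, phrase, pre="<span>", post="</span>"):
    """Old behaviour: wrap every token that is one of the phrase's terms."""
    terms = set(phrase)
    return " ".join(f"{pre}{t}{post}" if t in terms else t for t in tokens)


def highlight_phrase(tokens, phrase, pre="<span>", post="</span>"):
    """Phrase-aware behaviour: wrap only tokens inside an exact, in-order match."""
    n = len(phrase)
    inside = [False] * len(tokens)
    # Mark every position covered by a full occurrence of the phrase.
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase:
            for i in range(start, start + n):
                inside[i] = True
    return " ".join(f"{pre}{t}{post}" if hit else t for t, hit in zip(tokens, inside))


text = "ax bx cx dx ax bx".split()
query = "ax bx cx".split()
print(highlight_terms(text, query))   # trailing "ax bx" is highlighted too
print(highlight_phrase(text, query))  # only the full phrase is highlighted
```

Each token is still wrapped in its own tag here, which mirrors the remaining cosmetic issue described above.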



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597647#action_12597647 ] 

Brian Whitman commented on SOLR-553:
------------------------------------

Just FYI, I've tested this on a much larger, real-world index and it works great.





[jira] Resolved: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-553.
-----------------------------------

    Resolution: Fixed

Thanks Bojan.

Committed revision 659664.




[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-553:
----------------------------------

    Attachment: Solr-553.patch

Added an explicit check for usePhraseHighlighter=true, so that values such as usePhraseHighlighter=false no longer turn it on.

I'll commit shortly, along with a fresh lucene-highlighter-2.4-dev.jar built from Lucene trunk.
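The kind of check described above can be sketched as follows. This is not the code from the patch; the request parameters are modelled as a plain dict, both function names are hypothetical, and treating an absent parameter as "off" is an assumption that matches the comment at this point in the thread.

```python
PARAM = "usePhraseHighlighter"


def presence_only_check(params):
    """Looser check: any value, even "false", turns the feature on."""
    return PARAM in params


def explicit_true_check(params):
    """Stricter check: only the literal value "true" turns the feature on."""
    return params.get(PARAM) == "true"


for request in ({}, {PARAM: "true"}, {PARAM: "false"}):
    print(request, presence_only_check(request), explicit_true_check(request))
```

The two checks disagree only for values other than "true", which is exactly the usePhraseHighlighter=false case mentioned in the comment.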




[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597657#action_12597657 ] 

Mike Klaas commented on SOLR-553:
---------------------------------

[quote]Added explicit check for usePhraseHighlighter=true to avoid things like usePhraseHighlighter=false to turn it on.[/quote]

I'm not sure I follow you here.  Just to verify:

 - the default is to use SpanScorer when the query is a "pure" phrase query
 - you can force SpanScorer with usePhraseHighlighter
 - mixed queries that combine keywords and phrases are still problematic.

If this is correct, is there any point in the usePhraseHighlighter parameter?  I don't see where it would entail different behaviour.  Also, what are the consequences for dismax queries (pure or mixed)?



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Brian Whitman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Whitman updated SOLR-553:
-------------------------------

    Attachment: highlighttest.xml

Attaching a basic test-case XML document that can be posted to the trunk Solr example to reproduce the problem.

Steps to reproduce:
1) check out solr-trunk
2) ant example
3) java -jar start.jar
4) post.sh highlighttest.xml
5) query: http://localhost:8983/solr/select?q=features:%22ax%20bx%20cx%22&hl=on&hl.fl=features&hl.fragsize=20&hl.snippets=10

Expected results: the only highlight snippet returned should be <em>ax bx cx</em> and nothing else.
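For readers who find the encoded query in step 5 hard to read, the following sketch rebuilds the same URL from its decoded parameters. It uses only the Python standard library and does not contact Solr; the helper name build_url is hypothetical, and the host and port are the ones from step 5.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8983/solr/select"

# Decoded form of the parameters from step 5.
params = [
    ("q", 'features:"ax bx cx"'),  # phrase query on the "features" field
    ("hl", "on"),                  # enable highlighting
    ("hl.fl", "features"),         # field to highlight
    ("hl.fragsize", "20"),         # fragment size
    ("hl.snippets", "10"),         # maximum snippets per field
]


def build_url(base, params):
    """Percent-encode the parameters, leaving ':' unescaped as in step 5."""
    return base + "?" + urlencode(params, quote_via=quote, safe=":")


url = build_url(BASE, params)
print(url)
```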




[jira] Issue Comment Edited: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Bojan Smid (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596350#action_12596350 ] 

bosmid edited comment on SOLR-553 at 5/13/08 4:06 AM:
----------------------------------------------------------

I am playing around with LUCENE-794 integration into Solr. I have two options:

1) add LUCENE-794 code to current implementation in DefaultSolrHighlighter where client would provide request parameter (say useSpanScorer) if he wants to use new functionality. In case he didn't provide the parameter, he would get old functionality.

or

2) to provide LUCENE-794 highlighting in new SolrHighlighter, for instance in class PhraseQuerySolrHighlighter

I would appreciate any comments on this.

Also, since I have already tested some of this code, I noticed that we still wouldn't get the exact behavior from the description. For instance, in the text  ax bx cx dx ax bx

for the phrase query "ax bx cx"

the result is: <span>ax</span><span>bx</span><span>cx</span> dx ax bx

This means that we got a fix for part of the problem (words from unrelated snippets are no longer highlighted), but we still wouldn't get the whole phrase highlighted inside a single tag.

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>         Attachments: highlighttest.xml
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should only return
> Lights (Live) : <span>I Love You But I've Chosen Darkness</span>
> And no snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
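
For anyone who wants to reproduce the request from the description, the following is a rough sketch of building the same query string programmatically. The parameter names and values are taken from the URL quoted above; the class name HighlightQueryString is hypothetical, and the base path (elided as "..." in the description) is deliberately left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr.
final class HighlightQueryString {

    // Builds the query-string part of the highlighting request for a phrase.
    static String build(String phrase) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("rows", "100");
        params.put("q", "\"" + phrase + "\"");   // quotes make it a phrase query
        params.put("fq", "type:html");
        params.put("hl", "true");
        params.put("hl.fl", "content");
        params.put("hl.fragsize", "500");
        params.put("hl.snippets", "5");
        params.put("hl.simple.pre", "<span>");
        params.put("hl.simple.post", "</span>");

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    // Form-encodes a single key or value as UTF-8.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }
}
```

URLEncoder produces form encoding, so spaces come out as "+" rather than "%20" and the apostrophe as "%27"; both forms decode to the same query on the server side.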



[jira] Assigned: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic reassigned SOLR-553:
-------------------------------------

    Assignee: Otis Gospodnetic

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>         Attachments: highlighttest.xml, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596558#action_12596558 ] 

Otis Gospodnetic commented on SOLR-553:
---------------------------------------

For cross-reference - SOLR-575 will address merging of highlighted tokens that are part of a phrase.

As for the direction to take with integrating the new phrase highlighting support, I think support for proper
highlighting of *pure phrase* queries should be added to DefaultSolrHighlighter (DSH) and enabled via useSpanScorer.
DSH could check query instanceof PhraseQuery and run the new code if useSpanScorer is on.
Does that sound reasonable?

Note that I highlighted (eh) *pure phrase*, as it seems that LUCENE-794 doesn't fix cases where the phrase
is part of a BooleanQuery (e.g. foo AND bar OR "peanut butter").

Patch coming tomorrow...
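
A rough sketch of the dispatch described above, to show the shape of the check rather than the actual patch. Everything here is a stand-in: the nested Query, PhraseQuery and BooleanQuery types only mimic the Lucene class names, and ScorerChoice, Scorer and choose are hypothetical names that do not exist in Solr.

```java
// Hypothetical illustration; the nested types are stand-ins, not Lucene classes.
final class ScorerChoice {
    interface Query {}
    static final class PhraseQuery implements Query {}
    static final class BooleanQuery implements Query {}

    enum Scorer { SPAN_SCORER, QUERY_SCORER }

    // Uses the span-based scorer only when the caller asked for it AND the
    // top-level query is a pure phrase query; everything else keeps the
    // existing behaviour.
    static Scorer choose(Query query, boolean useSpanScorer) {
        if (useSpanScorer && query instanceof PhraseQuery) {
            return Scorer.SPAN_SCORER;
        }
        return Scorer.QUERY_SCORER;
    }
}
```

Under this scheme a BooleanQuery that merely contains a phrase falls through to the old path even with the flag on, which matches the *pure phrase* caveat above.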


> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>         Attachments: highlighttest.xml
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-553:
----------------------------------

    Fix Version/s: 1.3

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>             Fix For: 1.3
>
>         Attachments: highlighttest.xml, Solr-553.patch, Solr-553.patch, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Issue Comment Edited: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597680#action_12597680 ] 

otis edited comment on SOLR-553 at 5/16/08 6:40 PM:
----------------------------------------------------------------

I think there is no pure vs. mixed distinction any more. If usePH=true we use SpanScorer; otherwise we use QueryScorer, or at least that's how I read the patch.


{code:title=DefaultSolrHighlighter.java:295-304|borderStyle=solid}
          if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER))) {
            // wrap CachingTokenFilter around TokenStream for reuse
            tstream = new CachingTokenFilter(tstream);
            
            // get highlighter
            highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream);
            
            // after highlighter initialization, reset tstream since construction of highlighter already used it
            tstream.reset();
          }
          else {
            // use "the old way"
            highlighter = getHighlighter(query, fieldName, req);
          }
{code}
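
The reset() call in the snippet above matters because a cached stream that has already been read once sits at its end. The following is a simplified, self-contained sketch of that behaviour, not the Lucene CachingTokenFilter itself; ReplayableTokens and drain are hypothetical names, and tokens are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a caching token stream; not a Lucene class.
final class ReplayableTokens {
    private final Iterator<String> source;
    private final List<String> cache = new ArrayList<>();
    private int pos = 0;

    ReplayableTokens(Iterator<String> source) {
        this.source = source;
    }

    // Returns the next token, or null when the stream is exhausted.
    // Tokens pulled from the source are remembered for later replay.
    String next() {
        if (pos < cache.size()) {
            return cache.get(pos++);
        }
        if (source.hasNext()) {
            String token = source.next();
            cache.add(token);
            pos++;
            return token;
        }
        return null;
    }

    // Rewinds to the start of the cached tokens.
    void reset() {
        pos = 0;
    }

    // Reads everything that is left in the stream.
    static List<String> drain(ReplayableTokens tokens) {
        List<String> out = new ArrayList<>();
        for (String t = tokens.next(); t != null; t = tokens.next()) {
            out.add(t);
        }
        return out;
    }
}
```

If one consumer drains the stream (as the highlighter construction does in the snippet), a second consumer gets nothing until reset() is called, after which the cached tokens are replayed from the beginning.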


      was (Author: otis):
    I think there are no pure vs. mixed situation any more.  If usePH=true we use SpanScorer otherwise we use QueryScorer, or at least that's how I read the patch.


{code:DefaultSolrHighlighter.java:295-304|borderStyle=solid}
          if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER))) {
            // wrap CachingTokenFilter around TokenStream for reuse
            tstream = new CachingTokenFilter(tstream);
            
            // get highlighter
            highlighter = getPhraseHighlighter(query, fieldName, req, (CachingTokenFilter) tstream);
            
            // after highlighter initialization, reset tstream since construction of highlighter already used it
            tstream.reset();
          }
          else {
            // use "the old way"
            highlighter = getHighlighter(query, fieldName, req);
          }
{code}

  
> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>         Attachments: highlighttest.xml, Solr-553.patch, Solr-553.patch, Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
> Related: SOLR-575 



[jira] Updated: (SOLR-553) Highlighter does not match phrase queries correctly

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Klaas updated SOLR-553:
----------------------------

    Issue Type: New Feature  (was: Bug)

Changed to feature request, since the current behaviour is expected.  I'd be happy to review a patch to use SpanScorer in Solr, though.

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../selectrows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> <span>You</span>"
> A correct highlighter should only return
> Lights (Live) : <span>I Love You But I've Chosen Darkness</span>
> And no snippets that do not match the phrase exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem from the Lucene end. Solr should get it too.
