Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2009/06/11 21:50:07 UTC

[jira] Created: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Make the Highlighter use SpanScorer by default
----------------------------------------------

                 Key: LUCENE-1685
                 URL: https://issues.apache.org/jira/browse/LUCENE-1685
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Mark Miller
            Assignee: Mark Miller
            Priority: Minor


I've always thought this made sense, but frankly, it took me a year to get the SpanScorer included with Lucene at all, so I was pretty much ready to move on after it got in, rather than push for it as a default.

I think it makes sense as the default in Solr as well, and I mentioned that back when it was put in, but alas, it's an option there as well.

The Highlighter package has no back compat req, but custom has been conservative - one reason I haven't pushed for this change before. Might be best to actually make the switch in 3? I could go either way - as is, I know a bunch of people use it, but I'm betting it's the large minority. It has never been listed in a changes entry and it's not in LIA 1, so you pretty much have to stumble upon it, and figure out what it's for.

I'll point out again that it's just as fast as the standard scorer for any clause of a query that is not position sensitive. Position sensitive query clauses will obviously be somewhat slower to highlight, but that is because they will be highlighted correctly rather than ignoring position.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718855#action_12718855 ] 

Michael McCandless commented on LUCENE-1685:
--------------------------------------------

I consider it a bug that QueryScorer will separately highlight "foo" and "bar" when the PhraseQuery "foo bar" was searched on.
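
To make the difference concrete, here is a toy sketch in plain Java. None of these names are Lucene classes: `PhraseHighlightSketch`, `highlightTerms` and `highlightPhrase` are hypothetical, and whitespace splitting stands in for a real analyzer. It only illustrates term-based versus position-aware marking for the phrase "foo bar".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these are not Lucene classes.
class PhraseHighlightSketch {

    // Term-based marking: wrap every token that equals any query term,
    // no matter where it occurs in the text.
    static String highlightTerms(String text, String... phrase) {
        List<String> terms = Arrays.asList(phrase);
        List<String> out = new ArrayList<>();
        for (String token : text.split(" ")) {
            out.add(terms.contains(token) ? "<B>" + token + "</B>" : token);
        }
        return String.join(" ", out);
    }

    // Position-aware marking: wrap tokens only where the whole phrase
    // occurs as consecutive tokens.
    static String highlightPhrase(String text, String... phrase) {
        String[] tokens = text.split(" ");
        boolean[] hit = new boolean[tokens.length];
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                for (int j = 0; j < phrase.length; j++) {
                    hit[i + j] = true;
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String text = "foo bar baz foo qux bar";
        // prints: <B>foo</B> <B>bar</B> baz <B>foo</B> qux <B>bar</B>
        System.out.println(highlightTerms(text, "foo", "bar"));
        // prints: <B>foo</B> <B>bar</B> baz foo qux bar
        System.out.println(highlightPhrase(text, "foo", "bar"));
    }
}
```

The stray "foo" and "bar" later in the text are marked only by the term-based version, which is the behavior described above.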

Are there actually compelling things that QueryScorer does over SpanScorer?

bq. Actually, perhaps we deprecate SpanScorer and add the functionality to QueryScorer with the switch, default to position sensitive.

+1

And that way we keep the more consumable name (QueryScorer).

> Make the Highlighter use SpanScorer by default
> ----------------------------------------------
>
>                 Key: LUCENE-1685
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1685
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9

[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718858#action_12718858 ] 

Yonik Seeley commented on LUCENE-1685:
--------------------------------------

bq. I consider it a bug that QueryScorer will separately highlight "foo" and "bar" when the PhraseQuery "foo bar" was searched on. 

Right... but not everyone will agree.
We shouldn't deprecate functionality that we don't have a replacement for yet (esp since we'll be quickly removing deprecated stuff in 3.0).


[jira] Updated: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1685:
--------------------------------

    Attachment: LUCENE-1685.patch

Another rev making things a little easier.

QueryScorer now takes a TokenStream rather than a CachingTokenFilter - if there are any position sensitive clauses, the TokenStream will be wrapped in a CachingTokenFilter if it is not already a CachingTokenFilter.

This also removes having to call setTokenStream after constructing a QueryScorer and between calls to getBestFragment - instead, the new init(TokenStream) that the Highlighter already calls is used. This frees the user from having to make that call.

init(TokenStream) now can return a new TokenStream for the Highlighter to continue using (i.e. the QueryScorer may return a CachingTokenFilter if there is a position sensitive clause in the query) or null to keep using the same TokenStream.
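
A rough sketch of that contract, using toy stand-ins rather than the Lucene classes: a "stream" here is just a one-shot `Iterator<String>`, and `InitContractSketch`, `CachingStream`, `Scorer` and `consume` are all hypothetical names. It assumes the scorer reads the tokens once for position-sensitive queries and hands back a rewound, replayable copy; the patch itself may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy stand-ins, not the Lucene API.
class InitContractSketch {

    // A replayable stream: buffers all tokens up front so it can be re-read.
    static final class CachingStream implements Iterator<String> {
        private final List<String> cache = new ArrayList<>();
        private int pos = 0;

        CachingStream(Iterator<String> in) {
            while (in.hasNext()) {
                cache.add(in.next());
            }
        }

        void reset() { pos = 0; }

        @Override public boolean hasNext() { return pos < cache.size(); }

        @Override public String next() { return cache.get(pos++); }
    }

    static final class Scorer {
        private final boolean positionSensitive;

        Scorer(boolean positionSensitive) {
            this.positionSensitive = positionSensitive;
        }

        // Returns a replacement stream for the caller, or null to keep the original.
        Iterator<String> init(Iterator<String> stream) {
            if (!positionSensitive) {
                return null; // nothing consumed, caller keeps its stream
            }
            CachingStream cached = stream instanceof CachingStream
                    ? (CachingStream) stream
                    : new CachingStream(stream);
            while (cached.hasNext()) {
                cached.next(); // stand-in for the scorer's position analysis
            }
            cached.reset(); // rewind so the caller can replay the tokens
            return cached;
        }
    }

    // What a highlighter-like caller does with the return value.
    static List<String> consume(Scorer scorer, Iterator<String> stream) {
        Iterator<String> replacement = scorer.init(stream);
        if (replacement != null) {
            stream = replacement;
        }
        List<String> seen = new ArrayList<>();
        while (stream.hasNext()) {
            seen.add(stream.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("foo", "bar", "baz");
        System.out.println(consume(new Scorer(false), tokens.iterator())); // [foo, bar, baz]
        System.out.println(consume(new Scorer(true), tokens.iterator()));  // [foo, bar, baz]
    }
}
```

Either way the caller sees every token exactly once, which is why the user no longer has to do the wrapping or the extra call by hand.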

Now you can use the SpanScorer (as QueryScorer now) the same way you could use the old QueryScorer impl:

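    // The scorer needs only the query and field; no TokenStream is passed up front.
    QueryScorer scorer = new QueryScorer(query, FIELD_NAME);
    // 'this' must implement Formatter (here, presumably the enclosing test class).
    Highlighter highlighter = new Highlighter(this, scorer);
    highlighter.setTextFragmenter(new SimpleFragmenter(40));

    for (int i = 0; i < hits.length(); i++) {
      String text = hits.doc(i).get(FIELD_NAME);
      TokenStream tokenStream = analyzer.tokenStream(FIELD_NAME, new StringReader(text));

      // The Highlighter calls scorer.init(tokenStream) itself; no setTokenStream call is needed.
      String result = highlighter.getBestFragments(tokenStream, text, maxNumFragmentsRequired,
          "...");
      System.out.println("\t" + result);
    }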


[jira] Updated: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1685:
--------------------------------

    Attachment: LUCENE-1685.patch

Changed the constructors for QueryScorer to more closely match what was available before. Also, expandMultiTerm now defaults to true, and instead of being a constructor option, can be disabled with a method.

Also cleaned up a bit more of the test class and added a Changes entry.


[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718660#action_12718660 ] 

Mark Miller commented on LUCENE-1685:
-------------------------------------

Which reminds me, the highlighter has no changes file. I'll make one as well.


[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718861#action_12718861 ] 

Michael McCandless commented on LUCENE-1685:
--------------------------------------------

OK.

I think Mark's idea is great: absorb both into QueryScorer, making "position aware" the default.  Then if people somehow want the buggy PhraseQuery highlighting, they can switch it back.


[jira] Resolved: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved LUCENE-1685.
---------------------------------

       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [New])


[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738560#action_12738560 ] 

Mark Miller commented on LUCENE-1685:
-------------------------------------

I'll commit this within a few days.


[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718773#action_12718773 ] 

Michael McCandless commented on LUCENE-1685:
--------------------------------------------

Why not deprecate QueryScorer?  It's buggy, and leaving it in there, with such a juicy name, looking like the right choice, just makes Lucene's (highlighter's) quality look bad.  Correctness trumps performance.

And then the javadocs should clearly favor SpanScorer... and I would include a clear code fragment showing how to use it all, in context.  EG this is what LIA2 currently has, which is fine to copy/modify/etc. to get into the javadocs:

{code}
  public void testHits() throws Exception {
    IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    Highlighter highlighter = new Highlighter(null);
    Analyzer analyzer = new SimpleAnalyzer();
    
    for (int i = 0; i < hits.scoreDocs.length; i++) {
      Document doc = searcher.doc(hits.scoreDocs[i].doc);
      String title = doc.get("title");

      TokenStream stream = TokenSources.getAnyTokenStream(searcher.getIndexReader(),
                                                          hits.scoreDocs[i].doc,
                                                          "title",
                                                          doc,
                                                          analyzer);
      SpanScorer scorer = new SpanScorer(query, "title",
                                         new CachingTokenFilter(stream));
      Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
      highlighter.setFragmentScorer(scorer);
      highlighter.setTextFragmenter(fragmenter);

      String fragment =
          highlighter.getBestFragment(stream, title);

      System.out.println(fragment);
    }
  }
{code}

It would also be nice to simplify that usage, eg, is there some way to not have to make a SpanScorer (and, by extension, fragmenter) per query, but instead make it up-front and add a setter for the new TokenStream for each doc?  (Having to create Highlighter(null) is awkward).  Or I suppose we could simply make a new Highlighter, SpanScorer, SimpleSpanFragmenter per-hit, but that seems wasteful.


[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718658#action_12718658 ] 

Mark Miller commented on LUCENE-1685:
-------------------------------------

Didn't even have the actual highlighter code in my mind - you have to pass the Scorer to construct one anyway, so no back compat issue to speak of in any case.

The real change will be in the documentation, and I suppose adding something to changes mentioning that you should probably switch? Can't bring myself to say that we should deprecate the QueryScorer - why not have both - but it would be nice to point out that the SpanScorer is the new "default" Scorer for correct highlighting.

I'll work on a patch for the documentation and a changes entry suggestion. I'm not sure there is anything stronger we can do here.


[jira] Reopened: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reopened LUCENE-1685:
---------------------------------



[jira] Updated: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1685:
--------------------------------

    Attachment: LUCENE-1685.patch

This patch has the broad strokes: SpanScorer becomes QueryScorer, QueryScorer becomes QueryTermScorer, and QueryScorer gets a setTokenStream method rather than taking the TokenStream in the constructor.

Not sure how to best preserve any history here since SpanScorer is moving to QueryScorer.



[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739833#action_12739833 ] 

Michael McCandless commented on LUCENE-1685:
--------------------------------------------

Should we also default the fragmenter to SimpleSpanFragmenter?



[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718827#action_12718827 ] 

Yonik Seeley commented on LUCENE-1685:
--------------------------------------

I've never gone deep into the highlighters, but I don't think we should deprecate QueryScorer unless SpanScorer is a true superset (i.e. you can get SpanScorer to act like QueryScorer if you want... minus any real bugs).  Highlighting is not an exact science.  Given a query of 
{code}"foo bar" -baz{code}
Not everyone will agree (and it will be application specific) on exactly which instances of foo, bar, and baz should be highlighted in the document.  But I agree that by default, we should try to only highlight what matches the query.
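
As a rough illustration of the difference being debated, here is a toy sketch in plain Java. It is not Lucene code: ToyPhraseHighlighter and its methods are made-up names, tokens are split on single spaces, and it covers only the phrase part of the example query, not the prohibited -baz clause. The term-based method marks every occurrence of a query term, while the position-sensitive method marks terms only where they occur adjacently and in order.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: none of these names exist in Lucene.
final class ToyPhraseHighlighter {

    // Term-based mode: mark every token equal to any query term, ignoring position.
    static String highlightTerms(String text, List<String> terms) {
        String[] tokens = text.split(" ");
        boolean[] hit = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            hit[i] = terms.contains(tokens[i]);
        }
        return render(tokens, hit);
    }

    // Position-sensitive mode: mark tokens only where the phrase terms
    // appear next to each other, in order.
    static String highlightPhrase(String text, List<String> phrase) {
        String[] tokens = text.split(" ");
        boolean[] hit = new boolean[tokens.length];
        int n = phrase.size();
        for (int start = 0; n > 0 && start + n <= tokens.length; start++) {
            if (Arrays.asList(tokens).subList(start, start + n).equals(phrase)) {
                Arrays.fill(hit, start, start + n, true);
            }
        }
        return render(tokens, hit);
    }

    // Wrap each marked token in square brackets and rejoin with spaces.
    private static String render(String[] tokens, boolean[] hit) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(hit[i] ? "[" + tokens[i] + "]" : tokens[i]);
        }
        return out.toString();
    }
}
```

For the text "foo qux bar then foo bar" and the phrase foo bar, the term-based method brackets all four occurrences of foo and bar, while the position-sensitive method brackets only the final adjacent pair.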



Re: [jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by Mark Harwood <ma...@yahoo.co.uk>.
+1


On 11 Jun 2009, at 21:32, Michael McCandless (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718629#action_12718629 ]
>
> Michael McCandless commented on LUCENE-1685:
> --------------------------------------------
>
> bq. Make the Highlighter use SpanScorer by default
>
> +1!
>
> bq. I think it makes sense as the default in Solr as well, and I  
> mentioned that back when it was put in, but alas, its an option  
> there as well.
>
> +1
>
> bq. It has never been listed in a changes entry and its not in LIA  
> 1, so you pretty much have to stumble upon it, and figure out what  
> its for.
>
> And... in working on LIA2, I had to ask for help on how to use it ;)
>
> Consumability is important.
>
> Can we do this for 2.9?
>
> I think not being buggy by default (w/ PhraseQuery, eg) is far more  
> important than a small loss in performance.  Performance is  
> secondary to correctness.


[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718629#action_12718629 ] 

Michael McCandless commented on LUCENE-1685:
--------------------------------------------

bq. Make the Highlighter use SpanScorer by default

+1!

bq. I think it makes sense as the default in Solr as well, and I mentioned that back when it was put in, but alas, its an option there as well.

+1

bq. It has never been listed in a changes entry and its not in LIA 1, so you pretty much have to stumble upon it, and figure out what its for.

And... in working on LIA2, I had to ask for help on how to use it ;)

Consumability is important.

Can we do this for 2.9?

I think not being buggy by default (w/ PhraseQuery, eg) is far more important than a small loss in performance.  Performance is secondary to correctness.



[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739319#action_12739319 ] 

Mark Miller commented on LUCENE-1685:
-------------------------------------

I reopened this because changing the benchmark to use QueryScorer rather than QueryTermScorer was failing, and at first it looked like it wasn't producing highlights. I think the issue is with the Benchmark code (it just wasn't counting the highlights), but I have reopened this just in case. When I can test and know for sure, I'll resolve this again.



[jira] Resolved: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved LUCENE-1685.
---------------------------------

    Resolution: Fixed



[jira] Updated: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1685:
--------------------------------

    Fix Version/s: 2.9



[jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718838#action_12718838 ] 

Mark Miller commented on LUCENE-1685:
-------------------------------------

Agreed on all that Mike, I'll try to do that for 2.9.

I also wanted to deprecate QueryScorer for a while, but I agree with Yonik that it's kind of a feature, and we shouldn't toss it. You have a great point that keeping it around keeps things confusing, though. I can probably make the SpanScorer easily flip between both modes, defaulting to position sensitive.

Actually, perhaps we deprecate SpanScorer and add the functionality to QueryScorer with the switch, defaulting to position sensitive.
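
One way to picture the single-class-with-a-switch idea is the sketch below. It is hypothetical throughout: ToyQueryScorer, setPositionSensitive and mark are names invented for this illustration and are not the Lucene API, tokens are plain strings, and the query is a single phrase. The only point is the shape of the design, one scorer that is position sensitive by default and can be flipped to term-only matching.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these names are invented and are not Lucene API.
final class ToyQueryScorer {
    private final List<String> phraseTerms;
    private boolean positionSensitive = true; // position sensitive unless switched off

    ToyQueryScorer(String... phraseTerms) {
        this.phraseTerms = Arrays.asList(phraseTerms);
    }

    void setPositionSensitive(boolean positionSensitive) {
        this.positionSensitive = positionSensitive;
    }

    // Returns one flag per token: true if that token would be highlighted.
    boolean[] mark(String[] tokens) {
        boolean[] hit = new boolean[tokens.length];
        if (!positionSensitive) {
            // Term-only mode: any occurrence of a query term counts.
            for (int i = 0; i < tokens.length; i++) {
                hit[i] = phraseTerms.contains(tokens[i]);
            }
            return hit;
        }
        // Position-sensitive mode: terms must be adjacent and in order.
        int n = phraseTerms.size();
        for (int start = 0; n > 0 && start + n <= tokens.length; start++) {
            if (Arrays.asList(tokens).subList(start, start + n).equals(phraseTerms)) {
                Arrays.fill(hit, start, start + n, true);
            }
        }
        return hit;
    }
}
```

With this shape, callers who never touch the switch get the position-sensitive behaviour, and anyone who prefers the old term-only highlighting makes one explicit call to opt out.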
