Posted to dev@lucene.apache.org by "Tavi Nathanson (JIRA)" <ji...@apache.org> on 2008/07/08 07:02:31 UTC

[jira] Commented: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

    [ https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611453#action_12611453 ] 

Tavi Nathanson commented on LUCENE-794:
---------------------------------------

Hey everyone,

I'm having some trouble getting SpanScorer to act the way I'd like for proper highlighting, and I'm wondering if anyone has any suggestions.

I have two fields: text_raw and text_stemmed. text_raw, as the name suggests, stores unstemmed (tokenized) text while text_stemmed stores stemmed (tokenized) text.

I have queries that look over both fields. For example, I may have the query +(text_raw:"apple sauce" text_stemmed:orange). This query matches "apple sauce oranges" but it does not match "apples sauces orange" (because "apple sauce" is not stemmed). I'd like to be able to highlight accordingly: I want "apple," "sauce," and "oranges" to all be highlighted.

So, even though it is in fact the raw text that ends up getting highlighted, I'm looking for a way to build SpanScorer such that I don't need to limit myself to one field ("field" is one of the arguments to the constructor).
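
To make the desired behavior concrete, here is a toy Java sketch. It uses no Lucene classes at all: the class name, the naive stem() helper (lowercase, strip one trailing "s"), and the <B> markers are made up purely for illustration. It only shows which tokens of the raw text I'd expect to end up highlighted when one clause targets text_raw and the other targets text_stemmed; it says nothing about how SpanScorer works internally.

```java
import java.util.Arrays;
import java.util.Locale;

// Toy illustration only: every name here is hypothetical and no Lucene API is used.
final class TwoFieldHighlightSketch {

    // Hypothetical stand-in for a real stemmer: lowercases and strips one trailing "s".
    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return (t.length() > 1 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
    }

    // Marks the token positions hit by either clause of the example query.
    static boolean[] hits(String[] tokens, String[] rawPhrase, String stemmedTerm) {
        boolean[] hit = new boolean[tokens.length];

        // text_raw clause: the phrase must match the unstemmed tokens, adjacent and in order.
        for (int i = 0; i + rawPhrase.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < rawPhrase.length; j++) {
                if (!tokens[i + j].equalsIgnoreCase(rawPhrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                Arrays.fill(hit, i, i + rawPhrase.length, true);
            }
        }

        // text_stemmed clause: compare the stemmed form of each token to the query term.
        for (int i = 0; i < tokens.length; i++) {
            if (stem(tokens[i]).equals(stemmedTerm)) {
                hit[i] = true;
            }
        }
        return hit;
    }

    // Wraps every hit token of the raw text in <B>...</B>.
    static String highlight(String text, String[] rawPhrase, String stemmedTerm) {
        String[] tokens = text.split("\\s+");
        boolean[] hit = hits(tokens, rawPhrase, stemmedTerm);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(hit[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] phrase = {"apple", "sauce"};
        System.out.println(highlight("apple sauce oranges", phrase, "orange"));
        System.out.println(highlight("apples sauces orange", phrase, "orange"));
    }
}
```

For "apple sauce oranges" all three tokens get marked: the first two come from the raw phrase clause, and "oranges" comes from the stemmed clause. For "apples sauces orange" the phrase clause marks nothing, so only "orange" is wrapped.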

Thanks!

Tavi


> Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  ConstantScoreRangeQuery
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>            Reporter: Mark Miller
>            Priority: Minor
>             Fix For: 2.3.2
>
>         Attachments: MultiPhraseQueryExtraction.patch, SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, SpanHighlighter-02-10-2008.patch, SpanHighlighter-RemovSysOut.patch, spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, spanhighlighter_patch_4.zip
>
>
> This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter package that scores just like QueryScorer, but scores a 0 for Terms that did not cause the Query hit. This gives 'actual' hit highlighting for the range of SpanQuerys, PhraseQuery, and ConstantScoreRangeQuery. New Query types are easy to add. There is also a new Fragmenter that attempts to fragment without breaking up Spans.
> See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
> There is a dependency on MemoryIndex.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

