Posted to java-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/07/01 12:15:26 UTC

Re: Does highlighter highlight phrases only?

On Jun 30, 2005, at 4:35 PM, markharw00d wrote:

> Hi Erik,
> Yes I was thinking that code could form the basis of a new  
> highlighter.
>
> I've just attached a QuerySpansExtractor to the bugzilla entry for  
> the new highlighter. This class produces Spans from queries other  
> than SpanXxxxQueries, e.g. phrase, term and boolean queries.
> I'm thinking you can throw the text to be highlighted as a single
> doc into a MemIndex, extract the spans using the
> QuerySpansExtractor and the MemIndex's reader (need to expose a
> getReader method on this - I'm working on it), then use some new
> highlighting logic on the Spans.
>
> Sound reasonable?

I think so.

One minor issue... a SpanNearQuery is not entirely equal to a  
PhraseQuery when there is slop involved.  You have this:

     SpanNearQuery sp = new SpanNearQuery(clauses, query.getSlop(), false);

Here's a test from Lucene in Action that demonstrates:

public void testSpanNearQuery() throws Exception {
   SpanQuery[] quick_brown_dog =
       new SpanQuery[]{quick, brown, dog};
   SpanNearQuery snq =
       new SpanNearQuery(quick_brown_dog, 0, true);
   assertNoMatches(snq);
   snq = new SpanNearQuery(quick_brown_dog, 4, true);
   assertNoMatches(snq);
   snq = new SpanNearQuery(quick_brown_dog, 5, true);
   assertOnlyBrownFox(snq);

   // interesting - even a sloppy phrase query would require
   // more slop to match
   snq = new SpanNearQuery(new SpanQuery[]{lazy, fox}, 3, false);
   assertOnlyBrownFox(snq);

   PhraseQuery pq = new PhraseQuery();
   pq.add(new Term("f", "lazy"));
   pq.add(new Term("f", "fox"));
   pq.setSlop(4);
   assertNoMatches(pq);

   pq.setSlop(5);
   assertOnlyBrownFox(pq);
}

So to be entirely accurate, the slop will need an offset to get
SpanNearQuery to match PhraseQuery: in the test above, the same two
terms match with a slop of 3 as an unordered SpanNearQuery but need a
slop of 5 as a PhraseQuery. I have a feeling, though (I'm not
thinking through the details at the moment), that there is an edge
case or two that is not compatible.  A PhraseQuery with slop of 1,
for example - can a SpanNearQuery be set up to match that exactly?  I
don't think so... a PhraseQuery with slop of 1 cannot match in
reverse order, only in order with an optional hole between terms.
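
To make the arithmetic concrete, here is a rough sketch of how the two
slop values can be worked out by hand. It is a simplified model, not
Lucene code: SlopModel and its two methods are made-up names, each
term is assumed to occur exactly once, and the positions assume the
test document is "the quick brown fox jumps over the lazy dog"
(positions 0 to 8), which the test itself does not show.

```java
import java.util.Arrays;

/**
 * Simplified model of slop arithmetic. NOT Lucene's implementation:
 * it assumes every term occurs exactly once in the document and that
 * each array is non-empty.
 */
final class SlopModel {

    /**
     * Smallest slop at which a phrase would match, modelled as the
     * spread of (document position - offset of the term in the phrase).
     */
    static int phraseSlopNeeded(int[] docPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < docPositions.length; i++) {
            int shifted = docPositions[i] - i; // i = offset in the phrase
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    /**
     * Smallest slop at which a span-near query over single terms would
     * match, modelled as (width of the span - number of terms).
     * Returns -1 when inOrder is requested but the terms are not in
     * strictly increasing position order.
     */
    static int spanNearSlopNeeded(int[] docPositions, boolean inOrder) {
        if (inOrder) {
            for (int i = 1; i < docPositions.length; i++) {
                if (docPositions[i] <= docPositions[i - 1]) {
                    return -1;
                }
            }
        }
        int min = Arrays.stream(docPositions).min().getAsInt();
        int max = Arrays.stream(docPositions).max().getAsInt();
        return (max - min + 1) - docPositions.length;
    }

    public static void main(String[] args) {
        // Assumed positions: quick=1, brown=2, fox=3, lazy=7, dog=8.
        int[] quickBrownDog = {1, 2, 8};
        int[] lazyFox = {7, 3}; // query order: lazy, then fox

        System.out.println(spanNearSlopNeeded(quickBrownDog, true)); // 5
        System.out.println(spanNearSlopNeeded(lazyFox, false));      // 3
        System.out.println(phraseSlopNeeded(lazyFox));               // 5

        // Two adjacent terms in reverse order ("b a" for the query "a b"):
        int[] reversed = {1, 0};
        System.out.println(phraseSlopNeeded(reversed));              // 2
        System.out.println(spanNearSlopNeeded(reversed, false));     // 0
    }
}
```

Under this model the numbers line up with the assertions in the test
(5 for the in-order span query, 3 versus 5 for "lazy fox"), and the
last two lines show the reverse-order case: the phrase needs a slop of
2 where the unordered span query needs none.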

But, I like the idea of highlighting spans by converting other query  
types to get the Spans.

     Erik

