Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/11/13 02:51:44 UTC

Highlighter using spans

Mark et al,

I'm delving into highlighting text (per-field, like the recent topic  
on java-user) and need it to highlight the exact spots matched by the  
query, not just all query terms as the current Highlighter does.  And  
I need it to be SpanQuery savvy.  Further, it seems the Spans  
capability is the best way to accomplish this.

At this point I don't need fragmenting - the requirement is  
highlighting the full text on a per-field basis.

Using Spans requires an IndexReader.  And because, of course, I'm  
only highlighting a single document at a time, the document id is  
needed to narrow down the Spans iteration.
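
A minimal sketch of narrowing a Spans iteration to one document.  The
SpanCursor interface below is a hypothetical stand-in that only mirrors
the shape of a Spans-style iterator (next(), skipTo(int), doc(), start(),
end()); ArraySpanCursor and SingleDocSpans are illustrative names, not
Lucene classes.  It assumes matches arrive ordered by document id.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Spans-style iterator over (doc, start, end). */
interface SpanCursor {
    boolean next();              // advance to the next match, false when exhausted
    boolean skipTo(int target);  // advance to the first match with doc >= target
    int doc();
    int start();                 // first token position of the match
    int end();                   // one past the last token position
}

/** In-memory cursor over {doc, start, end} rows, assumed sorted by doc. */
class ArraySpanCursor implements SpanCursor {
    private final int[][] rows;
    private int pos = -1;

    ArraySpanCursor(int[][] rows) { this.rows = rows; }

    public boolean next() { pos++; return pos < rows.length; }

    public boolean skipTo(int target) {
        // Always moves forward at least one row, then until doc >= target.
        do { pos++; } while (pos < rows.length && rows[pos][0] < target);
        return pos < rows.length;
    }

    public int doc() { return rows[pos][0]; }
    public int start() { return rows[pos][1]; }
    public int end() { return rows[pos][2]; }
}

class SingleDocSpans {
    /** Collects the {start, end} pairs that belong to docId only. */
    static List<int[]> collect(SpanCursor spans, int docId) {
        List<int[]> result = new ArrayList<>();
        boolean more = spans.skipTo(docId);
        // Stop as soon as the cursor moves past the document being highlighted.
        while (more && spans.doc() == docId) {
            result.add(new int[] { spans.start(), spans.end() });
            more = spans.next();
        }
        return result;
    }
}
```

Skipping straight to the document id avoids walking every earlier match,
which is the point of passing the id in at all.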

This is about as far as I've gotten.  I'm implementing this  
from scratch rather than trying to start with the current  
Highlighter, and hope to be able to merge this back into Highlighter  
in some way in the future.

I'm sending this to the list to solicit comments and suggestions on  
this approach.  Thoughts?

Thanks,
     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Highlighter using spans

Posted by Martin Haye <m1...@snyder-haye.com>.
I might mention that, if you're converting your query to spans anyway, you
could avoid running it twice (once over the document set, another time for
highlighting) by using it as a main query and recording the spans as you go
along. I'm not sure this is better, but it's what XTF uses.
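
One way to picture "recording the spans as you go along" is a small
per-document store filled during the single search pass.  This is only a
sketch under that assumption and is not how XTF is actually implemented;
SpanRecorder and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical recorder: remembers every matching span, keyed by document id. */
class SpanRecorder {
    private final Map<Integer, List<int[]>> byDoc = new HashMap<>();

    /** Called once per match while the main query runs. */
    void record(int doc, int start, int end) {
        byDoc.computeIfAbsent(doc, d -> new ArrayList<>()).add(new int[] { start, end });
    }

    /** Spans to highlight for one hit; empty if the document never matched. */
    List<int[]> spansFor(int doc) {
        return byDoc.getOrDefault(doc, Collections.emptyList());
    }

    int matchingDocCount() {
        return byDoc.size();
    }
}
```

The trade-off is memory: spans are kept for every matching document, not
just the few that end up being displayed.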

--Martin


On 11/13/05, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On 13 Nov 2005, at 16:34, Jason Calabrese wrote:
> > Adding Span and Phrase support to the highlighter is something
> > that we've
> > been needing in my project for a while.
> >
> > Would adding support for spans also add support for phrases?
>
> The idea is to modify Highlighter to utilize the SpanQuery
> capability, via getSpans(), to produce the regions that match. This
> requires that all queries, except BooleanQuery, be converted to an
> equivalent SpanQuery. It's pretty simple to convert a PhraseQuery to
> a SpanQuery, and Mark's SpanExtractor does this very thing. (Note
> that PhraseQuery and SpanNearQuery are not entirely equivalent in
> some edge cases when slop is involved.) So yes, if the
> Highlighter supports SpanQuery instances, it would be possible to
> highlight phrases accurately.
>
> Erik

Re: Highlighter using spans

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 13 Nov 2005, at 16:34, Jason Calabrese wrote:
> Adding Span and Phrase support to the highlighter is something  
> that we've
> been needing in my project for a while.
>
> Would adding support for spans also add support for phrases?

The idea is to modify Highlighter to utilize the SpanQuery  
capability, via getSpans(), to produce the regions that match.  This  
requires that all queries, except BooleanQuery, be converted to an  
equivalent SpanQuery.  It's pretty simple to convert a PhraseQuery to  
a SpanQuery, and Mark's SpanExtractor does this very thing.  (Note  
that PhraseQuery and SpanNearQuery are not entirely equivalent in  
some edge cases when slop is involved.)  So yes, if the  
Highlighter supports SpanQuery instances, it would be possible to  
highlight phrases accurately.
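
To illustrate why span regions highlight phrases more accurately than
per-term highlighting, here is a small sketch that finds exact, in-order
phrase occurrences (no slop) as [start, end) token-position ranges.  It is
not Lucene code; PhraseSpanFinder is a hypothetical helper that works on a
plain token array.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseSpanFinder {
    /**
     * Returns {start, end} token-position pairs (end exclusive) for every
     * place where the phrase terms occur consecutively and in order.
     */
    static List<int[]> findExact(String[] tokens, String[] phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.length == 0) {
            return spans;
        }
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                spans.add(new int[] { i, i + phrase.length });
            }
        }
        return spans;
    }
}
```

For the tokens "new jersey and new york" and the phrase "new york", only
positions 3 to 5 are returned, so the first "new" is left alone, whereas
highlighting every query term would mark both.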

     Erik




Re: Highlighter using spans

Posted by Jason Calabrese <ma...@jasoncalabrese.com>.
Adding Span and Phrase support to the highlighter is something that we've 
been needing in my project for a while.

Would adding support for spans also add support for phrases?

We would also like this to be used when generating fragments so we can create 
high quality snippets for search results when our users enter quoted phrase 
queries.

If someone were willing to add these features, I'm sure we could help fund the 
development.  If anyone is interested in this, please contact me off list.

--Jason

On Sunday 13 November 2005 10:49 am, Erik Hatcher wrote:
> On 13 Nov 2005, at 05:38, markharw00d wrote:
> > I posted what I thought would be the best approach to fixing this
> > here along with pointers to some existing code:
> >    http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
>
> SpansExtractor, unfortunately, doesn't take the field to be
> highlighted into account.  Given some text to highlight, it would
> seem generally desirable to highlight only a specific field's worth of
> spans.
>
> For my immediate needs, I only have to highlight the full text with
> no fragmenting and it must be field aware.  I was able to achieve
> this with just a bit of custom code that doesn't really apply
> generically.  It would be good, though, to have a general framework
> for converting a Query to a SpanQuery and also to extract field-
> specific Spans - though at this point my consulting code isn't
> general purpose enough yet.  Perhaps at some point in the future I
> can distill some of it into the Highlighter codebase though.
>
>      Erik


Re: Highlighter using spans

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 13 Nov 2005, at 05:38, markharw00d wrote:
> I posted what I thought would be the best approach to fixing this  
> here along with pointers to some existing code:
>    http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2

SpansExtractor, unfortunately, doesn't take the field to be  
highlighted into account.  Given some text to highlight, it would  
seem generally desirable to highlight only a specific field's worth of  
spans.

For my immediate needs, I only have to highlight the full text with  
no fragmenting and it must be field aware.  I was able to achieve  
this with just a bit of custom code that doesn't really apply  
generically.  It would be good, though, to have a general framework  
for converting a Query to a SpanQuery and also for extracting  
field-specific Spans.  My consulting code isn't general purpose  
enough for that yet, but perhaps at some point in the future I  
can distill some of it into the Highlighter codebase.
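
As a rough sketch of what "field aware" could mean: before any spans are
gathered, keep only the query clauses that target the field being
highlighted.  FieldedClause and FieldAwareSelector are hypothetical
stand-ins for illustration, not the consulting code mentioned above and
not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical clause: a term bound to the field it should match in. */
class FieldedClause {
    final String field;
    final String term;

    FieldedClause(String field, String term) {
        this.field = field;
        this.term = term;
    }
}

class FieldAwareSelector {
    /** Returns only the terms whose clause targets the given field. */
    static List<String> termsForField(List<FieldedClause> clauses, String field) {
        List<String> terms = new ArrayList<>();
        for (FieldedClause clause : clauses) {
            if (clause.field.equals(field)) {
                terms.add(clause.term);
            }
        }
        return terms;
    }
}
```

With clauses title:lucene, body:span and body:query, highlighting the
"body" text would then consider only "span" and "query".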

     Erik




Re: Highlighter using spans

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 13 Nov 2005, at 05:38, markharw00d wrote:
> Hi Erik,
> I posted what I thought would be the best approach to fixing this  
> here along with pointers to some existing code:
>    http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2

Mark - thanks for the refresher on where we stood with these ideas.

> >> and hope to be able to merge this back into Highlighter in some  
> >> way in the future.
>
> The current highlighter has some stuff that may be generally useful  
> but the fragmenting code would certainly have to change to avoid  
> breaking span "runs".

At least in my case, fragmenting the text is not desired.  The entire  
text will be highlighted.
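
A minimal sketch of highlighting the entire text with no fragmenting,
assuming the matching spans are already known as [start, end) token
positions and that each token's character offsets are available.
FullTextSpanHighlighter, the tokenOffsets layout and the <b> markers are
illustrative choices, not part of the existing Highlighter.

```java
import java.util.ArrayList;
import java.util.List;

class FullTextSpanHighlighter {
    /** Sorts spans by start and merges any that overlap or touch. */
    static List<int[]> merge(List<int[]> spans) {
        List<int[]> sorted = new ArrayList<>(spans);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] span : sorted) {
            if (!merged.isEmpty() && span[0] <= merged.get(merged.size() - 1)[1]) {
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], span[1]);
            } else {
                // Copy so the caller's arrays are never modified.
                merged.add(new int[] { span[0], span[1] });
            }
        }
        return merged;
    }

    /**
     * tokenOffsets[i] = {startChar, endChar} of token i in text.
     * spans hold {startPos, endPos} token positions, endPos exclusive.
     */
    static String highlight(String text, int[][] tokenOffsets, List<int[]> spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int[] span : merge(spans)) {
            int from = tokenOffsets[span[0]][0];
            int to = tokenOffsets[span[1] - 1][1];
            out.append(text, cursor, from);
            out.append("<b>").append(text, from, to).append("</b>");
            cursor = to;
        }
        out.append(text.substring(cursor));
        return out.toString();
    }
}
```

Merging first keeps overlapping matches from producing nested or broken
markup; the whole text is always returned, only the matched regions are
wrapped.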

     Erik




Re: Highlighter using spans

Posted by markharw00d <ma...@yahoo.co.uk>.
Hi Erik,
I posted what I thought would be the best approach to fixing this here 
along with pointers to some existing code:
    http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2

Unfortunately I've been way too caught up in other work lately to 
implement this, and this particular "itch" hasn't been high on my list of 
things to scratch.

>> and hope to be able to merge this back into Highlighter in some way 
>> in the future.

The current highlighter has some stuff that may be generally useful but 
the fragmenting code would certainly have to change to avoid breaking 
span "runs".
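
One possible way for a fragmenter to avoid breaking span "runs", sketched
under simple assumptions: fragments are cut every targetSize tokens, and
any cut that would land inside a span is pushed out to that span's end.
SpanAwareFragmenter is a hypothetical name, spans are [start, end) token
positions, and they are assumed to be sorted by start.

```java
import java.util.ArrayList;
import java.util.List;

class SpanAwareFragmenter {
    /**
     * Returns the exclusive end position of each fragment.
     * Spans must be sorted by start; a cut never falls strictly inside one.
     */
    static List<Integer> boundaries(int tokenCount, int targetSize, List<int[]> spans) {
        if (targetSize <= 0) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<Integer> cuts = new ArrayList<>();
        int cut = targetSize;
        while (cut < tokenCount) {
            for (int[] span : spans) {
                // A cut strictly inside a span would split the run, so move it.
                if (span[0] < cut && cut < span[1]) {
                    cut = span[1];
                }
            }
            if (cut >= tokenCount) {
                break;
            }
            cuts.add(cut);
            cut += targetSize;
        }
        cuts.add(tokenCount); // the final fragment always ends at the text end
        return cuts;
    }
}
```

With 10 tokens, a target size of 4 and a span covering positions 3 to 6,
the first cut moves from 4 to 6, so the fragments end at 6 and 10 rather
than at 4, 8 and 10.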

I see there was some discussion recently about "queries as filters" for 
non-relevance-ranked criteria, e.g. RangeQueries, which seems to make sense. 
Once rewritten as filters, these sets of criteria will not be as easily 
reverse-engineered into a set of terms for use in highlighting. I suspect 
most people won't want to highlight these non-relevance-ranked terms 
anyway, so they can probably be ignored for highlighting purposes?

Cheers,
Mark



Erik Hatcher wrote:

> Mark et al,
>
> I'm delving into highlighting text (per-field, like the recent topic  
> on java-user) and need it to highlight the exact spots used in the  
> query, not just all query terms as the current Highlighter does.  And  
> I need it to be SpanQuery savvy.  Further, it seems the Spans  
> capability is the best way to accomplish this.
>
> At this point I don't need fragmenting - the requirement is  
> highlighting the full text on a per-field basis.
>
> Using Spans requires an IndexReader.  And because, of course, I'm  
> only highlighting a single document at a time, the document id is  
> needed to narrow down the Spans iteration.
>
> This is about as far as I've gotten thus far.  I'm implementing this  
> from scratch rather than trying to start with the current  
> Highlighter, and hope to be able to merge this back into Highlighter  
> in some way in the future.
>
> I'm sending this to the list to solicit comments and suggestions on  
> this approach.  Thoughts?
>
> Thanks,
>     Erik