Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/11/13 02:51:44 UTC
Highlighter using spans
Mark et al,
I'm delving into highlighting text (per-field, like the recent topic
on java-user) and need it to highlight the exact spots used in the
query, not just all query terms as the current Highlighter does. And
I need it to be SpanQuery savvy. Further, it seems the Spans
capability is the best way to accomplish this.
At this point I don't need fragmenting - the requirement is
highlighting the full text on a per-field basis.
Using Spans requires an IndexReader. And because, of course, I'm
only highlighting a single document at a time, the document id is
needed to narrow down the Spans iteration.
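A minimal sketch of narrowing a spans enumeration to one document. SpanCursor and ArraySpanCursor are hypothetical stand-ins written for this example (they only mirror the next/skipTo/doc/start/end shape of a spans enumeration) so that the snippet runs without an index; with a real index the cursor would come from the span query and an IndexReader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mirrors the shape of a spans enumeration. */
interface SpanCursor {
    boolean next();              // advance to the next match; false when exhausted
    boolean skipTo(int target);  // advance to the first later match with doc() >= target
    int doc();                   // document id of the current match
    int start();                 // first token position of the current match
    int end();                   // one past the last token position of the match
}

/** In-memory cursor over {doc, start, end} rows sorted by doc; demo only. */
class ArraySpanCursor implements SpanCursor {
    private final int[][] rows;
    private int i = -1;

    ArraySpanCursor(int[][] rows) { this.rows = rows; }

    public boolean next() { i++; return i < rows.length; }

    public boolean skipTo(int target) {
        // Always advances at least once, then keeps going until doc >= target.
        do { i++; } while (i < rows.length && rows[i][0] < target);
        return i < rows.length;
    }

    public int doc() { return rows[i][0]; }
    public int start() { return rows[i][1]; }
    public int end() { return rows[i][2]; }
}

class SingleDocSpans {
    /** Collects the {start, end} token positions of every span inside docId. */
    static List<int[]> spansForDoc(SpanCursor spans, int docId) {
        List<int[]> result = new ArrayList<>();
        if (!spans.skipTo(docId)) {
            return result;                 // no match at or after docId
        }
        while (spans.doc() == docId) {     // stop as soon as we leave the document
            result.add(new int[] { spans.start(), spans.end() });
            if (!spans.next()) {
                break;
            }
        }
        return result;
    }
}
```

The skipTo call avoids walking the matches of earlier documents, and the loop stops at the first match that belongs to a later document.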
This is about as far as I've gotten. I'm implementing this
from scratch rather than trying to start with the current
Highlighter, and hope to be able to merge this back into Highlighter
in some way in the future.
I'm sending this to the list to solicit comments and suggestions on
this approach. Thoughts?
Thanks,
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Highlighter using spans
Posted by Martin Haye <m1...@snyder-haye.com>.
I might mention that, if you're converting your query to spans anyway, you
could avoid running it twice (once over the document set, another time for
highlighting) by using it as a main query and recording the spans as you go
along. I'm not sure this is better, but it's what XTF uses.
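A rough sketch of the "record the spans as you go" idea, not XTF's actual code. It assumes the main query yields matches as {doc, start, end} rows; SpanRecorder and that row layout are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpanRecorder {
    /**
     * Groups matches by document in a single pass.
     * Each input row is {doc, start, end}; the result maps a doc id to its
     * {start, end} spans, in the order the documents were first seen.
     */
    static Map<Integer, List<int[]>> record(Iterable<int[]> matches) {
        Map<Integer, List<int[]>> byDoc = new LinkedHashMap<>();
        for (int[] m : matches) {
            byDoc.computeIfAbsent(m[0], k -> new ArrayList<>())
                 .add(new int[] { m[1], m[2] });
        }
        return byDoc;
    }
}
```

The key set of the map is the hit list, and the values are the regions to highlight, so the query does not have to run a second time per document. The cost is holding the spans of every hit in memory, including hits that are never displayed.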
--Martin
On 11/13/05, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On 13 Nov 2005, at 16:34, Jason Calabrese wrote:
> > Adding Span and Phrase support to the highlighter is something
> > that we've
> > been needing in my project for a while.
> >
> > Would adding support for spans also add support for phrases?
>
> The idea is to modify Highlighter to utilize the SpanQuery
> capability, via getSpans(), to produce the regions that match. This
> requires that all queries, except BooleanQuery, be converted to an
> equivalent SpanQuery. It's pretty simple to convert a PhraseQuery to
> a SpanQuery, and Mark's SpanExtractor does this very thing. (Note
> that PhraseQuery and SpanNearQuery are not entirely equivalent in
> some edge cases when slop is involved.) So yes, if the Highlighter
> supports SpanQuery, it would be possible to highlight phrases
> accurately.
>
> Erik
Re: Highlighter using spans
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 13 Nov 2005, at 16:34, Jason Calabrese wrote:
> Adding Span and Phrase support to the highlighter is something
> that we've
> been needing in my project for a while.
>
> Would adding support for spans also add support for phrases?
The idea is to modify Highlighter to utilize the SpanQuery
capability, via getSpans(), to produce the regions that match. This
requires that all queries, except BooleanQuery, be converted to an
equivalent SpanQuery. It's pretty simple to convert a PhraseQuery to
a SpanQuery, and Mark's SpanExtractor does this very thing. (Note
that PhraseQuery and SpanNearQuery are not entirely equivalent in
some edge cases when slop is involved.) So yes, if the Highlighter
supports SpanQuery, it would be possible to highlight phrases
accurately.
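To illustrate why matched regions give more accurate phrase highlighting than individual terms, here is a small self-contained sketch. PhraseSpans is a hypothetical helper, and its notion of slop (extra tokens allowed inside an in-order match) is a simplification that does not reproduce the exact semantics of either PhraseQuery or SpanNearQuery.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseSpans {
    /**
     * Returns {start, end} token positions (end exclusive) where the phrase
     * terms occur in order with at most `slop` extra tokens in between.
     * Assumes `phrase` has at least one term.
     */
    static List<int[]> find(String[] tokens, String[] phrase, int slop) {
        List<int[]> spans = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            if (!tokens[start].equals(phrase[0])) {
                continue;
            }
            int pos = start;          // position of the most recently matched term
            boolean matched = true;
            for (int t = 1; t < phrase.length && matched; t++) {
                // Take the earliest later occurrence of the next phrase term,
                // which gives the shortest possible span for this start.
                int next = pos + 1;
                while (next < tokens.length && !tokens[next].equals(phrase[t])) {
                    next++;
                }
                matched = next < tokens.length;
                pos = next;
            }
            int end = pos + 1;
            if (matched && (end - start) - phrase.length <= slop) {
                spans.add(new int[] { start, end });
            }
        }
        return spans;
    }
}
```

For the tokens of "the quick fox saw the quick brown fox" and the phrase "quick fox" with slop 0, only positions 1 to 3 are returned, whereas a term-based highlighter would also mark the second "quick" and the final "fox".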
Erik
Re: Highlighter using spans
Posted by Jason Calabrese <ma...@jasoncalabrese.com>.
Adding Span and Phrase support to the highlighter is something that we've
been needing in my project for a while.
Would adding support for spans also add support for phrases?
We would also like this to be used when generating fragments so we can create
high quality snippets for search results when our users enter queries with
quotes.
If someone was willing to add these features I'm sure we could help fund the
development. If anyone is interested in this please contact me off list.
--Jason
On Sunday 13 November 2005 10:49 am, Erik Hatcher wrote:
> On 13 Nov 2005, at 05:38, markharw00d wrote:
> > I posted what I thought would be the best approach to fixing this
> > here along with pointers to some existing code:
> > http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
>
> SpansExtractor, unfortunately, doesn't take the field to be
> highlighted into account. Given some text to highlight, it would
> seem generally desirable to only highlight a specific field's worth of
> spans.
>
> For my immediate needs, I only have to highlight the full text with
> no fragmenting and it must be field aware. I was able to achieve
> this with just a bit of custom code that doesn't really apply
> generically. It would be good, though, to have a general framework
> for converting a Query to a SpanQuery and also to extract field-
> specific Spans - though at this point my consulting code isn't
> general purpose enough yet. Perhaps at some point in the future I
> can distill some of it into the Highlighter codebase though.
>
> Erik
Re: Highlighter using spans
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 13 Nov 2005, at 05:38, markharw00d wrote:
> I posted what I thought would be the best approach to fixing this
> here along with pointers to some existing code:
> http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
SpansExtractor, unfortunately, doesn't take the field to be
highlighted into account. Given some text to highlight, it would
seem generally desirable to only highlight a specific field's worth of
spans.
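As a rough illustration of what "field aware" means here, the sketch below keeps only the query terms that target the field being highlighted. FieldTerm and FieldAwareTerms are hypothetical names invented for this example and are not part of the Highlighter or of Mark's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a query term: a field name plus the term text. */
class FieldTerm {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

class FieldAwareTerms {
    /** Keeps only the terms that target the field being highlighted. */
    static List<FieldTerm> forField(List<FieldTerm> queryTerms, String field) {
        List<FieldTerm> kept = new ArrayList<>();
        for (FieldTerm t : queryTerms) {
            if (t.field.equals(field)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

With a query such as title:lucene AND body:spans, highlighting the body text would then use only the "spans" term, so an occurrence of "lucene" in the body is left unmarked.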
For my immediate needs, I only have to highlight the full text with
no fragmenting and it must be field aware. I was able to achieve
this with just a bit of custom code that doesn't really apply
generically. It would be good, though, to have a general framework
for converting a Query to a SpanQuery and also to extract field-
specific Spans - though at this point my consulting code isn't
general purpose enough yet. Perhaps at some point in the future I
can distill some of it into the Highlighter codebase though.
Erik
Re: Highlighter using spans
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On 13 Nov 2005, at 05:38, markharw00d wrote:
> Hi Erik,
> I posted what I thought would be the best approach to fixing this
> here along with pointers to some existing code:
> http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
Mark - thanks for the refresher on where we stood with these ideas.
> > and hope to be able to merge this back into Highlighter in some
> > way in the future.
>
> The current highlighter has some stuff that may be generally useful
> but the fragmenting code would certainly have to change to avoid
> breaking span "runs".
At least in my case, fragmenting the text is not desired. The entire
text will be highlighted.
Erik
Re: Highlighter using spans
Posted by markharw00d <ma...@yahoo.co.uk>.
Hi Erik,
I posted what I thought would be the best approach to fixing this here
along with pointers to some existing code:
http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
Unfortunately I've been way too caught up in other work lately to
implement this and this particular "itch" hasn't been high on my list of
things for me to scratch.
> and hope to be able to merge this back into Highlighter in some way
> in the future.
The current highlighter has some stuff that may be generally useful but
the fragmenting code would certainly have to change to avoid breaking
span "runs".
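One possible way to keep a fragmenter from breaking a span run, sketched under assumptions: fragments are cut every fragmentSize tokens, spans are {start, end} token positions (end exclusive) sorted by start, and a cut that would land inside a span is pushed to the end of that span. SpanSafeFragmenter is a hypothetical name, not the existing fragmenting code.

```java
import java.util.ArrayList;
import java.util.List;

class SpanSafeFragmenter {
    /**
     * Returns the token positions at which the text is cut into fragments.
     * Cuts are placed fragmentSize tokens after the previous cut, but any cut
     * that would fall strictly inside a span is moved to the end of that span.
     * Assumes `spans` is sorted by start position.
     */
    static List<Integer> cuts(int tokenCount, int fragmentSize, List<int[]> spans) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        List<Integer> cuts = new ArrayList<>();
        int cut = fragmentSize;
        while (cut < tokenCount) {
            for (int[] s : spans) {
                if (s[0] < cut && cut < s[1]) {
                    cut = s[1];            // push the cut past the span run
                }
            }
            if (cut < tokenCount) {
                cuts.add(cut);
            }
            cut += fragmentSize;
        }
        return cuts;
    }
}
```

Fragments can grow beyond fragmentSize when a span straddles a nominal boundary, which seems an acceptable trade for never splitting a highlighted run.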
I see there was some discussion recently about "queries as filters" for
non-relevance-ranked criteria, e.g. RangeQueries, which seems to make
sense. Once rewritten as filters, these sets of criteria will not be as
easily reverse-engineered to a set of terms for use in highlighting. I
suspect most people won't want to highlight these non-relevance-ranked
terms anyway, so they can probably be ignored for highlighting purposes?
Cheers,
Mark
Erik Hatcher wrote:
> Mark et al,
>
> I'm delving into highlighting text (per-field, like the recent topic
> on java-user) and need it to highlight the exact spots used in the
> query, not just all query terms as the current Highlighter does. And
> I need it to be SpanQuery savvy. Further, it seems the Spans
> capability is the best way to accomplish this.
>
> At this point I don't need fragmenting - the requirement is
> highlighting the full text on a per-field basis.
>
> Using Spans requires an IndexReader. And because, of course, I'm
> only highlighting a single document at a time, the document id is
> needed to narrow down the Spans iteration.
>
> This is about as far as I've gotten. I'm implementing this
> from scratch rather than trying to start with the current
> Highlighter, and hope to be able to merge this back into Highlighter
> in some way in the future.
>
> I'm sending this to the list to solicit comments and suggestions on
> this approach. Thoughts?
>
> Thanks,
> Erik