Posted to java-user@lucene.apache.org by Sean O'Connor <se...@oconeco.com> on 2005/09/06 08:21:52 UTC
Hits document offset information? Span query or Surround?
I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow. Does anyone know if
this is true, and if so, how I would go about it?
Alternatively (preferably actually) does the surround code from the SVN
development area have a way of returning offsets for the matching hits?
I believe the current highlighter code matches all query terms in a hit
document, not just those satisfying the query criteria. I need a more
precise way to access the hit term offsets. I am working on hit
highlighting, hit excerpts and summaries, and compound queries (is this
called search vectors?). I am still working through the surround code in
dev. to see if that gives me the compound queries I need.
I am willing to spend a few days implementing offsets on the returned
hits (or something similar) if this is not currently available. It is
something I need, even at the cost of search efficiency.
Thanks
Sean
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Hits document offset information? Span query or Surround? - thanks
Posted by Sean O'Connor <se...@oconeco.com>.
Thanks for the input. I am looking at the suggested links now. If I make
any progress I will return to see if any of my work would be appropriate
to contribute back.
Sean
Paul Elschot wrote:
>On Tuesday 06 September 2005 08:52, markharw00d wrote:
>
>
>>>>I believe I have heard that Span queries provide some way to access
>>>>document offset information for their hits somehow.
>>
>>See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
>>
>>Faithfully selecting extracts based *exactly* on query criteria will be
>>hard given complex queries eg with nested Boolean logic.
>>
>>The current highlighter matches based on ANY query terms found in the
>>provided doc text
>>The proposal above matches based on any spans/phrases/terms
>>
>>Both options still fail to take into account any boolean logic and show
>>the real basis for the match eg the query
>>(author:"Doug Cutting" AND title:"Lucene in Action") OR (author:Erik
>>AND author:Otis)
>>would still highlight references to "Doug Cutting" and "Lucene In
>>Action" for the LIA book, despite the fact that the match was actually
>>for Erik and Otis (the true authors).
>>For most people this is a problem they can live with.
>>
>>
>
>The person who solves that might also write a SpanAndQuery :)
>
>Regards,
>Paul Elschot
>
>
Re: Hits document offset information? Span query or Surround?
Posted by Paul Elschot <pa...@xs4all.nl>.
On Tuesday 06 September 2005 08:52, markharw00d wrote:
> >>I believe I have heard that Span queries provide some way to access
> >>document offset information for their hits somehow.
>
> See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
>
> Faithfully selecting extracts based *exactly* on query criteria will be
> hard given complex queries eg with nested Boolean logic.
>
> The current highlighter matches based on ANY query terms found in the
> provided doc text
> The proposal above matches based on any spans/phrases/terms
>
> Both options still fail to take into account any boolean logic and show
> the real basis for the match eg the query
> (author:"Doug Cutting" AND title:"Lucene in Action") OR (author:Erik
> AND author:Otis)
> would still highlight references to "Doug Cutting" and "Lucene In
> Action" for the LIA book, despite the fact that the match was actually
> for Erik and Otis (the true authors).
> For most people this is a problem they can live with.
The person who solves that might also write a SpanAndQuery :)
Regards,
Paul Elschot
Re: Hits document offset information? Span query or Surround?
Posted by markharw00d <ma...@yahoo.co.uk>.
>>I believe I have heard that Span queries provide some way to access
>>document offset information for their hits somehow.
See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
Faithfully selecting extracts based *exactly* on query criteria will be
hard given complex queries, e.g. with nested Boolean logic.
The current highlighter matches based on ANY query terms found in the
provided doc text.
The proposal above matches based on any spans/phrases/terms.
Neither option takes the boolean logic into account or shows the real
basis for the match, e.g. the query
(author:"Doug Cutting" AND title:"Lucene in Action") OR (author:Erik
AND author:Otis)
would still highlight references to "Doug Cutting" and "Lucene In
Action" for the LIA book, despite the fact that the match was actually
for Erik and Otis (the true authors).
For most people this is a problem they can live with.
Cheers
Mark
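The gap described above can be illustrated with a small, self-contained
sketch. It does not use Lucene at all: FieldPhrase and
ClauseAwareHighlightSketch are hypothetical names, the query is modelled
as an OR over AND-clauses, and matching is a crude substring test. Under
those assumptions it contrasts "highlight any query phrase" with
"highlight only phrases from clauses that actually matched".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one field-restricted term or phrase; not a Lucene class.
final class FieldPhrase {
    final String field;
    final String phrase;

    FieldPhrase(String field, String phrase) {
        this.field = field;
        this.phrase = phrase;
    }

    // Crude case-insensitive substring test standing in for real term matching.
    boolean matches(Map<String, String> doc) {
        String text = doc.get(field);
        return text != null && text.toLowerCase().contains(phrase.toLowerCase());
    }
}

final class ClauseAwareHighlightSketch {
    // "Match ANY query term" behaviour: every phrase in the query is a candidate.
    static List<String> allPhrases(List<List<FieldPhrase>> query) {
        List<String> out = new ArrayList<>();
        for (List<FieldPhrase> clause : query) {
            for (FieldPhrase p : clause) {
                out.add(p.phrase);
            }
        }
        return out;
    }

    // Clause-aware alternative: keep only phrases from AND-clauses that fully match.
    static List<String> phrasesFromSatisfiedClauses(List<List<FieldPhrase>> query,
                                                    Map<String, String> doc) {
        List<String> out = new ArrayList<>();
        for (List<FieldPhrase> clause : query) {
            boolean satisfied = true;
            for (FieldPhrase p : clause) {
                if (!p.matches(doc)) {
                    satisfied = false;
                    break;
                }
            }
            if (satisfied) {
                for (FieldPhrase p : clause) {
                    out.add(p.phrase);
                }
            }
        }
        return out;
    }

    // The example query from the message: two AND-clauses joined by OR.
    static List<List<FieldPhrase>> exampleQuery() {
        return Arrays.asList(
            Arrays.asList(new FieldPhrase("author", "Doug Cutting"),
                          new FieldPhrase("title", "Lucene in Action")),
            Arrays.asList(new FieldPhrase("author", "Erik"),
                          new FieldPhrase("author", "Otis")));
    }

    // Illustrative document for the LIA book; field values are made up for the sketch.
    static Map<String, String> exampleDoc() {
        Map<String, String> doc = new HashMap<>();
        doc.put("author", "Erik, Otis");
        doc.put("title", "Lucene in Action");
        return doc;
    }

    public static void main(String[] args) {
        List<List<FieldPhrase>> query = exampleQuery();
        System.out.println(allPhrases(query));
        System.out.println(phrasesFromSatisfiedClauses(query, exampleDoc()));
    }
}
```

With the example document only the second clause is satisfied, so the
clause-aware variant keeps "Erik" and "Otis" and drops "Doug Cutting" and
"Lucene in Action". Nested boolean logic would need a recursive version
of the same idea.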
Re: Hits document offset information? Span query or Surround?
Posted by Paul Elschot <pa...@xs4all.nl>.
On Tuesday 06 September 2005 08:21, Sean O'Connor wrote:
> I believe I have heard that Span queries provide some way to access
> document offset information for their hits somehow. Does anyone know if
> this is true, and if so, how I would go about it?
>
> Alternatively (preferably actually) does the surround code from the SVN
> development area have a way of returning offsets for the matching hits?
Using getSpans(reader) on the span query will provide the Spans that
match the query. A Spans iterates through begin/end offset pairs within
the matching docs. This is provided by Lucene.
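A minimal sketch of what iterating such begin/end pairs looks like. It is
deliberately Lucene-free so it runs on its own: SpanHit and SpanSketch
are hypothetical stand-ins, documents are tokenized on whitespace, and
the end position is assumed to be exclusive. In Lucene itself the
corresponding calls would be getSpans(reader) on the span query, then
next(), doc(), start() and end() on the returned Spans. Note that the
values are token positions rather than character offsets, so a
highlighter still has to map them back to the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the (doc, start, end) triples a Spans
// enumeration yields; not part of Lucene.
final class SpanHit {
    final int doc;
    final int start; // position of the first matching token (inclusive)
    final int end;   // position one past the last matching token (exclusive)

    SpanHit(int doc, int start, int end) {
        this.doc = doc;
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "doc=" + doc + " [" + start + "," + end + ")";
    }
}

final class SpanSketch {
    // Finds `first` followed by `second` with at most `slop` tokens in
    // between. Both terms must be given in lower case, because the
    // document text is lower-cased before comparison.
    static List<SpanHit> findOrderedNear(String[] docs, String first,
                                         String second, int slop) {
        List<SpanHit> hits = new ArrayList<>();
        for (int doc = 0; doc < docs.length; doc++) {
            String[] tokens = docs[doc].toLowerCase().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                if (!tokens[i].equals(first)) {
                    continue;
                }
                // Last token position still within the allowed slop.
                int limit = Math.min(tokens.length - 1, i + 1 + slop);
                for (int j = i + 1; j <= limit; j++) {
                    if (tokens[j].equals(second)) {
                        hits.add(new SpanHit(doc, i, j + 1));
                        break;
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] docs = {
            "lucene in action by erik and otis",
            "doug cutting wrote lucene"
        };
        // Walk the hits the way one would walk a Spans enumeration.
        for (SpanHit hit : findOrderedNear(docs, "lucene", "action", 1)) {
            System.out.println(hit);
        }
    }
}
```

Only the first document contains "lucene" followed by "action" within one
intervening token, so the loop reports a single span covering positions
0 up to (but not including) 3 in document 0.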
>
> I believe the current highlighter code matches all query terms in a hit
> document, not just those satisfying the query criteria. I need a more
> precise way to access the hit term offsets. I am working on hit
> highlighting, hit excerpts and summaries, and compound queries (is this
> called search vectors?). I am still working through the surround code in
> dev. to see if that gives me the compound queries I need.
>
> I am willing to spend a few days implementing offsets on the returned
> hits (or something similar) if this is not currently available. It is
> something I need, even at the cost of search efficiency.
See also the thread on better highlighting that started on 25 August
and this:
http://issues.apache.org/bugzilla/show_bug.cgi?id=35518
Regards,
Paul Elschot