Posted to dev@lucene.apache.org by Ruslan Sivak <rs...@istandfor.com> on 2006/12/07 22:57:17 UTC

non-overlapping Span queries

I see back in Jul 2005 there was a thread about SpanNearQueries which 
were overlapping.

http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200507.mbox/%3C200507052054.54859.paul.elschot@xs4all.nl%3E

A fix was posted by Paul Elschot at that time.  Did this fix ever make 
it into 2.0?  I'm having problems with SpanNearQueries matching the 
same thing again. For example:

I'm searching for ((Brooklyn) near (Brooklyn near NY slop 0) slop 10) 
and it matches "Brooklyn, NY". I want it not to match unless the phrase is 
something like

Brooklyn High which is in Brooklyn, NY

Russ



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: non-overlapping Span queries

Posted by Ruslan Sivak <rs...@istandfor.com>.
Paul Elschot wrote:
> On Thursday 07 December 2006 22:57, Ruslan Sivak wrote:
>> I see back in Jul 2005 there was a thread about SpanNearQueries which 
>> were overlapping.
>>
>> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200507.mbox/%3C200507052054.54859.paul.elschot@xs4all.nl%3E
>>
>> A fix was posted by Paul Elschot at that time.  Did this fix ever make 
>> it into 2.0?  I'm having problems with SpanNearQueries matching the 
>> same thing again. For example:
>>
>> I'm searching for ((Brooklyn) near (Brooklyn near NY slop 0) slop 10) 
>> and it matches "Brooklyn, NY". I want it not to match unless the phrase is 
>> something like
>>
>> Brooklyn High which is in Brooklyn, NY
>
> That requires a minimum distance between the matches of the
> subqueries, and that is not yet implemented.
>
> The previous fix is in the trunk:
> http://issues.apache.org/jira/browse/LUCENE-569
>
> You can try the svn head, or a nightly build instead of 2.0:
> http://cvs.apache.org/dist/lucene/java/nightly/
> but a minimum span distance facility is not in there afaik.
>
> Regards,
> Paul Elschot
>
I tried the latest nightly build, but it still doesn't help.  BTW, I was 
hitting the same error before as the one mentioned in the original 
thread, so perhaps the fix is not yet complete?  I was getting an error 
when doing this kind of query:

(brooklyn) near (brooklyn) slop 10

Russ





Re: non-overlapping Span queries

Posted by Paul Elschot <pa...@xs4all.nl>.
On Friday 08 December 2006 00:09, Chris Hostetter wrote:
> 
> : > Brooklyn High which is in Brooklyn, NY
> :
> : That requires a minimum distance between the matches of the
> : subqueries, and that is not yet implemented.
> 
> I was about to suggest that adding that seems like it would be fairly
> easy, just add a new "int minDistance" to SpanNearQuery and then use it in
> NearSpansOrdered.docSpansOrdered to ensure that "end1 + minDistance <=
> start2" and in NearSpansUnordered.atMatch to test that "min.end() +
> minDistance <= max.start()" ... but then it occurred to me that the whole
> issue isn't that simple when you have a SpanNearQuery with more than two
> clauses.

It can be as simple as you suggest. IIRC, I initially implemented the ordered
case the way you're suggesting, with minDistance == 0, in the
NearSpansOrdered.java at Lucene issue 413 (see also the comments at
Lucene issue 569):
http://issues.apache.org/jira/browse/LUCENE-413
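
The two-clause check discussed above can be sketched in plain Java. This is
only an illustration, not the code from issue 413 or from the trunk: the class
MinDistanceSketch, its Span type and both method names are hypothetical, and
spans are assumed to have an inclusive start and an exclusive end position.

```java
// Hypothetical sketch; none of these classes are part of Lucene.
final class MinDistanceSketch {

    /** A span over token positions: start inclusive, end exclusive (assumption). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Ordered case: "end1 + minDistance <= start2". */
    static boolean orderedMatch(Span first, Span second, int minDistance) {
        return first.end + minDistance <= second.start;
    }

    /** Unordered case: "min.end() + minDistance <= max.start()". */
    static boolean unorderedMatch(Span a, Span b, int minDistance) {
        // min is the span that starts first, max is the other one.
        Span min = (a.start <= b.start) ? a : b;
        Span max = (min == a) ? b : a;
        return min.end + minDistance <= max.start;
    }
}
```

Assuming every word of "Brooklyn High which is in Brooklyn, NY" is indexed at
consecutive positions 0 to 6, the first "Brooklyn" is the span [0, 1) and
"Brooklyn NY" is [5, 7), so both checks pass with minDistance == 0. For the
text "Brooklyn, NY" alone the spans are [0, 1) and [0, 2); they overlap, so
both checks fail.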

Ruslan, chances are that the 413 version works in the way you need,
but only for the ordered case. When you also need the non ordered case,
you can simply combine (Boolean Or / SpanOr) both possible orders.
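
The idea of covering the non ordered case by OR-ing both orders can be shown
with a small plain-Java model. It is a sketch under assumptions, not Lucene
code: EitherOrderSketch and its methods are hypothetical names, spans use an
exclusive end, and the slop is taken to be the gap between the end of the
first span and the start of the second. In Lucene itself the same combination
would be two ordered SpanNearQuery instances, one per clause order, wrapped in
a SpanOrQuery.

```java
// Hypothetical sketch; these names are not part of Lucene.
final class EitherOrderSketch {

    /** A span over token positions: start inclusive, end exclusive (assumption). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Ordered, non overlapping: first ends before second starts, within slop. */
    static boolean orderedNoOverlap(Span first, Span second, int slop) {
        int gap = second.start - first.end; // negative when the spans overlap
        return gap >= 0 && gap <= slop;
    }

    /** Non ordered case expressed as the OR of both possible orders. */
    static boolean eitherOrder(Span a, Span b, int slop) {
        return orderedNoOverlap(a, b, slop) || orderedNoOverlap(b, a, slop);
    }
}
```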

> 
> I'm not even sure what a three clause SpanNearQuery with a minDistance of N
> would mean .. is that the min distance between each clause, or
> between the outermost?
> 
> Paul: you understand Span queries a lot better than I do: if you had a
> two clause SpanNear would my suggestion make sense?
> 
> we could always add minDistance to SpanNearQuery, but make it private
> and only settable from a new constructor that explicitly only takes in two
> SpanQuery clauses (instead of an array).

Basically there are two independent ways in which spans can match:
overlapping / non overlapping, and ordered / non ordered.
In the current trunk the overlapping ordered and non ordered cases 
are implemented.
At Lucene issue 413 there is an implementation of the non overlapping
ordered case. That leaves the non overlapping non ordered case
to be implemented.

When there is no overlap, a minimum distance between the matching spans
makes sense. With overlap, one might try and define some negative distance
as the overlap, but I can't think of any real life cases for that.

At the moment I don't recall the details of the maximally allowed slop;
I have not yet looked at the code again.
Ideally the overlaps, distances and slops would be taken into
one minimum and one maximum to be passed to the constructor.
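
One possible reading of "one minimum and one maximum" is a single signed
distance between two ordered spans that is negative when they overlap. The
sketch below is an assumption about how that could look, not an existing or
proposed Lucene API: DistanceRangeSketch, signedDistance and withinRange are
hypothetical names, and span ends are taken as exclusive.

```java
// Hypothetical sketch; these names are not part of Lucene.
final class DistanceRangeSketch {

    /** Gap from the end of the first span to the start of the second; negative means overlap. */
    static int signedDistance(int end1, int start2) {
        return start2 - end1;
    }

    /** True when the signed distance lies within [minDistance, maxDistance]. */
    static boolean withinRange(int end1, int start2, int minDistance, int maxDistance) {
        int distance = signedDistance(end1, start2);
        return minDistance <= distance && distance <= maxDistance;
    }
}
```

Under this reading a minimum of 0 forbids overlap, a negative minimum permits
that many positions of overlap, and the maximum plays the role of the slop.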

One simple way out of this would be to have non ordered span queries
with overlap, and to have ordered span queries without overlap.
This could be done by replacing the trunk NearSpansOrdered.java
by the one at Lucene issue 413.

Regards,
Paul Elschot



Re: non-overlapping Span queries

Posted by Chris Hostetter <ho...@fucit.org>.
: > Brooklyn High which is in Brooklyn, NY
:
: That requires a minimum distance between the matches of the
: subqueries, and that is not yet implemented.

I was about to suggest that adding that seems like it would be fairly
easy, just add a new "int minDistance" to SpanNearQuery and then use it in
NearSpansOrdered.docSpansOrdered to ensure that "end1 + minDistance <=
start2" and in NearSpansUnordered.atMatch to test that "min.end() +
minDistance <= max.start()" ... but then it occurred to me that the whole
issue isn't that simple when you have a SpanNearQuery with more than two
clauses.

I'm not even sure what a three clause SpanNearQuery with a minDistance of N
would mean .. is that the min distance between each clause, or
between the outermost?
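
The two readings can be made concrete with a small plain-Java sketch. It is
only an illustration of the ambiguity, not Lucene code: ThreeClauseSketch and
its method names are hypothetical, and each span is assumed to be an
{start, end} pair with an exclusive end, listed in match order.

```java
// Hypothetical sketch; these names are not part of Lucene.
final class ThreeClauseSketch {

    /** Reading 1: every pair of adjacent clauses is at least minDistance apart. */
    static boolean minBetweenEachPair(int[][] spans, int minDistance) {
        for (int i = 0; i + 1 < spans.length; i++) {
            // spans[i][1] is the end of this span, spans[i + 1][0] the start of the next.
            if (spans[i][1] + minDistance > spans[i + 1][0]) {
                return false;
            }
        }
        return true;
    }

    /** Reading 2: only the first and the last clause are at least minDistance apart. */
    static boolean minBetweenOutermost(int[][] spans, int minDistance) {
        if (spans.length < 2) {
            return true;
        }
        return spans[0][1] + minDistance <= spans[spans.length - 1][0];
    }
}
```

For the spans [0, 1), [1, 2) and [5, 6) with N = 2 the readings disagree: the
first two clauses are adjacent, so reading 1 rejects the match, while the
outermost clauses are four positions apart, so reading 2 accepts it.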

Paul: you understand Span queries a lot better than I do: if you had a
two clause SpanNear would my suggestion make sense?

we could always add minDistance to SpanNearQuery, but make it private
and only settable from a new constructor that explicitly only takes in two
SpanQuery clauses (instead of an array).


-Hoss




Re: non-overlapping Span queries

Posted by Paul Elschot <pa...@xs4all.nl>.
On Thursday 07 December 2006 22:57, Ruslan Sivak wrote:
> I see back in Jul 2005 there was a thread about SpanNearQueries which 
> were overlapping.
> 
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200507.mbox/%3C200507052054.54859.paul.elschot@xs4all.nl%3E
> 
> A fix was posted by Paul Elschot at that time.  Did this fix ever make 
> it into 2.0?  I'm having problems with SpanNearQueries matching the 
> same thing again. For example:
> 
> I'm searching for ((Brooklyn) near (Brooklyn near NY slop 0) slop 10) 
> and it matches "Brooklyn, NY". I want it not to match unless the phrase is 
> something like
> 
> Brooklyn High which is in Brooklyn, NY
 
That requires a minimum distance between the matches of the
subqueries, and that is not yet implemented.

The previous fix is in the trunk:
http://issues.apache.org/jira/browse/LUCENE-569

You can try the svn head, or a nightly build instead of 2.0:
http://cvs.apache.org/dist/lucene/java/nightly/
but a minimum span distance facility is not in there afaik.

Regards,
Paul Elschot
