Posted to java-user@lucene.apache.org by Trejkaz <tr...@trypticon.org> on 2011/12/06 05:42:22 UTC

SpanNearQuery and matching spans inside the first span

Supposing I have a document with just "hi there" as the text.

If I do a span query like this:

    near(near(term('hi'), term('there'), slop=0, forwards),
         term('hi'), slop=1, any-direction)

that returns no hits.  However, if I do a span query like this:

    near(near(term('hi'), term('there'), slop=0, forwards),
         term('there'), slop=1, any-direction)

that returns the document.

It seems that the rule is that if the two spans *start* at the same
position, then they are not considered "near" each other.  But from
the point of view of a user (and of this developer), this is lopsided,
because in both cases the second span lies inside the first span.  It
seems like they should either both be considered hits or both be
considered non-hits.

I am wondering what others think about this and whether there is any
way to manipulate/rewrite the query to get a more balanced-looking
result.

(I'm sure it gets particularly hairy, though, when your two spans
overlap only partially... is that "near" or not?)
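
To make the "balanced" behaviour concrete, here is a minimal sketch of one
possible symmetric definition of "near". It is not Lucene's implementation;
the class SpanGap and its methods are hypothetical names used only for
illustration. It assumes spans are half-open position ranges [start, end),
and it treats any overlap or containment as a gap of 0.

```java
// Hypothetical illustration only: this is NOT how Lucene's SpanNearQuery works.
public final class SpanGap {

    // Number of positions between two half-open spans [startA, endA) and
    // [startB, endB). Overlapping or nested spans are given a gap of 0.
    static int gap(int startA, int endA, int startB, int endB) {
        if (startA >= endA || startB >= endB) {
            throw new IllegalArgumentException("each span needs start < end");
        }
        if (endA <= startB) {
            return startB - endA;   // A lies entirely before B
        }
        if (endB <= startA) {
            return startA - endB;   // B lies entirely before A
        }
        return 0;                   // partial overlap or containment
    }

    // Symmetric "near" test: the result does not depend on argument order
    // or on whether the two spans share a start position.
    static boolean near(int startA, int endA, int startB, int endB, int slop) {
        return gap(startA, endA, startB, endB) <= slop;
    }

    public static void main(String[] args) {
        // Document "hi there": 'hi' = [0,1), 'there' = [1,2), phrase = [0,2).
        System.out.println(near(0, 2, 0, 1, 1)); // phrase vs 'hi'
        System.out.println(near(0, 2, 1, 2, 1)); // phrase vs 'there'
    }
}
```

Under this definition both example queries would be treated alike, and
partially overlapping spans would also count as near. Whether that is the
right answer for overlapping spans is exactly the open question.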

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: SpanNearQuery and matching spans inside the first span

Posted by Ian Lea <ia...@gmail.com>.
Have you read http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ ?
It might help explain some of the behaviour you are seeing.


--
Ian.

