Posted to java-user@lucene.apache.org by Yu Zhou <j_...@yahoo.com> on 2013/11/04 20:18:31 UTC

SpanNearQuery behaviour?

Hi,

We use SpanNearQueries intensively for proximity searching. However, we are confused by the two different ways of using them. Could anybody explain in detail what we can expect from nested versus flattened SpanNearQueries?

We used to build nested SpanNearQueries, but found that nesting doesn't always work. We then tried switching to flattened SpanNearQueries, and found that this breaks in some other cases. Below, we include some test cases for both scenarios.

Another observation is that some of the failing queries include repeating terms. Further, we don't fully understand the concept of span overlaps and how they impact searches. Can you shed some light on this?

All examples below use slop=2 and inOrder=false. We are using Lucene 4.4.0.
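
To make "nested" and "flattened" concrete, here is a minimal sketch of the two query shapes. It does not use Lucene; `SpanShapeSketch`, `flat` and `nestedLeftDeep` are hypothetical names, and the pairwise left-deep nesting is an assumption (the attached program may nest differently). In Lucene terms, each `near(...)` would correspond to a `new SpanNearQuery(clauses, 2, false)` and each leaf to a `SpanTermQuery`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that only prints the shape of a query tree; not a Lucene class.
class SpanShapeSketch {

    // Flattened: one near-clause that holds every term directly.
    static String flat(List<String> terms) {
        return "near(" + String.join(", ", terms) + ")";
    }

    // Nested (ASSUMED left-deep): near(near(near(t0, t1), t2), t3) ...
    static String nestedLeftDeep(List<String> terms) {
        String acc = terms.get(0);
        for (int i = 1; i < terms.size(); i++) {
            acc = "near(" + acc + ", " + terms.get(i) + ")";
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("force", "teaching", "profession");
        System.out.println(flat(terms));           // near(force, teaching, profession)
        System.out.println(nestedLeftDeep(terms)); // near(near(force, teaching), profession)
    }
}
```

In the flattened shape the slop applies once to the whole group of terms, while in the nested shape it applies separately at every level of the tree.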

Attached is a program that will show all cases described below.


-------------------------------------------

Examples:

1) nested queries:

Context: the exact phrase of each query below is in a document

a) Failing case: KE : a b c d d b c e

b) Failing case: KE : one ring to rule them all one ring to find them one ring to bring


2) flattened queries:

Context: the phrase "Task Force on Teaching as a Profession" is in a document

a) Failing case: SU: Force Teaching Profession

b) Working case: SU: Force on Teaching Profession

c) Both cases above work with nested SpanNearQueries
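
One possible reading of these three results is sketched below. It is not Lucene code, and it rests on two assumptions: that tokens sit at consecutive positions with no stop words removed (Task=0, Force=1, on=2, Teaching=3, as=4, a=5, Profession=6), and that an unordered match is accepted when the width of the covering span minus the total length of its sub-spans is at most the slop. `SlopSketch`, `requiredSlop` and `cover` are hypothetical names.

```java
// Hypothetical arithmetic model of unordered span matching; not a Lucene class.
class SlopSketch {

    // A span covers token positions [start, end).
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start; }
    }

    // A single term at the given token position.
    static Span term(int position) { return new Span(position, position + 1); }

    // Smallest span that contains all sub-spans.
    static Span cover(Span... subSpans) {
        int minStart = Integer.MAX_VALUE, maxEnd = Integer.MIN_VALUE;
        for (Span s : subSpans) {
            minStart = Math.min(minStart, s.start);
            maxEnd = Math.max(maxEnd, s.end);
        }
        return new Span(minStart, maxEnd);
    }

    // ASSUMED rule: slop needed = covering width minus the sum of sub-span lengths.
    static int requiredSlop(Span... subSpans) {
        int total = 0;
        for (Span s : subSpans) total += s.length();
        return cover(subSpans).length() - total;
    }

    public static void main(String[] args) {
        Span force = term(1), on = term(2), teaching = term(3), profession = term(6);

        // 2a) flattened, three terms: (7 - 1) - 3 = 3, more than slop=2
        System.out.println(requiredSlop(force, teaching, profession));
        // 2b) flattened, four terms: (7 - 1) - 4 = 2, within slop=2
        System.out.println(requiredSlop(force, on, teaching, profession));
        // 2c) nested: inner pair needs (4 - 1) - 2 = 1, outer needs (7 - 1) - (3 + 1) = 2
        Span inner = cover(force, teaching);
        System.out.println(requiredSlop(force, teaching));
        System.out.println(requiredSlop(inner, profession));
    }
}
```

Under these assumptions the numbers agree with cases a) to c): the flattened three-term query needs a slop of 3, while the four-term query and both levels of the nested query stay within 2. Whether this is what Lucene 4.4.0 actually does is part of the question.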


3) a specific query is interesting and a bit confusing:

Context: it is a long query for which there is an exact match in a document.

TI: A New Congress Should Enforce Accountability Over Abstinence Only Programs Non Partisan Report Documents Bush Administration Abstinence Only Programs Running Amuck With Over One Billion in Taxpayer Dollars

a) Failing case: the nested query fails for the whole sentence
TI: A New Congress Should Enforce Accountability Over Abstinence Only Programs Non Partisan Report Documents Bush Administration Abstinence Only Programs Running Amuck With Over One Billion in Taxpayer Dollars

b) Working case: the nested query works for up to the first 19 terms of the query
TI: A New Congress Should Enforce Accountability Over Abstinence Only Programs Non Partisan Report Documents Bush Administration Abstinence Only Programs

c) Failing case: for the nested query, adding a term at the end of the 19-term query makes it fail
TI: A New Congress Should Enforce Accountability Over Abstinence Only Programs Non Partisan Report Documents Bush Administration Abstinence Only Programs Running

d) Working case: for the nested query, adding a term at the beginning of the 19-term query still works
TI: And A New Congress Should Enforce Accountability Over Abstinence Only Programs Non Partisan Report Documents Bush Administration Abstinence Only Programs

e) Working case: for the nested query, adding a term at the beginning of the query makes it work for the whole sentence
TI: And A New Congress Should Enforce Accountability Over Abstinence Only Programs Non Partisan Report Documents Bush Administration Abstinence Only Programs Running Amuck With Over One Billion in Taxpayer Dollars

f) Working case: the flattened structure works for the whole sentence
TI: A New Congress Should Enforce Accountability Over Abstinence Only Programs Non Partisan Report Documents Bush Administration Abstinence Only Programs Running Amuck With Over One Billion in Taxpayer Dollars


Thanks.

Jerry