Posted to java-user@lucene.apache.org by Ivan Vasilev <iv...@sirma.bg> on 2008/04/03 12:03:40 UTC

PhraseQuery little bug?

Hi Guys,

I ran the following test. I created 2 files: File1.txt with content:
“apple 2 3 4 pear”
And File2.txt with content:
“pear 2 3 4 apple”

I ran the following search tests:
1. Using Luke Search tab.

1.1. When searching for:
content:"pear apple"~3
Then File1.txt is returned.

1.2. When searching for:
content:"pear apple"~4
The result is the same: File1.txt

1.3. When searching for:
content:"pear apple"~5
Both File1.txt and File2.txt are returned.

2. Using a simple app that uses the PhraseQuery class. The results are
the same as in Test 1.
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "apple"));
pq.add(new Term("content", "pear"));

2.1. pq.setSlop(3);
Then File1.txt is returned.

2.2. pq.setSlop(4);
The result is the same: File1.txt

2.3. pq.setSlop(5);
Both File1.txt and File2.txt are returned.

3. Using a simple app that uses the SpanNearQuery class. The results are
now different (in my opinion these are the correct results):
new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("content", "apple")),
    new SpanTermQuery(new Term("content", "pear"))
}, 3, false);

Both File1.txt and File2.txt are returned.
When changing the slop from 3 to 2 (in the constructor), no results are found.

When changing it to 4, 2 results are returned again.
When changing the inOrder boolean (in the constructor) from false to
true, File2.txt is of course not returned under any conditions.
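
As a rough illustration of the SpanNearQuery results in test 3, here is a small model in plain Java that counts the positions lying between the two terms and compares that gap with the slop. It is only a sketch that happens to agree with the results reported above, not Lucene code: the class SpanGapSketch and its matches method are made-up names, and it assumes the two terms are distinct and each occurs at most once per document.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: models proximity between two
// single-term spans as the number of token positions lying between them.
class SpanGapSketch {

    // Assumes both terms are distinct and occur at most once in the document.
    static boolean matches(List<String> tokens, String first, String second,
                           int slop, boolean inOrder) {
        int a = tokens.indexOf(first);
        int b = tokens.indexOf(second);
        if (a < 0 || b < 0) {
            return false;              // one of the terms is missing
        }
        if (inOrder && a > b) {
            return false;              // ordered matching rejects reversed terms
        }
        int gap = Math.abs(a - b) - 1; // positions strictly between the terms
        return gap <= slop;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("apple", "2", "3", "4", "pear");
        List<String> file2 = Arrays.asList("pear", "2", "3", "4", "apple");
        for (int slop = 2; slop <= 4; slop++) {
            System.out.println("slop=" + slop
                + " File1.txt=" + matches(file1, "apple", "pear", slop, false)
                + " File2.txt=" + matches(file2, "apple", "pear", slop, false));
        }
    }
}
```

With the two test documents, this model matches neither file at slop 2 and both files at slop 3 and 4 when inOrder is false, and only File1.txt when inOrder is true.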

Although this is not very important, I think PhraseQuery has slightly
buggy behavior. When searching with a slop equal to the number of words
between the two searched words, only the files that contain them
IN THE SAME ORDER are returned.
When the slop is 2 greater (counting the searched words themselves as
well as the words in between), THE ORDER of the searched words does not matter.

Best Regards,
Ivan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: PhraseQuery little bug?

Posted by Ivan Vasilev <iv...@sirma.bg>.
Of course, in our system I can use SpanNearQuery instead of PhraseQuery.
My question is: are there known performance differences between the two
classes?





Re: PhraseQuery little bug?

Posted by Ivan Vasilev <iv...@sirma.bg>.
In Lucene syntax, ~5 in this case means slop=5 - this is for Span queries.
I think the problem is that in the PhraseQuery class the slop that we
set is sometimes interpreted as inclusive and other times as exclusive.
When it is considered inclusive, the distance between "apple" and "pear"
is 5, because "apple 2 3 4 pear" is 5 words. But when the slop is
considered exclusive, we exclude "apple" and "pear" and the distance
is 3.
I think this is the problem - the PhraseQuery class considers the given
slop exclusive when counting words forward and considers it inclusive
when counting backwards.
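
For what it is worth, the 3 versus 5 difference also fits the PhraseQuery.setSlop documentation, which describes slop as an edit distance counted in moves of query terms, so that switching the order of two words costs two moves. The sketch below is a simplified model of that idea for a two-term phrase: each document position is shifted back by the term's offset in the query, and the slop needed is the spread between the shifted positions. It is not Lucene's scorer; PhraseSlopSketch and requiredSlop are hypothetical names, and the model assumes each term occurs at most once per document.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: a simplified model of sloppy
// phrase matching for a two-term query such as "apple pear".
class PhraseSlopSketch {

    // Returns the smallest slop at which the two-term phrase would match
    // under this model, or -1 if either term is missing from the document.
    static int requiredSlop(List<String> tokens, String term0, String term1) {
        int p0 = tokens.indexOf(term0); // position of the term at query offset 0
        int p1 = tokens.indexOf(term1); // position of the term at query offset 1
        if (p0 < 0 || p1 < 0) {
            return -1;
        }
        int shifted0 = p0 - 0;          // subtract the query offset of each term
        int shifted1 = p1 - 1;
        return Math.abs(shifted0 - shifted1);
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("apple", "2", "3", "4", "pear");
        List<String> file2 = Arrays.asList("pear", "2", "3", "4", "apple");
        System.out.println("File1.txt needs slop " + requiredSlop(file1, "apple", "pear")); // 3
        System.out.println("File2.txt needs slop " + requiredSlop(file2, "apple", "pear")); // 5
    }
}
```

Under this model the in-order document needs a slop of 3 (the words in between), while the reversed document needs 5: the same 3 plus the 2 moves required to swap the terms.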


Darren Govoni wrote:
> One interpretation of the query with ~5 is that your text has 5 words
> and ~5 would imply a word in any position can match. Could it be this?


Re: PhraseQuery little bug?

Posted by Darren Govoni <da...@ontrenet.com>.
One interpretation of the query with ~5 is that your text has 5 words
and ~5 would imply a word in any position can match. Could it be this?
