Posted to solr-user@lucene.apache.org by varun srivastava <va...@gmail.com> on 2012/12/24 19:53:58 UTC

SloppyPhraseScorer behavior change

Hi,
  Due to the following bug fix,
https://issues.apache.org/jira/browse/LUCENE-3215, I am observing a change
in the behavior of SloppyPhraseScorer. I just wanted to
confirm my understanding with you all.

The bug is fixed in Solr 3.5. Given a document "a b c d e", in Solr 3.4
only the query "a b" matches the document, but from Solr 3.5 onwards both
the query "a b" and the query "b a" match. Is that right?


Thanks
Varun

Re: SloppyPhraseScorer behavior change

Posted by varun srivastava <va...@gmail.com>.

Moreover, I just checked: autoGeneratePhraseQueries="true" is set for both
3.4 and 4.0 in my schema.

Thanks
Varun

On Fri, Jan 11, 2013 at 1:04 PM, varun srivastava <va...@gmail.com> wrote:

> Hi Jack,
>  Is this a new change done in solr 4.0 ? Seems autoGeneratePhraseQueries
> option is present from solr 3.1. Just wanted to confirm this is the
> difference causing change in behavior between 3.4 and 4.0.
>
>
> Thanks
> Varun
>
>
> On Mon, Dec 24, 2012 at 3:00 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>
>> Thanks. Sloppy phrase requires that the query terms be in a phrase, but
>> you don't have any quotes in your query.
>>
>> Depending on your schema field type you may be running into a change in
>> how auto-generated phrase queries are handled. It used to be that
>> apple0ipad would always be treated as the quoted phrase "apple 0 ipad", but
>> now that is only true if your field type has autoGeneratePhraseQueries=true
>> set. Now, if you don't have that option set, the term gets treated as
>> (apple OR 0 OR ipad), which is a lot looser than the exact phrase.
>>
>> Look at the new example schema for the "text_en_splitting" field type as
>> an example.
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: varun srivastava
>> Sent: Monday, December 24, 2012 5:49 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SloppyPhraseScorer behavior change
>>
>>
>> Hi Jack,
>> My query was simple /solr/select?query=ipad apple apple0ipad
>> and doc contained "apple ipad" .
>>
>> If you see the patch attached with the bug 3215 , you will find following
>> comment. I want to confirm whether the behaviour I am observing is in sync
>> with what the patch developer intended or its just some regression bug. In
>> solr 3.4 phrase order is honored, whereas in solr 4.0 phrase order is not
>> honored, i.e. "apple ipad" and "ipad apple" both treated as same.
>>
>>
>>
>> ""
>>
>> /**
>> +   * Score a candidate doc for all slop-valid position-combinations (matches)
>> +   * encountered while traversing/hopping the PhrasePositions.
>> +   * <br> The score contribution of a match depends on the distance:
>> +   * <br> - highest score for distance=0 (exact match).
>> +   * <br> - score gets lower as distance gets higher.
>> +   * <br>Example: for query "a b"~2, a document "x a b a y" can be scored twice:
>> +   * once for "a b" (distance=0), and once for "b a" (distance=2).
>> +   * <br>Possibly not all valid combinations are encountered, because for efficiency
>> +   * we always propagate the least PhrasePosition. This allows to base on
>> +   * PriorityQueue and move forward faster.
>> +   * As result, for example, document "a b c b a"
>> +   * would score differently for queries "a b c"~4 and "c b a"~4, although
>> +   * they really are equivalent.
>> +   * Similarly, for doc "a b c b a f g", query "c b"~2
>> +   * would get same score as "g f"~2, although "c b"~2 could be matched twice.
>> +   * We may want to fix this in the future (currently not, for performance reasons).
>> +   */
>>
>> ""
>>
>>
>>
>> On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>>
>>> Could you post the full query URL, so we can see exactly what your query
>>> was? Or, post the output of &debug=query, which will show us what Lucene
>>> query was generated.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: varun srivastava
>>> Sent: Monday, December 24, 2012 1:53 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: SloppyPhraseScorer behavior change
>>>
>>>
>>> Hi,
>>>  Due to following bug fix
>>> https://issues.apache.org/jira/browse/LUCENE-3215 observing a change
>>> in behavior of SloppyPhraseScorer. I just wanted to
>>> confirm my understanding with you all.
>>>
>>> After solr 3.5 ( bug is fixed in 3.5), if there is a document "a b c d e",
>>> then in solr 3.4 only query "a b" will match with document, but in solr 3.5
>>> onwards, both  query "a b" and "b a" will match. Is it right ?
>>>
>>>
>>> Thanks
>>> Varun
>>>
>>>
>>
>

Re: SloppyPhraseScorer behavior change

Posted by varun srivastava <va...@gmail.com>.

Hi Jack,
 Is this a new change made in Solr 4.0? The autoGeneratePhraseQueries
option seems to have been present since Solr 3.1. I just wanted to confirm
that this is the difference causing the change in behavior between 3.4 and 4.0.


Thanks
Varun

On Mon, Dec 24, 2012 at 3:00 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Thanks. Sloppy phrase requires that the query terms be in a phrase, but
> you don't have any quotes in your query.
>
> Depending on your schema field type you may be running into a change in
> how auto-generated phrase queries are handled. It used to be that
> apple0ipad would always be treated as the quoted phrase "apple 0 ipad", but
> now that is only true if your field type has autoGeneratePhraseQueries=true
> set. Now, if you don't have that option set, the term gets treated as
> (apple OR 0 OR ipad), which is a lot looser than the exact phrase.
>
> Look at the new example schema for the "text_en_splitting" field type as
> an example.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: varun srivastava
> Sent: Monday, December 24, 2012 5:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SloppyPhraseScorer behavior change
>
>
> Hi Jack,
> My query was simple /solr/select?query=ipad apple apple0ipad
> and doc contained "apple ipad" .
>
> If you see the patch attached with the bug 3215 , you will find following
> comment. I want to confirm whether the behaviour I am observing is in sync
> with what the patch developer intended or its just some regression bug. In
> solr 3.4 phrase order is honored, whereas in solr 4.0 phrase order is not
> honored, i.e. "apple ipad" and "ipad apple" both treated as same.
>
>
>
> ""
>
> /**
> +   * Score a candidate doc for all slop-valid position-combinations (matches)
> +   * encountered while traversing/hopping the PhrasePositions.
> +   * <br> The score contribution of a match depends on the distance:
> +   * <br> - highest score for distance=0 (exact match).
> +   * <br> - score gets lower as distance gets higher.
> +   * <br>Example: for query "a b"~2, a document "x a b a y" can be scored twice:
> +   * once for "a b" (distance=0), and once for "b a" (distance=2).
> +   * <br>Possibly not all valid combinations are encountered, because for efficiency
> +   * we always propagate the least PhrasePosition. This allows to base on
> +   * PriorityQueue and move forward faster.
> +   * As result, for example, document "a b c b a"
> +   * would score differently for queries "a b c"~4 and "c b a"~4, although
> +   * they really are equivalent.
> +   * Similarly, for doc "a b c b a f g", query "c b"~2
> +   * would get same score as "g f"~2, although "c b"~2 could be matched twice.
> +   * We may want to fix this in the future (currently not, for performance reasons).
> +   */
>
> ""
>
>
>
> On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>
>> Could you post the full query URL, so we can see exactly what your query
>> was? Or, post the output of &debug=query, which will show us what Lucene
>> query was generated.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: varun srivastava
>> Sent: Monday, December 24, 2012 1:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: SloppyPhraseScorer behavior change
>>
>>
>> Hi,
>>  Due to following bug fix
>> https://issues.apache.org/jira/browse/LUCENE-3215 observing a change
>> in behavior of SloppyPhraseScorer. I just wanted to
>> confirm my understanding with you all.
>>
>> After solr 3.5 ( bug is fixed in 3.5), if there is a document "a b c d e",
>> then in solr 3.4 only query "a b" will match with document, but in solr 3.5
>> onwards, both  query "a b" and "b a" will match. Is it right ?
>>
>>
>> Thanks
>> Varun
>>
>>
>

Re: SloppyPhraseScorer behavior change

Posted by Jack Krupansky <ja...@basetechnology.com>.

Thanks. Sloppy phrase requires that the query terms be in a phrase, but you 
don't have any quotes in your query.

Depending on your schema field type you may be running into a change in how 
auto-generated phrase queries are handled. It used to be that apple0ipad 
would always be treated as the quoted phrase "apple 0 ipad", but now that is 
only true if your field type has autoGeneratePhraseQueries=true set. Now, if 
you don't have that option set, the term gets treated as (apple OR 0 OR 
ipad), which is a lot looser than the exact phrase.

Look at the new example schema for the "text_en_splitting" field type as an 
example.

-- Jack Krupansky
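
The difference described above can be illustrated with a small Java sketch. It is only a toy model, not Solr's query parser: the class name AutoPhraseSketch, its methods, and the letter/digit regex are assumptions made for illustration (in Solr the splitting is done by the field type's analysis chain). It shows how one token that splits into several parts reads as a quoted phrase when autoGeneratePhraseQueries is true and as an OR of the parts otherwise.

```java
import java.util.Arrays;
import java.util.List;

/** Toy illustration only; names and regex are hypothetical, not Solr code. */
public class AutoPhraseSketch {

    // Stand-in for the analyzer: split at letter/digit boundaries,
    // e.g. "apple0ipad" -> [apple, 0, ipad].
    public static List<String> split(String token) {
        return Arrays.asList(token.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"));
    }

    // Renders how the split token would be interpreted, following the
    // description in the message above.
    public static String describe(String token, boolean autoGeneratePhraseQueries) {
        List<String> parts = split(token);
        if (parts.size() == 1) {
            return parts.get(0); // nothing was split, plain term
        }
        return autoGeneratePhraseQueries
                ? "\"" + String.join(" ", parts) + "\""   // quoted phrase
                : "(" + String.join(" OR ", parts) + ")"; // looser OR of the parts
    }

    public static void main(String[] args) {
        System.out.println(describe("apple0ipad", true));  // "apple 0 ipad"
        System.out.println(describe("apple0ipad", false)); // (apple OR 0 OR ipad)
    }
}
```

Only the phrase form goes through phrase scoring, so a change in this setting between versions could look like a change in phrase-order handling.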

-----Original Message----- 
From: varun srivastava
Sent: Monday, December 24, 2012 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: SloppyPhraseScorer behavior change

Hi Jack,
My query was simple /solr/select?query=ipad apple apple0ipad
and doc contained "apple ipad" .

If you see the patch attached with the bug 3215 , you will find following
comment. I want to confirm whether the behaviour I am observing is in sync
with what the patch developer intended or its just some regression bug. In
solr 3.4 phrase order is honored, whereas in solr 4.0 phrase order is not
honored, i.e. "apple ipad" and "ipad apple" both treated as same.

""

/**
+   * Score a candidate doc for all slop-valid position-combinations (matches)
+   * encountered while traversing/hopping the PhrasePositions.
+   * <br> The score contribution of a match depends on the distance:
+   * <br> - highest score for distance=0 (exact match).
+   * <br> - score gets lower as distance gets higher.
+   * <br>Example: for query "a b"~2, a document "x a b a y" can be scored twice:
+   * once for "a b" (distance=0), and once for "b a" (distance=2).
+   * <br>Possibly not all valid combinations are encountered, because for efficiency
+   * we always propagate the least PhrasePosition. This allows to base on
+   * PriorityQueue and move forward faster.
+   * As result, for example, document "a b c b a"
+   * would score differently for queries "a b c"~4 and "c b a"~4, although
+   * they really are equivalent.
+   * Similarly, for doc "a b c b a f g", query "c b"~2
+   * would get same score as "g f"~2, although "c b"~2 could be matched twice.
+   * We may want to fix this in the future (currently not, for performance reasons).
+   */

""

On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Could you post the full query URL, so we can see exactly what your query
> was? Or, post the output of &debug=query, which will show us what Lucene
> query was generated.
>
> -- Jack Krupansky
>
> -----Original Message----- From: varun srivastava
> Sent: Monday, December 24, 2012 1:53 PM
> To: solr-user@lucene.apache.org
> Subject: SloppyPhraseScorer behavior change
>
>
> Hi,
>  Due to following bug fix
> https://issues.apache.org/jira/browse/LUCENE-3215 observing a change
> in behavior of SloppyPhraseScorer. I just wanted to
> confirm my understanding with you all.
>
> After solr 3.5 ( bug is fixed in 3.5), if there is a document "a b c d e",
> then in solr 3.4 only query "a b" will match with document, but in solr 3.5
> onwards, both  query "a b" and "b a" will match. Is it right ?
>
>
> Thanks
> Varun
>

Re: SloppyPhraseScorer behavior change

Posted by varun srivastava <va...@gmail.com>.

Hi Jack,
 My query was simply /solr/select?query=ipad apple apple0ipad
and the doc contained "apple ipad".

If you look at the patch attached to bug 3215, you will find the following
comment. I want to confirm whether the behaviour I am observing is in sync
with what the patch developer intended, or whether it is just a regression
bug. In Solr 3.4 phrase order is honored, whereas in Solr 4.0 phrase order is
not honored, i.e. "apple ipad" and "ipad apple" are both treated the same.



""

 /**
+   * Score a candidate doc for all slop-valid position-combinations (matches)
+   * encountered while traversing/hopping the PhrasePositions.
+   * <br> The score contribution of a match depends on the distance:
+   * <br> - highest score for distance=0 (exact match).
+   * <br> - score gets lower as distance gets higher.
+   * <br>Example: for query "a b"~2, a document "x a b a y" can be scored twice:
+   * once for "a b" (distance=0), and once for "b a" (distance=2).
+   * <br>Possibly not all valid combinations are encountered, because for efficiency
+   * we always propagate the least PhrasePosition. This allows to base on
+   * PriorityQueue and move forward faster.
+   * As result, for example, document "a b c b a"
+   * would score differently for queries "a b c"~4 and "c b a"~4, although
+   * they really are equivalent.
+   * Similarly, for doc "a b c b a f g", query "c b"~2
+   * would get same score as "g f"~2, although "c b"~2 could be matched twice.
+   * We may want to fix this in the future (currently not, for performance reasons).
+   */

""
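
The distances quoted in the comment above ("a b" at distance=0, "b a" at distance=2) can be reproduced with a small Java sketch. It is a simplified model and not Lucene's SloppyPhraseScorer: the class SlopSketch and its methods are hypothetical, it tries every position combination by brute force instead of using a priority queue, and it does not handle repeated terms within a phrase. The assumed distance of one combination is max(pos - offset) - min(pos - offset), where offset is the term's index in the phrase.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of sloppy phrase distance; hypothetical, not Lucene code. */
public class SlopSketch {

    /**
     * Smallest distance at which the phrase matches the doc, or -1 if a
     * phrase term does not occur in the doc at all.
     */
    public static int minDistance(String[] doc, String[] phrase) {
        List<List<Integer>> shifted = new ArrayList<>();
        for (int offset = 0; offset < phrase.length; offset++) {
            List<Integer> positions = new ArrayList<>();
            for (int pos = 0; pos < doc.length; pos++) {
                if (doc[pos].equals(phrase[offset])) {
                    positions.add(pos - offset); // position relative to phrase start
                }
            }
            if (positions.isEmpty()) {
                return -1;
            }
            shifted.add(positions);
        }
        return search(shifted, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Exhaustively picks one shifted position per term and keeps the best spread.
    private static int search(List<List<Integer>> shifted, int term, int min, int max) {
        if (term == shifted.size()) {
            return max - min;
        }
        int best = Integer.MAX_VALUE;
        for (int p : shifted.get(term)) {
            best = Math.min(best,
                    search(shifted, term + 1, Math.min(min, p), Math.max(max, p)));
        }
        return best;
    }

    /** True if the phrase matches the doc within the given slop. */
    public static boolean matches(String[] doc, String[] phrase, int slop) {
        int distance = minDistance(doc, phrase);
        return distance >= 0 && distance <= slop;
    }

    public static void main(String[] args) {
        String[] doc = "a b c d e".split(" ");
        System.out.println(minDistance(doc, new String[] {"a", "b"})); // 0
        System.out.println(minDistance(doc, new String[] {"b", "a"})); // 2
        System.out.println(matches(doc, new String[] {"b", "a"}, 0));  // false
        System.out.println(matches(doc, new String[] {"b", "a"}, 2));  // true
    }
}
```

Under this model, the reversed phrase "b a" matches the document "a b c d e" only when the slop is at least 2, so whether both orders match depends on the slop of the phrase query that was actually generated.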



On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Could you post the full query URL, so we can see exactly what your query
> was? Or, post the output of &debug=query, which will show us what Lucene
> query was generated.
>
> -- Jack Krupansky
>
> -----Original Message----- From: varun srivastava
> Sent: Monday, December 24, 2012 1:53 PM
> To: solr-user@lucene.apache.org
> Subject: SloppyPhraseScorer behavior change
>
>
> Hi,
>  Due to following bug fix
> https://issues.apache.org/jira/browse/LUCENE-3215 observing a change
> in behavior of SloppyPhraseScorer. I just wanted to
> confirm my understanding with you all.
>
> After solr 3.5 ( bug is fixed in 3.5), if there is a document "a b c d e",
> then in solr 3.4 only query "a b" will match with document, but in solr 3.5
> onwards, both  query "a b" and "b a" will match. Is it right ?
>
>
> Thanks
> Varun
>

Re: SloppyPhraseScorer behavior change

Posted by Jack Krupansky <ja...@basetechnology.com>.

Could you post the full query URL, so we can see exactly what your query 
was? Or, post the output of &debug=query, which will show us what Lucene 
query was generated.

-- Jack Krupansky
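
A full query URL of the kind requested above could be assembled as in the following Java sketch. The host, port, class name DebugUrlSketch, and the parameter name "q" are assumptions for illustration; only the /solr/select path, the query text, and the &debug=query suffix come from this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for building a debug query URL; not part of Solr. */
public class DebugUrlSketch {

    // Appends the URL-encoded query text and the debug=query flag to the base URL.
    public static String debugUrl(String base, String queryParam, String queryText) {
        String encoded = URLEncoder.encode(queryText, StandardCharsets.UTF_8);
        return base + "?" + queryParam + "=" + encoded + "&debug=query";
    }

    public static void main(String[] args) {
        // Host and port are placeholders; adjust to the actual Solr instance.
        System.out.println(
                debugUrl("http://localhost:8983/solr/select", "q", "ipad apple apple0ipad"));
    }
}
```

Encoding the spaces makes it unambiguous which terms were sent as one query string, which is what the parsed-query debug output is then compared against.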

-----Original Message----- 
From: varun srivastava
Sent: Monday, December 24, 2012 1:53 PM
To: solr-user@lucene.apache.org
Subject: SloppyPhraseScorer behavior change

Hi,
  Due to following bug fix
https://issues.apache.org/jira/browse/LUCENE-3215 observing a change
in behavior of SloppyPhraseScorer. I just wanted to
confirm my understanding with you all.

After solr 3.5 ( bug is fixed in 3.5), if there is a document "a b c d e",
then in solr 3.4 only query "a b" will match with document, but in solr 3.5
onwards, both  query "a b" and "b a" will match. Is it right ?

Thanks
Varun