Posted to solr-user@lucene.apache.org by nettadalet <ns...@dalet.com> on 2020/12/24 14:35:22 UTC

Why do I get different results for the same query with two Solr versions?

Hello,

I have the same field type defined in Solr 4.6 and Solr 7.5. When I
search with both versions, I get different results, and I don't know why.

I have the following *field type definition in Solr 4.6*:
<fieldType name="text_type1" class="solr.TextField"
positionIncrementGap="1000">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.ASCIIFoldingFilterFactory" />
		<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
		<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.ASCIIFoldingFilterFactory" />
		<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
		<filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                />
		<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
</fieldType>


I have the following *field type definition in Solr 7.5*:
<fieldType name="text_type1" class="solr.TextField"
positionIncrementGap="1000">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.ASCIIFoldingFilterFactory" />
		<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
		<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0"/>
		<filter class="solr.FlattenGraphFilterFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.ASCIIFoldingFilterFactory" />
		<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
		<filter class="solr.StopFilterFactory"
                                   ignoreCase="true"
                                   words="stopwords.txt"
                                       />
		<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
</fieldType>

* I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
but the result was the same.

I have the following *6 values set for field text1 of type text_type1 for 6
different documents* (the type(s) from above):
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93


My query is *text1=KI_7*.
Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
Using Solr 7.5, I get all 6 results.

Questions:
1. How come I get different results with the same data, when my fields
definitions are the same (as far as I can tell)?

2. What are the expected results?
I think that the results Solr 7.5 returns are the correct ones, since at the
end of the analysis I get *KI* as a term and *7* as a term, both
during the indexing analysis and the query analysis, so, to my
understanding, all 6 results should be found.
Is this correct? If not, what am I missing or misunderstanding?

I would very much appreciate a full/partial answer, but even a link that
could explain at least the expected results part would be great. 

Thanks in advance, I know this might be a tough one to answer [Hope not  :)]



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Why do I get different results for the same query with two Solr versions?

Posted by nettadalet <ns...@dalet.com>.
Tulsi wrote
> Can you post the managed schema and solrconfig content here ?

Schema for the 4.6 index (I omitted all non-relevant data):
<schema name="ItemCodeIndex46_0_English" version="1.3">
	<types>
		<fieldType name="textgen-ai" class="solr.TextField"
positionIncrementGap="1000">
			<analyzer type="index">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
				<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
				<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
		</fieldType>
	</types>
	<fields>
		<field name="TITLE_ItemCode_t" type="textgen-ai" indexed="true"
stored="true" omitNorms="false" multiValued="false" termVectors="true"
termPositions="true"/>
		<field name="TITLE_ItemCode_s" type="textgen-ai" indexed="true"
stored="true" omitNorms="false" multiValued="false" termVectors="false"
termPositions="false"/>
		<field name="TITLE_ItemCode_u" type="textgen-ai" indexed="true"
stored="true" omitNorms="false" multiValued="false" termVectors="true"
termPositions="true"/>
	</fields>
	<copyField source="TITLE_ItemCode_u" dest="TITLE_ItemCode_t"/>
	<copyField source="TITLE_ItemCode_u" dest="TITLE_ItemCode_s"/>
	<solrQueryParser defaultOperator="AND"/>
</schema>

Schema for the 7.5 index (I omitted all non-relevant data):
<schema name="ItemCodeIndex75_4_English" version="1.6">
	<types>
		<fieldType name="textgen-ai" class="solr.TextField"
positionIncrementGap="1000">
			<analyzer type="index">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
				<filter class="solr.WordDelimiterGraphFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"/>
				<filter class="solr.FlattenGraphFilterFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
			<analyzer type="query">
				<tokenizer class="solr.WhitespaceTokenizerFactory"/>
				<filter class="solr.ASCIIFoldingFilterFactory"/>
				<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
				<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
				<filter class="solr.WordDelimiterGraphFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
		</fieldType>
	</types>
	<fields>
		<field name="TITLE_ItemCode_t" type="textgen-ai" indexed="true"
stored="true" omitNorms="false" multiValued="false" termVectors="true"
termPositions="true"/>
		<field name="TITLE_ItemCode_s" type="textgen-ai" indexed="true"
stored="true" omitNorms="false" multiValued="false" termVectors="false"
termPositions="false"/>
		<field name="TITLE_ItemCode_u" type="textgen-ai" indexed="true"
stored="true" omitNorms="false" multiValued="false" termVectors="true"
termPositions="true"/>
	</fields>
	<copyField source="TITLE_ItemCode_u" dest="TITLE_ItemCode_t"/>
	<copyField source="TITLE_ItemCode_u" dest="TITLE_ItemCode_s"/>
</schema>
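
A side-by-side comparison of the two schemas as posted can be done mechanically. The sketch below, in Python, parses the index-time word-delimiter filter from each schema (copied from the text above) and prints the attributes that differ. The helper name attribute_diff is made up for illustration. On the posted text it reports two differences: the filter class, and catenateAll ("0" in 4.6, "1" in 7.5). With catenateAll="1", a token such as KI_7 also yields the concatenated term ki7 after lowercasing, which is consistent with the extra ki7 clause in the 7.5 parsedquery. Whether that alone accounts for the different result counts is not established in this thread. The schema version attribute (1.3 vs 1.6) and the solrQueryParser element also differ between the two files.

```python
import xml.etree.ElementTree as ET

# Index-time word-delimiter filters, copied from the two schemas above.
FILTER_46 = (
    '<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" '
    'generateNumberParts="1" catenateWords="1" catenateNumbers="1" '
    'catenateAll="0" splitOnCaseChange="0"/>'
)
FILTER_75 = (
    '<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" '
    'generateNumberParts="1" catenateWords="1" catenateNumbers="1" '
    'catenateAll="1" splitOnCaseChange="0"/>'
)


def attribute_diff(xml_a, xml_b):
    """Return {attribute: (value_a, value_b)} for attributes that differ.

    Hypothetical helper for this comparison; not part of Solr.
    """
    a = ET.fromstring(xml_a).attrib
    b = ET.fromstring(xml_b).attrib
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diff = attribute_diff(FILTER_46, FILTER_75)
for name, (v46, v75) in diff.items():
    print(f"{name}: 4.6={v46!r} 7.5={v75!r}")
```

The same function can be pointed at the query-time filters to check them in the same way.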

About the solrconfig.xml file - I don't think I can share it because it may
contain sensitive information. Is there something specific from this file
that may be relevant for our discussion?


Tulsi wrote
> Do try the solr admin analysis screen
> once as well to see the behaviour for this field.
> https://lucene.apache.org/solr/guide/7_6/index.html

I looked at the analysis screen, but it wasn't helpful. That's why I started
using the "debug=query" parameter and the content of parsedquery.




Re: Why do I get different results for the same query with two Solr versions?

Posted by Tulsi Das <tu...@gmail.com>.
Can you post the managed schema and solrconfig content here?

Also try the Solr admin analysis screen to see how this field behaves.

https://lucene.apache.org/solr/guide/7_6/index.html


Re:Re: Re:Re: Why do I get different results for the same query with two Solr versions?

Posted by xiefengchang <fe...@163.com>.


sow defaults to false?
But this looks as if it were true, right?
For Solr 7.5 I get
"parsedquery":"+(+(text1:ki7 (+text1:ki
+text1:7)))"

Re: Re:Re: Why do I get different results for the same query with two Solr versions?

Posted by nettadalet <ns...@dalet.com>.
Hi,
thanks for the comment, but I tried both "sow=false" and "sow=true"
and I still get the same result. For the query (TITLE_ItemCode_t:KI_7) I still
see:
Solr 4.6: "parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
Solr 7.5: "parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
+TITLE_ItemCode_t:7)))"




Re: Re:Re: Why do I get different results for the same query with two Solr versions?

Posted by Tulsi Das <tu...@gmail.com>.
Hi,
Yes, this looks related to the change in the default behaviour of the sow
(split on whitespace) parameter in Solr 7.

The sow parameter (short for "Split on Whitespace") now defaults to
false, which allows support for multi-word synonyms out of the box.
This parameter is used with the eDismax and standard/"lucene" query
parsers. If this parameter is not explicitly specified as true, query
text will not be split on whitespace before analysis.

https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
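
As a rough illustration of what the quoted paragraph describes, here is a toy Python model (not Solr code; the function name is invented) of which pieces of query text reach field analysis under each sow setting. It ignores operators, field prefixes, and quoting, all of which the real parsers handle. One thing it suggests: a query term with no whitespace, such as KI_7, is handed over as a single piece under either setting, so sow by itself would not be expected to change how that term is analyzed.

```python
def chunks_sent_to_analysis(query_text, sow):
    """Toy model of the pieces of query text handed to field analysis.

    sow=True  -> the text is split on whitespace first, one piece per word.
    sow=False -> the text is passed on as a single piece.
    Hypothetical helper for illustration only; this is not how Solr is
    implemented internally.
    """
    if sow:
        return query_text.split()
    return [query_text]


for text in ("KI_7", "KI 7"):
    print(repr(text),
          "sow=true:", chunks_sent_to_analysis(text, True),
          "sow=false:", chunks_sent_to_analysis(text, False))
```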



Re: Re:Re: Why do I get different results for the same query with two Solr versions?

Posted by nettadalet <ns...@dalet.com>.
I added "defType=lucene" to both searches to make sure I use the same query
parser, but it didn't change the results.




Re: Re:Re: Why do I get different results for the same query with two Solr versions?

Posted by nettadalet <ns...@dalet.com>.
I'm not sure how to check the implementation of the query parser, or how to
change the query parser that I use. I think I'm using the standard query
parser.

I use Solr Admin to run the queries. If I look at the URL, I see
Solr 4.6:
select?q=TITLE_ItemCode_t:KI_7&fl=TITLE_ItemCode_t
Solr 7.5:
select?q=TITLE_ItemCode_t:KI_7&fl=TITLE_ItemCode_t

Should I change something?
Where should I look?




Re:Re: Why do I get different results for the same query with two Solr versions?

Posted by xiefengchang <fe...@163.com>.
Which query parser are you using? I think that to answer your question, you need to check the implementation of the query parser.

Re: Why do I get different results for the same query with two Solr versions?

Posted by nettadalet <ns...@dalet.com>.
Thank you, that was helpful!

For Solr 4.6 I get 
"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"

For Solr 7.5 I get
"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
+TITLE_ItemCode_t:7)))"

So this is the cause of the difference in the search result, but I still
don't know why the parsedquery is different between the two versions.
Any idea/guess?
Is it some internal implementation that changed sometime between 4.6 and
7.5?




Re: Why do I get different results for the same query with two Solr versions?

Posted by Tulsi Das <tu...@gmail.com>.
Hi,
Try adding debug=true or debug=query to the URL and look at the parsed query
near the end of the response.
That should show why the results differ.
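
For anyone who wants to script this check, the following is a minimal Python sketch under stated assumptions: the base URL and core name (http://localhost:8983/solr/mycore) are placeholders, the function names are invented, and the sample response is a trimmed, hand-written stand-in shaped like Solr's JSON debug output, with the parsedquery value taken from this thread. It builds a select URL with debug=query and pulls parsedquery out of a response body.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, debug="query"):
    """Build a /select URL that asks Solr for query debug output as JSON."""
    params = {"q": query, "fl": fields, "debug": debug, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def parsed_query(response_text):
    """Extract debug.parsedquery from a JSON response body."""
    return json.loads(response_text)["debug"]["parsedquery"]


# Placeholder host and core name; replace with your own.
url = build_select_url(
    "http://localhost:8983/solr/mycore",
    "TITLE_ItemCode_t:KI_7",
    "TITLE_ItemCode_t",
)
print(url)

# Trimmed stand-in for a real response; a real one has many more keys.
SAMPLE_RESPONSE = r'{"debug": {"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"}}'
print(parsed_query(SAMPLE_RESPONSE))
```

Running the same URL against both Solr versions and comparing the two parsedquery strings shows directly how each version interpreted the query.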

