Posted to solr-user@lucene.apache.org by SOLR4189 <Kl...@yandex.ru> on 2017/08/04 07:02:31 UTC

Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Hey all,
I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production environment.
When I checked it in the test environment, I noticed that the order of returned
docs for each query is different. The scores have changed as well. I use the same
similarity algorithm (Okapi BM25) as in the previous version. The number of shards
and the number of docs per shard also haven't changed.

Is this normal?
What might be the causes for such behavior?

Regards.



--
View this message in context: http://lucene.472066.n3.nabble.com/Different-order-of-docs-between-SOLR-4-10-4-to-SOLR-6-5-1-tp4349021.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Posted by Dave <ha...@gmail.com>.
Rebuild your index. It's just the safest way. 

On Aug 13, 2017, at 2:02 PM, SOLR4189 <Kl...@yandex.ru> wrote:

>> If you are changing things like WordDelimiterFilterFactory to the graph 
>> version, you'll definitely want to reindex
> 
> What does "*want to reindex*" mean? If I change
> WordDelimiterFilterFactory to the graph version and then use IndexUpgrader,
> is that a mistake? Or will the change simply not take effect?

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Posted by SOLR4189 <Kl...@yandex.ru>.
> If you are changing things like WordDelimiterFilterFactory to the graph 
> version, you'll definitely want to reindex

What does "*want to reindex*" mean? If I change
WordDelimiterFilterFactory to the graph version and then use IndexUpgrader,
is that a mistake? Or will the change simply not take effect?




Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/11/2017 2:52 AM, SOLR4189 wrote:
> Yes, only because I'm seeing different results. 
>
> For example, can changing WordDelimiterFilterFactory to
> WordDelimiterGraphFilterFactory change the order of docs? (
> http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html
> )

I can't say for sure, but if that difference changes what parts of your
query match or don't match, that is very likely to affect document scores.

> To build the index I tried two ways: 1) a dataimport from SOLR-4 to SOLR-6 and
> 2) the IndexUpgrader tool.
> In both cases the order of docs is different.

If you are changing things like WordDelimiterFilterFactory to the graph
version, you'll definitely want to reindex.  The IndexUpgrader tool is
not a reindex.  If the Solr 4 index meets the requirements of having all
relevant fields stored, then doing a dataimport from 4 to 6 would be the
same as a reindex.

Thanks,
Shawn


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Posted by SOLR4189 <Kl...@yandex.ru>.
Yes, only because I'm seeing different results. 

For example, can changing WordDelimiterFilterFactory to
WordDelimiterGraphFilterFactory change the order of docs? (
http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html
)

To build the index I tried two ways: 1) a dataimport from SOLR-4 to SOLR-6 and
2) the IndexUpgrader tool.
In both cases the order of docs is different.




Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Posted by Erick Erickson <er...@gmail.com>.
In addition to Shawn's comments, deleted but not merged documents
alter the statistics used for scoring, so the only hope that the
scores are comparable would be on an optimized index. And note that I
would recommend optimizing _only_ for testing; don't use it in a
production system unless the index is static. I.e., if your pattern is
to build once a day and then optimize, optimizing is fine, but not on a
continuously changing index.
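
To make the point about deleted documents concrete, here is a minimal
Python sketch. It assumes the IDF form documented for Lucene's
BM25Similarity, ln(1 + (N - n + 0.5) / (n + 0.5)), and uses made-up
document counts; it is an illustration of the effect, not a
reproduction of any real index.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF in the form ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical numbers: 1000 live docs, 50 of which contain the term.
live_docs, live_df = 1000, 50

# Assumption: 200 deleted docs have not been merged away yet,
# and 30 of them contain the term. They still count in the statistics.
deleted_docs, deleted_df = 200, 30

idf_unmerged = bm25_idf(live_df + deleted_df, live_docs + deleted_docs)
idf_optimized = bm25_idf(live_df, live_docs)

print(f"IDF with deleted docs still counted: {idf_unmerged:.4f}")
print(f"IDF after an optimize (live docs only): {idf_optimized:.4f}")
```

The same query term gets a different weight in the two cases, which is
why scores are only comparable when both indexes have been optimized.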

Best,
Erick

On Fri, Aug 4, 2017 at 5:52 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 8/4/2017 1:02 AM, SOLR4189 wrote:
>> I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production environment.
>> When I checked it in the test environment, I noticed that the order of returned docs for each query is different. The scores have changed as well. I use the same similarity algorithm (Okapi BM25) as in the previous version. The number of shards and the number of docs per shard also haven't changed.
>
> You're comparing versions released more than two years apart, and across
> two major version upgrades.
>
> Solr is an application built around Lucene.  The score calculation in
> Lucene is frequently tweaked, producing slightly different results even
> with identical data.  Over such a large version discrepancy, I would be
> very surprised if the order and the scores were the same.
>
> Is the index identical between the versions?  If the indexes were each
> built from scratch by their respective versions, rather than going
> through an index upgrade procedure, they are very likely NOT completely
> identical.  Text analysis components are also tweaked frequently, to fix
> bugs and improve behavior.
>
> If the shard hash ranges are not the same on the old and new versions,
> that could contribute to differences in scoring as well.
>
> Are you writing because you're seeing different results, or because you
> think the order you're seeing in the newer version is wrong?
>
> Thanks,
> Shawn
>

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/4/2017 1:02 AM, SOLR4189 wrote:
> I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production environment.
> When I checked it in the test environment, I noticed that the order of returned docs for each query is different. The scores have changed as well. I use the same similarity algorithm (Okapi BM25) as in the previous version. The number of shards and the number of docs per shard also haven't changed.

You're comparing versions released more than two years apart, and across
two major version upgrades.

Solr is an application built around Lucene.  The score calculation in
Lucene is frequently tweaked, producing slightly different results even
with identical data.  Over such a large version discrepancy, I would be
very surprised if the order and the scores were the same.

Is the index identical between the versions?  If the indexes were each
built from scratch by their respective versions, rather than going
through an index upgrade procedure, they are very likely NOT completely
identical.  Text analysis components are also tweaked frequently, to fix
bugs and improve behavior.

If the shard hash ranges are not the same on the old and new versions,
that could contribute to differences in scoring as well.
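
As a rough illustration of the shard point, the Python sketch below
places the same six toy documents on two shards in two different ways.
It assumes each shard scores with its own local term statistics (no
distributed statistics), and the documents, the layouts, and the
shard_idfs helper are all hypothetical.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF in the form ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def shard_idfs(shards, term):
    """Per-shard IDF of `term`, using only each shard's local statistics."""
    result = []
    for docs in shards:
        doc_freq = sum(1 for doc in docs if term in doc.split())
        result.append(bm25_idf(doc_freq, len(docs)))
    return result


# Toy corpus (hypothetical): three of the six docs contain "solr".
docs = [
    "solr upgrade", "solr index", "solr score",
    "lucene score", "index merge", "shard range",
]

layout_a = [docs[:3], docs[3:]]      # all "solr" docs land on shard 0
layout_b = [docs[0::2], docs[1::2]]  # docs alternate between shards

idfs_a = shard_idfs(layout_a, "solr")
idfs_b = shard_idfs(layout_b, "solr")

print("layout A:", [round(x, 4) for x in idfs_a])
print("layout B:", [round(x, 4) for x in idfs_b])
```

The documents are identical in both layouts, yet the weight of "solr"
on each shard differs, so merged result order can differ too.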

Are you writing because you're seeing different results, or because you
think the order you're seeing in the newer version is wrong?
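
One way to answer that question with data is to compare the top
results of the same query on both versions. The Python sketch below is
only an illustration: compare_rankings is a hypothetical helper, and
the document IDs are made up rather than taken from any real query.

```python
def compare_rankings(old_ids, new_ids):
    """Report docs that dropped out, docs that appeared, and position shifts."""
    old_set, new_set = set(old_ids), set(new_ids)
    moved = {}
    for doc_id in old_ids:
        if doc_id in new_set:
            # Positive shift: the doc ranks lower (further down) in the new list.
            shift = new_ids.index(doc_id) - old_ids.index(doc_id)
            if shift != 0:
                moved[doc_id] = shift
    return {
        "only_in_old": [d for d in old_ids if d not in new_set],
        "only_in_new": [d for d in new_ids if d not in old_set],
        "moved": moved,
    }


# Hypothetical top-4 IDs for one query on the old and the new version.
old_top = ["d1", "d2", "d3", "d4"]
new_top = ["d2", "d1", "d3", "d5"]

report = compare_rankings(old_top, new_top)
print(report)
```

Entries under only_in_old or only_in_new point to matching differences
(for example, changed text analysis), while entries under moved point
to scoring differences on documents that match in both versions.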

Thanks,
Shawn