Posted to solr-user@lucene.apache.org by Brian Narsi <bn...@gmail.com> on 2016/01/15 14:34:59 UTC

Query results change

We have an index with 25 fields. Currently the number of records in the index is
about 120,000. We are using:

parser: edismax

qf: contains 8 fields

fq: 1 field

mm: 1

qs: 6

pf: contains 3 fields

bf: contains 1 field
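As a rough illustration, the parameters listed above could be sent to Solr as a request like the one below. This is a minimal Python sketch: the host, the collection name (mycollection), the query text, and every field name are hypothetical placeholders, since the message only gives the number of fields per parameter. Only the parameter names themselves (defType, qf, fq, mm, qs, pf, bf) are standard Solr/edismax parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint and collection (not given in the thread).
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

# Hypothetical field names; the thread only states how many fields
# each parameter uses (qf: 8, fq: 1, pf: 3, bf: 1).
params = {
    "q": "example search",
    "defType": "edismax",
    "qf": "title body keywords author category brand sku description",  # 8 fields
    "fq": "status:active",            # filter on 1 field
    "mm": "1",                        # at least one optional clause must match
    "qs": "6",                        # slop for phrase queries written in q
    "pf": "title body description",   # 3 fields used for phrase boosting
    "bf": "popularity",               # additive boost function over 1 field
}

url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```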

We have noticed that sometimes results change between two searches even if
everything is constant.

We have found that reindexing the data and optimizing remedies the
situation.

Is that expected behavior? Or should we also look into other factors?

Thanks

Re: Query results change

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2016-01-25 at 20:38 -0700, Shawn Heisey wrote:
> Very likely what's happening is that sometimes your shards are
> responding on a different timescale with each request, so the pieces
> that get combined into the final result set arrive in a different
> order.  This causes the Java object containing the results to get
> populated in a different order.

But it should not. A deterministic sort order is essential for paging.

Standard score-based sorting uses the shard ID as the tie-breaker. If I am
not mistaken, that happens in the MergeSortQueue in TopDocs?
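The tie-breaking idea described above can be modelled in a few lines. This is a toy Python sketch, not Lucene's or Solr's actual merge code: it assumes each shard returns hits already sorted by score, and it breaks score ties by a numeric shard index and then by position within the shard. The shard contents are made up.

```python
import heapq

# Hypothetical per-shard results, each already sorted by score (descending).
# Documents a, b, c and d all share the same top score.
shard_results = {
    0: [("a", 2.5), ("c", 2.5), ("e", 1.0)],
    1: [("b", 2.5), ("d", 2.5), ("f", 0.5)],
}

def merge(results, arrival_order):
    """Merge per-shard lists. Score ties are broken by shard index, then by
    position within the shard; arrival order is not part of the key."""
    streams = []
    for shard in arrival_order:  # the order in which responses "arrived"
        hits = results[shard]
        streams.append(
            [(-score, shard, pos, doc_id) for pos, (doc_id, score) in enumerate(hits)]
        )
    # Each stream is ascending in (-score, shard, pos), as heapq.merge requires.
    return [doc_id for _, _, _, doc_id in heapq.merge(*streams)]

merged = {tuple(merge(shard_results, order)) for order in ([0, 1], [1, 0])}
print(merged)
```

Because the arrival order never enters the sort key, both arrival orders produce the same merged list. The sketch only stays deterministic if the identifier used as the tie-breaker is itself stable from request to request.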

- Toke Eskildsen, State and University Library, Denmark



Re: Query results change

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/25/2016 7:47 PM, Brian Narsi wrote:
> We have increased the number of documents in the SolrCloud collection to
> several million now and are seeing the "issue" again:
>
> If there are 10 documents each with exactly the same highest score and we
> run the query again and again, the order of documents changes. So strictly
> speaking although all documents are equally relevant, it will be very nice
> if the order can stay the same so that users are confident about query
> results.
>
> How can we make sure that the order does not change when the query is run
> again and again for documents that are equally relevant (i.e. their score
> is exactly the same)?

Very likely what's happening is that sometimes your shards are
responding on a different timescale with each request, so the pieces
that get combined into the final result set arrive in a different
order.  This causes the Java object containing the results to get
populated in a different order.

If you absolutely require a deterministic order when the score is the
same, then you must supply a secondary sort parameter to break ties.
It sounds like you are doing the default relevance sorting (no sort
parameter at all), so you would need something like this:

sort=score desc,id asc
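The arrival-order effect and the fix can be illustrated with a toy model. This Python sketch assumes, for illustration only, that tied hits end up in the order their shard responses arrived; it is not Solr's merge code, and the document ids and scores are made up. Sorting by (score descending, id ascending) plays the role of sort=score desc,id asc.

```python
# Hypothetical hits from two shards as (doc_id, score); four documents
# tie on the highest score.
shard_a = [("doc3", 7.1), ("doc9", 7.1), ("doc4", 2.0)]
shard_b = [("doc1", 7.1), ("doc7", 7.1), ("doc5", 3.3)]

def combine(responses):
    """Concatenate shard responses in arrival order, then sort two ways."""
    hits = [hit for response in responses for hit in response]
    # Score only: Python's sort is stable, so tied documents keep
    # whatever order the shard responses happened to arrive in.
    by_score = [d for d, s in sorted(hits, key=lambda h: -h[1])]
    # Analogue of sort=score desc,id asc: ties are broken by id.
    by_score_then_id = [d for d, s in sorted(hits, key=lambda h: (-h[1], h[0]))]
    return by_score, by_score_then_id

first = combine([shard_a, shard_b])   # shard A answered first
second = combine([shard_b, shard_a])  # shard B answered first
print(first[0] == second[0])  # False: score-only order depends on arrival
print(first[1] == second[1])  # True: the secondary key fixes the order
```

The ids compare as strings here, so "doc10" would sort before "doc2"; a string id field in Solr sorts the same way.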

Thanks,
Shawn


Re: Query results change

Posted by Brian Narsi <bn...@gmail.com>.
We have increased the number of documents in the SolrCloud collection to
several million now and are seeing the "issue" again:

If there are 10 documents each with exactly the same highest score and we
run the query again and again, the order of the documents changes. Strictly
speaking, all of these documents are equally relevant, but it would be very
nice if the order stayed the same so that users are confident in the query
results.

How can we make sure that the order does not change when the query is run
again and again for documents that are equally relevant (i.e. their score
is exactly the same)?

Thanks

On Fri, Jan 15, 2016 at 3:12 PM, Brian Narsi <bn...@gmail.com> wrote:

Re: Query results change

Posted by Brian Narsi <bn...@gmail.com>.
Data is indexed using the Data Import Handler with clean=true, commit=true and
optimize=true. After that there are no updates or deletes.
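For reference, a full import with the options mentioned above is typically triggered through the DIH request handler. This Python sketch only builds the URL; the host and collection name are hypothetical, and /dataimport is the conventional handler path but depends on how the handler is registered in solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical host and collection; the handler path depends on solrconfig.xml.
DIH_URL = "http://localhost:8983/solr/mycollection/dataimport"

params = {
    "command": "full-import",
    "clean": "true",     # delete the existing index before importing
    "commit": "true",    # commit once the import finishes
    "optimize": "true",  # optimize (merge segments) after the import
}

import_url = DIH_URL + "?" + urlencode(params)
print(import_url)
```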

The setup is SolrCloud with 2 shards and 2 replicas each.

If the data and query have not changed, one expects to see the same results
on repeated searches; so it is a matter of users' confidence in the search
results.

Thanks

On Fri, Jan 15, 2016 at 10:12 AM, Erick Erickson <er...@gmail.com>
wrote:

Re: Query results change

Posted by Erick Erickson <er...@gmail.com>.
This is probably because information from deleted/updated
documents is still hanging around in the corpus until it is
merged away.

The nub of the issue is that terms in deleted documents
(or the replaced doc if you update) still influence tf/idf
calculations. If you optimize as Binoy suggests, all of
the information relating to deleted docs is removed.

If this is a SolrCloud setup, you can be getting
scores from different replicas of the same shard. Because
merging (which purges deleted information) can occur at
different times on different replicas, the score
calculated for a particular doc might differ depending
on which replica calculated it.
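To make the effect concrete, here is a toy example of two replicas of the same shard ranking two documents differently. It assumes the classic TF-IDF idf formula of Lucene's default similarity of that era, idf = 1 + ln(maxDoc / (docFreq + 1)), where both counts still include deleted documents that have not been merged away. Treat the formula and all numbers as illustrative assumptions; real scores also involve tf, norms, and boosts.

```python
import math

def classic_idf(doc_freq, max_doc):
    # Assumed classic TF-IDF idf: 1 + ln(maxDoc / (docFreq + 1)).
    # Both counts include deleted documents until they are merged away.
    return 1.0 + math.log(max_doc / (doc_freq + 1))

def rank(stats):
    # Toy scoring: doc X matches only "foo", doc Y matches only "bar",
    # so each score is just the idf of the term it matches.
    scores = {
        "X": classic_idf(stats["foo"], stats["max_doc"]),
        "Y": classic_idf(stats["bar"], stats["max_doc"]),
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical replica 1: deleted documents already merged away.
replica_1 = {"max_doc": 1000, "foo": 9, "bar": 10}
# Hypothetical replica 2: still holds 5 deleted docs that contained "foo".
replica_2 = {"max_doc": 1005, "foo": 14, "bar": 10}

print(rank(replica_1))  # ['X', 'Y']
print(rank(replica_2))  # ['Y', 'X']
```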

In either setup (SolrCloud or not), background merging can
change the result order by removing information associated
with deleted docs.

All that said, does this have _practical_ consequences or
is this mostly a curiosity question?

Best,
Erick

On Fri, Jan 15, 2016 at 5:40 AM, Binoy Dalal <bi...@gmail.com> wrote:

Re: Query results change

Posted by Binoy Dalal <bi...@gmail.com>.
You should try debugging such queries to see exactly how they're being
executed.
That will give you an idea as to why you're seeing the results you see.
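As a starting point for that debugging, the request below adds debugQuery=true, which returns an explanation of how each document's score was computed, and shards.info=true, which in SolrCloud reports per-shard details such as the address of the replica that answered. Comparing these between two runs can show whether the tied documents really have identical scores and whether different replicas served the requests. The host, collection, and query in this Python sketch are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and query; debugQuery and shards.info are
# standard Solr request parameters.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

params = {
    "q": "example search",
    "defType": "edismax",
    "fl": "id,score",       # return the score alongside each id
    "debugQuery": "true",   # include per-document score explanations
    "shards.info": "true",  # SolrCloud: per-shard details, incl. replica address
}

debug_url = SOLR_SELECT + "?" + urlencode(params)
print(debug_url)
```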

On Fri, 15 Jan 2016, 19:05 Brian Narsi <bn...@gmail.com> wrote:

-- 
Regards,
Binoy Dalal