Posted to solr-user@lucene.apache.org by Ashish Bisht <bi...@gmail.com> on 2019/02/04 05:54:08 UTC

Re: Solr relevancy score different on replicated nodes

Thanks Erick and everyone. We are checking on the stats cache.

I noticed the stats skew again and optimized the index to correct it. After
reading these articles:

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/

I wanted to check on the points below, given that we want the stats skew
corrected.

1. Once optimized, the single segment won't easily be merged naturally. Since
we may run a manual optimize every time, I expect that at some point we will
end up with a single large segment. What impact will this large segment have?
Our index has ~30k documents, i.e. files with content (segment size <1 GB as
of now).

2. Do you recommend optimizing in these situations? It would probably be done
only when the stats skew. Is it safe?

Regards
Ashish

 




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
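
One way to check for the kind of stats skew described above is to compare the live and total document counts that each replica of a shard reports. The sketch below assumes those counters have already been collected per replica (for example, from each core's index statistics); the class name, function name, and sample numbers are hypothetical and only illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStats:
    """Index counters for one replica (names are illustrative)."""
    name: str
    num_docs: int  # live documents
    max_doc: int   # live + deleted documents still present in segments

    @property
    def deleted_docs(self) -> int:
        return self.max_doc - self.num_docs


def skewed(replicas: list[ReplicaStats]) -> bool:
    """Return True if replicas of the same shard disagree on max_doc.

    Deleted documents keep contributing to term statistics until their
    segments are merged away, so differing max_doc values suggest that
    scores may differ between replicas.
    """
    return len({r.max_doc for r in replicas}) > 1


# Hypothetical numbers for two replicas of one shard.
replicas = [
    ReplicaStats("replica1", num_docs=30_000, max_doc=30_450),
    ReplicaStats("replica2", num_docs=30_000, max_doc=31_200),
]

for r in replicas:
    print(f"{r.name}: numDocs={r.num_docs} maxDoc={r.max_doc} deleted={r.deleted_docs}")
print("stats skew likely:", skewed(replicas))
```

Both replicas hold the same live documents here, yet they carry different numbers of not-yet-merged deletions, which is the situation an optimize removes.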

Re: Solr relevancy score different on replicated nodes

Posted by Aman Tandon <am...@gmail.com>.

Thanks Erick for your suggestions and time.

On Tue, Feb 12, 2019, 22:32 Erick Erickson <erickerickson@gmail.com> wrote:

> You really only have four options:
> 1> Use exactstats. This won't guarantee precise matches, but they'll be
> closer.
> 2> Optimize (not particularly recommended, but if you're willing to do
> it periodically, the stats will match until the next updates).
> 3> Use TLOG/PULL replicas and confine the requests to the PULL
> replicas. There'll _still_ be some window for mismatches;
>     specifically, the default is commit_interval/2.
> 4> Define the problem away.
>
> Best,
> Erick

Re: Solr relevancy score different on replicated nodes

Posted by Erick Erickson <er...@gmail.com>.

You really only have four options:
1> Use exactstats. This won't guarantee precise matches, but they'll be closer.
2> Optimize (not particularly recommended, but if you're willing to do
it periodically, the stats will match until the next updates).
3> Use TLOG/PULL replicas and confine the requests to the PULL
replicas. There'll _still_ be some window for mismatches;
    specifically, the default is commit_interval/2.
4> Define the problem away.
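
As an illustration of option 3, the sketch below builds a query URL that asks Solr to route the request to PULL replicas through the `shards.preference` request parameter. The host, the collection name `signals_aggr`, and the helper function are assumptions made for the example; note also that this parameter expresses a preference, so it is worth verifying against your Solr version that requests really stay on PULL replicas.

```python
from urllib.parse import urlencode


def pull_only_query_url(base_url: str, collection: str, query: str) -> str:
    """Build a /select URL that asks Solr to prefer PULL replicas."""
    params = {
        "q": query,
        # Prefer PULL replicas when routing the distributed request.
        "shards.preference": "replica.type:PULL",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection name.
url = pull_only_query_url("http://localhost:8983/solr", "signals_aggr", "shoes")
print(url)
```

The URL is only constructed, not sent, so the sketch can be tried without a running cluster.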

Best,
Erick

On Tue, Feb 12, 2019 at 2:42 AM Aman Tandon <am...@gmail.com> wrote:
>
> Hi Erick,
>
> Any suggestions on this?
>
> Regards,
> Aman

Re: Solr relevancy score different on replicated nodes

Posted by Aman Tandon <am...@gmail.com>.

Hi Erick,

Any suggestions on this?

Regards,
Aman

On Fri, Feb 8, 2019, 17:07 Aman Tandon <amantandon.10@gmail.com> wrote:

> Hi Erick,
>
> I find this thread very relevant for people who are facing the same
> problem.
>
> In our case, we have a signals aggregation collection with a total of
> around 8 million records. We have a SolrCloud architecture (3 shards and
> 4 replicas), and the whole index is around 2.5 GB.
>
> We use this collection to fetch the most-clicked products for a query and
> boost them in search results. The boost score is the query score on the
> aggregation collection.
>
> But when the query goes to a different replica, we get a different boost
> score for some of the keywords, so the ordering of results keeps changing
> on page refresh.
>
> To solve this, we tried the exactstats cache for distributed IDF. At debug
> level I can see the global stats merge in the logs, but we still get
> different scores when refreshing the results from the aggregation
> collection.
>
> Our indexing occurs once a day, so should we optimize daily, or should we
> reduce the merge segment count to 2 or 3? It is currently -1.
>
> What are your suggestions on this?
>
> Regards,
> Aman

Re: Solr relevancy score different on replicated nodes

Posted by Aman Tandon <am...@gmail.com>.

Hi Erick,

I find this thread very relevant for people who are facing the same
problem.

In our case, we have a signals aggregation collection with a total of around
8 million records. We have a SolrCloud architecture (3 shards and 4
replicas), and the whole index is around 2.5 GB.

We use this collection to fetch the most-clicked products for a query and
boost them in search results. The boost score is the query score on the
aggregation collection.

But when the query goes to a different replica, we get a different boost
score for some of the keywords, so the ordering of results keeps changing on
page refresh.

To solve this, we tried the exactstats cache for distributed IDF. At debug
level I can see the global stats merge in the logs, but we still get
different scores when refreshing the results from the aggregation collection.
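
To make the effect concrete, here is a small sketch of how the IDF part of a score can differ between two replicas of the same shard. It assumes the BM25 similarity and its usual IDF formula; the document counts and frequencies are invented, chosen only so that one replica carries more not-yet-merged deleted documents than the other.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF term in the form used by BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for the same term on two replicas of one shard.
# Replica B still counts documents that were deleted but not merged away.
idf_a = bm25_idf(doc_freq=1_200, doc_count=2_700_000)
idf_b = bm25_idf(doc_freq=1_350, doc_count=2_760_000)

print(f"replica A idf = {idf_a:.4f}")
print(f"replica B idf = {idf_b:.4f}")
print("scores differ:", not math.isclose(idf_a, idf_b))
```

Because each replica merges segments on its own schedule, the two sets of counters can drift apart even when the live documents are identical, and the same query then scores differently depending on which replica answers.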

Our indexing occurs once a day, so should we optimize daily, or should we
reduce the merge segment count to 2 or 3? It is currently -1.

What are your suggestions on this?

Regards,
Aman

On Fri, Feb 8, 2019, 00:15 Erick Erickson <erickerickson@gmail.com> wrote:

> Optimization is safe. The large segment is irrelevant; you'll
> lose a little parallelization, but on an index with this few
> documents I doubt you'll notice.
>
> As of Solr 7.5, optimize will respect the max segment size,
> which defaults to 5 GB, but you're well under that limit.
>
> Best,
> Erick

Re: Solr relevancy score different on replicated nodes

Posted by Erick Erickson <er...@gmail.com>.

Optimization is safe. The large segment is irrelevant; you'll
lose a little parallelization, but on an index with this few
documents I doubt you'll notice.

As of Solr 7.5, optimize will respect the max segment size,
which defaults to 5 GB, but you're well under that limit.
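
For reference, an optimize can be requested through the collection's update handler. The sketch below only builds the request URL; the host, the collection name `files`, and the helper function are assumptions for the example, and the choice of `maxSegments=1` simply mirrors the single-segment scenario discussed in this thread.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build an update-handler URL that requests an optimize (force merge)."""
    params = {
        "optimize": "true",
        "maxSegments": max_segments,  # target number of segments after merging
        "waitSearcher": "true",       # block until the new searcher is open
    }
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Hypothetical host and collection name.
url = optimize_url("http://localhost:8983/solr", "files")
print(url)
```

If indexing happens on a fixed schedule, a request like this could be issued right after the indexing run, so that all replicas settle on the same statistics until the next update.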

Best,
Erick
