Posted to solr-user@lucene.apache.org by jena <st...@gmail.com> on 2019/06/07 05:27:34 UTC

Urgent help on solr optimisation issue !!

Hello guys,

We have 4 Solr (version 4.4) instances in our production environment, which are
linked/associated with ZooKeeper for replication. We do heavy delete & add
operations. We have around 26 million records and the index size is around
70 GB. We serve 100k+ requests per day.


Because of heavy indexing & deletion, we optimise the Solr instances every day,
and because of that our Solr cloud becomes unstable: every Solr instance goes
into recovery mode and our search is affected & very slow.
Optimisation takes around 1 hour 30 minutes.
We are not able to fix this issue, please help.

Thanks & Regards



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Urgent help on solr optimisation issue !!

Posted by jena <st...@gmail.com>.
Thanks @Erick for the suggestions. That looks bad; yes, your assumptions
are right, we have a lot of deleted & indexed documents as well.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Urgent help on solr optimisation issue !!

Posted by Erick Erickson <er...@gmail.com>.

> On Jun 7, 2019, at 7:53 AM, David Santamauro <da...@gmail.com> wrote:
> 
> So is this new optimize maxSegments / commit expungeDeletes behavior in 7.5? My experience, and I watch my optimize process very closely, is that using maxSegments does not touch every segment with a deleted document. expungeDeletes merges all segments that have deleted documents that have been touched with said commit.
> 

Which part? 

What’s different about 7.5 is that an optimize that doesn’t specify maxSegments will remove all deleted docs from an index without creating massive segments. Prior to 7.5, a simple optimize would create a single segment by default, no matter how large.

If, after the end of an optimize on a quiescent index, you see a difference between maxDoc and numDocs (or  deletedDocs  > 0) for a core, then that’s entirely unexpected  for any version of Solr.  NOTE: If you are actively indexing while optimizing you may see deleted docs in your index after optimize since optimize works on the segments it sees when the operation starts….

ExpungeDeletes has always, IIUC, defaulted to only merging segments  with > 10% deleted docs.
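
A quick way to check that maxDoc/numDocs difference on a core is the Luke request handler. A minimal SolrJ sketch (assuming a reasonably recent SolrJ; the URL and core name are illustrative):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class DeletedDocsCheck {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {
            LukeRequest luke = new LukeRequest();
            luke.setNumTerms(0);                 // index-level info only
            LukeResponse rsp = luke.process(client);
            int numDocs = rsp.getNumDocs();      // live documents
            int maxDoc = rsp.getMaxDoc();        // live + deleted
            System.out.println("deleted docs: " + (maxDoc - numDocs));
        }
    }
}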

Best,
Erick

> After reading LUCENE-7976, it seems this is, indeed, new behavior.
> 
> 
> On 6/7/19, 10:31 AM, "Erick Erickson" <er...@gmail.com> wrote:
> 
>    Optimizing guarantees that there will be _no_ deleted documents in an index when done. If a segment has even one deleted document, it’s merged, no matter what you specify for maxSegments. 
> 
>    Segments are write-once, so to remove deleted data from a segment it must be at least rewritten into a new segment, whether or not it’s merged with another segment on optimize.
> 
>    expungeDeletes  does _not_ merge every segment that has deleted documents. It merges segments that have > 10% (the default) deleted documents. If your index happens to have all segments with > 10% deleted docs, then it will, indeed, merge all of them.
> 
>    In your example, if you look closely you should find that all segments that had any deleted documents were written (merged) to new segments. I’d expect that segments with _no_ deleted documents might mostly be left alone. And two of the segments were chosen to merge together.
> 
>    See LUCENE-7976 for a long discussion of how this changed starting  with SOLR 7.5.
> 
>    Best,
>    Erick
> 
>> On Jun 7, 2019, at 7:07 AM, David Santamauro <da...@gmail.com> wrote:
>> 
>> Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an index with 75 segments, optimize with maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a segment to merge that has deleted documents, but does not merge every segment with deleted documents.
>> 
>> I think you are thinking about the expungeDeletes parameter on the commit request. That will merge every segment that has a deleted document.
>> 
>> 
>> On 6/7/19, 10:00 AM, "Erick Erickson" <er...@gmail.com> wrote:
>> 
>>   This isn’t quite right. Solr will rewrite _all_ segments that have _any_ deleted documents in them when optimizing, even one. Given your description, I’d guess that all your segments will have deleted documents, so even if you do specify maxSegments on the optimize command, the entire index will be rewritten.
>> 
>>   You’re in a bind, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/. You have this one massive segment and it will _not_ be merged until it’s almost all deleted documents, see the link above for a fuller explanation.
>> 
>>   Prior to Solr 7.5 you don’t have many options except to re-index and _not_ optimize. So if possible I’d reindex from scratch into a new collection and do not optimize. Or restructure your process such that you can optimize in a quiet period when little indexing is going on.
>> 
>>   Best,
>>   Erick
>> 
>>> On Jun 7, 2019, at 2:51 AM, jena <st...@gmail.com> wrote:
>>> 
>>> Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
>>> version. Is there any API I can use to get my segment information? Will try
>>> to use maxSegments and see if it can help us during optimization.
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 
>> 
> 
> 


Re: Urgent help on solr optimisation issue !!

Posted by David Santamauro <da...@gmail.com>.
So is this new optimize maxSegments / commit expungeDeletes behavior in 7.5? My experience, and I watch my optimize process very closely, is that using maxSegments does not touch every segment with a deleted document. expungeDeletes merges all segments that have deleted documents that have been touched with said commit.

After reading LUCENE-7976, it seems this is, indeed, new behavior.


On 6/7/19, 10:31 AM, "Erick Erickson" <er...@gmail.com> wrote:

    Optimizing guarantees that there will be _no_ deleted documents in an index when done. If a segment has even one deleted document, it’s merged, no matter what you specify for maxSegments. 
    
    Segments are write-once, so to remove deleted data from a segment it must be at least rewritten into a new segment, whether or not it’s merged with another segment on optimize.
    
    expungeDeletes  does _not_ merge every segment that has deleted documents. It merges segments that have > 10% (the default) deleted documents. If your index happens to have all segments with > 10% deleted docs, then it will, indeed, merge all of them.
    
    In your example, if you look closely you should find that all segments that had any deleted documents were written (merged) to new segments. I’d expect that segments with _no_ deleted documents might mostly be left alone. And two of the segments were chosen to merge together.
    
    See LUCENE-7976 for a long discussion of how this changed starting  with SOLR 7.5.
    
    Best,
    Erick
    
    > On Jun 7, 2019, at 7:07 AM, David Santamauro <da...@gmail.com> wrote:
    > 
    > Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an index with 75 segments, optimize with maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a segment to merge that has deleted documents, but does not merge every segment with deleted documents.
    > 
    > I think you are thinking about the expungeDeletes parameter on the commit request. That will merge every segment that has a deleted document.
    > 
    > 
    > On 6/7/19, 10:00 AM, "Erick Erickson" <er...@gmail.com> wrote:
    > 
    >    This isn’t quite right. Solr will rewrite _all_ segments that have _any_ deleted documents in them when optimizing, even one. Given your description, I’d guess that all your segments will have deleted documents, so even if you do specify maxSegments on the optimize command, the entire index will be rewritten.
    > 
    >    You’re in a bind, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/. You have this one massive segment and it will _not_ be merged until it’s almost all deleted documents, see the link above for a fuller explanation.
    > 
    >    Prior to Solr 7.5 you don’t have many options except to re-index and _not_ optimize. So if possible I’d reindex from scratch into a new collection and do not optimize. Or restructure your process such that you can optimize in a quiet period when little indexing is going on.
    > 
    >    Best,
    >    Erick
    > 
    >> On Jun 7, 2019, at 2:51 AM, jena <st...@gmail.com> wrote:
    >> 
    >> Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
    >> version. Is there any API I can use to get my segment information? Will try
    >> to use maxSegments and see if it can help us during optimization.
    >> 
    >> 
    >> 
    >> --
    >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    > 
    > 
    
    

Re: Urgent help on solr optimisation issue !!

Posted by Erick Erickson <er...@gmail.com>.
Optimizing guarantees that there will be _no_ deleted documents in an index when done. If a segment has even one deleted document, it’s merged, no matter what you specify for maxSegments. 

Segments are write-once, so to remove deleted data from a segment it must be at least rewritten into a new segment, whether or not it’s merged with another segment on optimize.

expungeDeletes  does _not_ merge every segment that has deleted documents. It merges segments that have > 10% (the default) deleted documents. If your index happens to have all segments with > 10% deleted docs, then it will, indeed, merge all of them.

In your example, if you look closely you should find that all segments that had any deleted documents were written (merged) to new segments. I’d expect that segments with _no_ deleted documents might mostly be left alone. And two of the segments were chosen to merge together.

See LUCENE-7976 for a long discussion of how this changed starting  with SOLR 7.5.
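
If you want to trigger that explicitly, here's a minimal SolrJ sketch that issues a commit with expungeDeletes=true (the URL and core name are assumptions; expungeDeletes itself is a standard update-handler parameter):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletes {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {
            UpdateRequest req = new UpdateRequest();
            req.setParam("commit", "true");           // commit as part of this request
            req.setParam("expungeDeletes", "true");   // merge away segments over the pct threshold
            req.process(client);
        }
    }
}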

Best,
Erick

> On Jun 7, 2019, at 7:07 AM, David Santamauro <da...@gmail.com> wrote:
> 
> Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an index with 75 segments, optimize with maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a segment to merge that has deleted documents, but does not merge every segment with deleted documents.
> 
> I think you are thinking about the expungeDeletes parameter on the commit request. That will merge every segment that has a deleted document.
> 
> 
> On 6/7/19, 10:00 AM, "Erick Erickson" <er...@gmail.com> wrote:
> 
>    This isn’t quite right. Solr will rewrite _all_ segments that have _any_ deleted documents in them when optimizing, even one. Given your description, I’d guess that all your segments will have deleted documents, so even if you do specify maxSegments on the optimize command, the entire index will be rewritten.
> 
>    You’re in a bind, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/. You have this one massive segment and it will _not_ be merged until it’s almost all deleted documents, see the link above for a fuller explanation.
> 
>    Prior to Solr 7.5 you don’t have many options except to re-index and _not_ optimize. So if possible I’d reindex from scratch into a new collection and do not optimize. Or restructure your process such that you can optimize in a quiet period when little indexing is going on.
> 
>    Best,
>    Erick
> 
>> On Jun 7, 2019, at 2:51 AM, jena <st...@gmail.com> wrote:
>> 
>> Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
>> version. Is there any API I can use to get my segment information? Will try
>> to use maxSegments and see if it can help us during optimization.
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> 


Re: Urgent help on solr optimisation issue !!

Posted by David Santamauro <da...@gmail.com>.
/clarification/ ... expungeDeletes will merge every segment *touched by the current commit* that has a deleted document.


On 6/7/19, 10:07 AM, "David Santamauro" <da...@gmail.com> wrote:

    Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an index with 75 segments, optimize with maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a segment to merge that has deleted documents, but does not merge every segment with deleted documents.
    
    I think you are thinking about the expungeDeletes parameter on the commit request. That will merge every segment that has a deleted document.
    
    
    On 6/7/19, 10:00 AM, "Erick Erickson" <er...@gmail.com> wrote:
    
        This isn’t quite right. Solr will rewrite _all_ segments that have _any_ deleted documents in them when optimizing, even one. Given your description, I’d guess that all your segments will have deleted documents, so even if you do specify maxSegments on the optimize command, the entire index will be rewritten.
        
        You’re in a bind, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/. You have this one massive segment and it will _not_ be merged until it’s almost all deleted documents, see the link above for a fuller explanation.
        
        Prior to Solr 7.5 you don’t have many options except to re-index and _not_ optimize. So if possible I’d reindex from scratch into a new collection and do not optimize. Or restructure your process such that you can optimize in a quiet period when little indexing is going on.
        
        Best,
        Erick
        
        > On Jun 7, 2019, at 2:51 AM, jena <st...@gmail.com> wrote:
        > 
        > Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
        > version. Is there any API I can use to get my segment information? Will try
        > to use maxSegments and see if it can help us during optimization.
        > 
        > 
        > 
        > --
        > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
        
        
    

Re: Urgent help on solr optimisation issue !!

Posted by David Santamauro <da...@gmail.com>.
Erick, on 6.0.1, optimize with maxSegments only merges down to the specified number. E.g., given an index with 75 segments, optimize with maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a segment to merge that has deleted documents, but does not merge every segment with deleted documents.

I think you are thinking about the expungeDeletes parameter on the commit request. That will merge every segment that has a deleted document.
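
(In SolrJ, that maxSegments optimize is the three-argument optimize() call. A minimal sketch, with the URL and core name assumed:)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OptimizeToMaxSegments {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {
            // waitFlush=true, waitSearcher=true, maxSegments=74
            client.optimize(true, true, 74);
        }
    }
}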


On 6/7/19, 10:00 AM, "Erick Erickson" <er...@gmail.com> wrote:

    This isn’t quite right. Solr will rewrite _all_ segments that have _any_ deleted documents in them when optimizing, even one. Given your description, I’d guess that all your segments will have deleted documents, so even if you do specify maxSegments on the optimize command, the entire index will be rewritten.
    
    You’re in a bind, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/. You have this one massive segment and it will _not_ be merged until it’s almost all deleted documents, see the link above for a fuller explanation.
    
    Prior to Solr 7.5 you don’t have many options except to re-index and _not_ optimize. So if possible I’d reindex from scratch into a new collection and do not optimize. Or restructure your process such that you can optimize in a quiet period when little indexing is going on.
    
    Best,
    Erick
    
    > On Jun 7, 2019, at 2:51 AM, jena <st...@gmail.com> wrote:
    > 
    > Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
    > version. Is there any API I can use to get my segment information? Will try
    > to use maxSegments and see if it can help us during optimization.
    > 
    > 
    > 
    > --
    > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    
    

Re: Urgent help on solr optimisation issue !!

Posted by Erick Erickson <er...@gmail.com>.
This isn’t quite right. Solr will rewrite _all_ segments that have _any_ deleted documents in them when optimizing, even one. Given your description, I’d guess that all your segments will have deleted documents, so even if you do specify maxSegments on the optimize command, the entire index will be rewritten.

You’re in a bind, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/. You have this one massive segment and it will _not_ be merged until it’s almost all deleted documents, see the link above for a fuller explanation.

Prior to Solr 7.5 you don’t have many options except to re-index and _not_ optimize. So if possible I’d reindex from scratch into a new collection and do not optimize. Or restructure your process such that you can optimize in a quiet period when little indexing is going on.

Best,
Erick

> On Jun 7, 2019, at 2:51 AM, jena <st...@gmail.com> wrote:
> 
> Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
> version. Is there any API I can use to get my segment information? Will try
> to use maxSegments and see if it can help us during optimization.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Urgent help on solr optimisation issue !!

Posted by jena <st...@gmail.com>.
Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
version. Is there any API I can use to get my segment information? Will try
to use maxSegments and see if it can help us during optimization.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Urgent help on solr optimisation issue !!

Posted by Erick Erickson <er...@gmail.com>.
David:

Some of this still matters even with 7.5+. Prior to 7.5, you could easily have 50% of your index consist of deleted docs. With 7.5, this ceiling is reduced. expungeDeletes will reduce the proportion of deleted docs to no more than 10% while still respecting the default max segment size of 5G. Optimizing and specifying maxSegments was getting you what you wanted, but more as a side effect, ya’ got lucky ;)….

You can set a bunch of parameters explicitly for TieredMergePolicy, some of the juicy ones might be:

- maxMergedSegmentMB, default 5000, will result in fewer segments but doesn’t materially affect the ratio of deleted docs.

- forceMergeDeletesPctAllowed (used in expungeDeletes, default 10%)

- deletesPctAllowed (when doing “regular” merging, i.e. not optimizing or expungeDeletes): this is the target ceiling for the % of deleted docs allowed in the index. Cannot be set below 20%.


It’s a balance between I/O and wasted space. The reason deletesPctAllowed is not allowed to go below 20% is that it’s too easy to shoot yourself in the foot: setting it to 5%, for instance, would send I/O (and CPU) through the roof, since merging is an expensive operation. And you can get something similar by doing an expungeDeletes once rather than rewriting segments all the time….

Ditto with the default value for forceMergeDeletesPctAllowed. Setting it to 1%, for instance, is doing a LOT of work for little gain.
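
For reference, these knobs are set on the merge policy in solrconfig.xml. A sketch of the Solr 7.x mergePolicyFactory syntax (maxMergedSegmentMB and forceMergeDeletesPctAllowed are shown at their defaults; the deletesPctAllowed value here is illustrative):

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <double name="maxMergedSegmentMB">5000</double>
    <double name="forceMergeDeletesPctAllowed">10.0</double>
    <double name="deletesPctAllowed">25.0</double>
  </mergePolicyFactory>
</indexConfig>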

Best,
Erick


> On Jun 7, 2019, at 2:44 PM, David Santamauro <da...@gmail.com> wrote:
> 
> I use the same algorithm and for me, initialMaxSegments is always the number of segments currently in the index (seen, e.g., in the SOLR admin UI). finalMaxSegments depends on what kind of updates have happened. If I know that "older" documents are untouched, then I'll usually use -60% or even -70%, depending on the initialMaxSegments. I have a few cores that I'll even go all the way down to 1.
> 
> If you are going to attempt this, I'd suggest testing with a small reduction, say 10 segments, and monitoring the index size and the difference between maxDoc and numDocs. I've shaved ~ 1T off of an index optimizing from 75 down to 30 segments (7T index total) and reduced a significant % of deleted documents in the process. YMMV ...
> 
> If you are using a version of SOLR >=7.5 (see LUCENE-7976), this might all be moot.
> 
> //
> 
> 
> On 6/7/19, 2:29 PM, "jena" <st...@gmail.com> wrote:
> 
>    Thanks @Michael Joyner, how did you decide on initialMaxSegments = 256? Or is
>    it some random number I can use for my case? Can you guide me on how to
>    decide the initial & final max segments?
> 
> 
>    Michael Joyner wrote
>> That is the way we do it here - also helps a lot with not needing x2 or 
>> x3 disk space to handle the merge:
>> 
>> public void solrOptimize() {
>>         int initialMaxSegments = 256;
>>         int finalMaxSegments = 4;
>>         if (isShowSegmentCounter()) {
>>             log.info("Optimizing ...");
>>         }
>>         try (SolrClient solrServerInstance = getSolrClientInstance()) {
>>             for (int segments = initialMaxSegments; segments >= 
>> finalMaxSegments; segments--) {
>>                 if (isShowSegmentCounter()) {
>>                     System.out.println("Optimizing to a max of " + 
>> segments + " segments.");
>>                 }
>>                 try {
>>                     solrServerInstance.optimize(true, true, segments);
>>                 } catch (RemoteSolrException | SolrServerException | 
>> IOException e) {
>>                     log.severe(e.getMessage());
>>                 }
>>             }
>>         } catch (IOException e) {
>>             throw new RuntimeException(e);
>>         }
>>     }
>> 
>> On 6/7/19 4:56 AM, Nicolas Franck wrote:
>>> In that case, hard optimisation like that is out of the question.
>>> Resort to automatic merge policies, specifying a maximum
>>> number of segments. Solr was created with multiple segments
>>> in mind. Hard optimisation seems not worth the trouble.
>>> 
>>> The problem is this: the fewer segments you specify
>>> during an optimisation, the longer it will take, because it has to read
>>> all of these segments to be merged, and redo the sorting. And a cluster
>>> has a lot of housekeeping on top of it.
>>> 
>>> If you really want to issue an optimisation, then you can
>>> also do it in steps (max segments parameter):
>>> 
>>> 10 -> 9 -> 8 -> 7 ... -> 1
>>> 
>>> That way fewer segments need to be merged in one go.
>>> 
>>> Testing your index will show you what a good maximum
>>> number of segments is for your index.
>>> 
>>>> On 7 Jun 2019, at 07:27, jena <sthita2010@...> wrote:
>>>> 
>>>> Hello guys,
>>>> 
>>>> We have 4 Solr (version 4.4) instances in our production environment,
>>>> which are linked/associated with ZooKeeper for replication. We do heavy
>>>> delete & add operations. We have around 26 million records and the index
>>>> size is around 70 GB. We serve 100k+ requests per day.
>>>> 
>>>> 
>>>> Because of heavy indexing & deletion, we optimise the Solr instances
>>>> every day, and because of that our Solr cloud becomes unstable: every
>>>> Solr instance goes into recovery mode and our search is affected & very
>>>> slow. Optimisation takes around 1 hour 30 minutes.
>>>> We are not able to fix this issue, please help.
>>>> 
>>>> Thanks & Regards
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> 
> 
> 
> 
>    --
>    Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 


Re: Urgent help on solr optimisation issue !!

Posted by David Santamauro <da...@gmail.com>.
I use the same algorithm and for me, initialMaxSegments is always the number of segments currently in the index (seen, e.g., in the SOLR admin UI). finalMaxSegments depends on what kind of updates have happened. If I know that "older" documents are untouched, then I'll usually use -60% or even -70%, depending on the initialMaxSegments. I have a few cores that I'll even go all the way down to 1.

If you are going to attempt this, I'd suggest testing with a small reduction, say 10 segments, and monitoring the index size and the difference between maxDoc and numDocs. I've shaved ~ 1T off of an index optimizing from 75 down to 30 segments (7T index total) and reduced a significant % of deleted documents in the process. YMMV ...

If you are using a version of SOLR >=7.5 (see LUCENE-7976), this might all be moot.

//


On 6/7/19, 2:29 PM, "jena" <st...@gmail.com> wrote:

    Thanks @Michael Joyner, how did you decide on initialMaxSegments = 256? Or is
    it some random number I can use for my case? Can you guide me on how to
    decide the initial & final max segments?
    
     
    Michael Joyner wrote
    > That is the way we do it here - also helps a lot with not needing x2 or 
    > x3 disk space to handle the merge:
    > 
    > public void solrOptimize() {
    >          int initialMaxSegments = 256;
    >          int finalMaxSegments = 4;
    >          if (isShowSegmentCounter()) {
    >              log.info("Optimizing ...");
    >          }
    >          try (SolrClient solrServerInstance = getSolrClientInstance()) {
    >              for (int segments = initialMaxSegments; segments >= 
    > finalMaxSegments; segments--) {
    >                  if (isShowSegmentCounter()) {
    >                      System.out.println("Optimizing to a max of " + 
    > segments + " segments.");
    >                  }
    >                  try {
    >                      solrServerInstance.optimize(true, true, segments);
    >                  } catch (RemoteSolrException | SolrServerException | 
    > IOException e) {
    >                      log.severe(e.getMessage());
    >                  }
    >              }
    >          } catch (IOException e) {
    >              throw new RuntimeException(e);
    >          }
    >      }
    > 
    > On 6/7/19 4:56 AM, Nicolas Franck wrote:
    >> In that case, hard optimisation like that is out of the question.
    >> Resort to automatic merge policies, specifying a maximum
    >> number of segments. Solr was created with multiple segments
    >> in mind. Hard optimisation seems not worth the trouble.
    >>
    >> The problem is this: the fewer segments you specify
    >> during an optimisation, the longer it will take, because it has to read
    >> all of these segments to be merged, and redo the sorting. And a cluster
    >> has a lot of housekeeping on top of it.
    >>
    >> If you really want to issue an optimisation, then you can
    >> also do it in steps (max segments parameter):
    >>
    >> 10 -> 9 -> 8 -> 7 ... -> 1
    >>
    >> That way fewer segments need to be merged in one go.
    >>
    >> Testing your index will show you what a good maximum
    >> number of segments is for your index.
    >>
    >>> On 7 Jun 2019, at 07:27, jena <sthita2010@...> wrote:
    >>>
    >>> Hello guys,
    >>>
    >>> We have 4 Solr (version 4.4) instances in our production environment,
    >>> which are linked/associated with ZooKeeper for replication. We do heavy
    >>> delete & add operations. We have around 26 million records and the index
    >>> size is around 70 GB. We serve 100k+ requests per day.
    >>>
    >>>
    >>> Because of heavy indexing & deletion, we optimise the Solr instances
    >>> every day, and because of that our Solr cloud becomes unstable: every
    >>> Solr instance goes into recovery mode and our search is affected & very
    >>> slow. Optimisation takes around 1 hour 30 minutes.
    >>> We are not able to fix this issue, please help.
    >>>
    >>> Thanks & Regards
    >>>
    >>>
    >>>
    >>> --
    >>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    
    
    
    
    
    --
    Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    

Re: Urgent help on solr optimisation issue !!

Posted by jena <st...@gmail.com>.
Thanks @Michael Joyner, how did you decide on initialMaxSegments = 256? Or is
it some random number I can use for my case? Can you guide me on how to
decide the initial & final max segments?

 
Michael Joyner wrote
> That is the way we do it here - also helps a lot with not needing x2 or 
> x3 disk space to handle the merge:
> 
> public void solrOptimize() {
>          int initialMaxSegments = 256;
>          int finalMaxSegments = 4;
>          if (isShowSegmentCounter()) {
>              log.info("Optimizing ...");
>          }
>          try (SolrClient solrServerInstance = getSolrClientInstance()) {
>              for (int segments = initialMaxSegments; segments >= 
> finalMaxSegments; segments--) {
>                  if (isShowSegmentCounter()) {
>                      System.out.println("Optimizing to a max of " + 
> segments + " segments.");
>                  }
>                  try {
>                      solrServerInstance.optimize(true, true, segments);
>                  } catch (RemoteSolrException | SolrServerException | 
> IOException e) {
>                      log.severe(e.getMessage());
>                  }
>              }
>          } catch (IOException e) {
>              throw new RuntimeException(e);
>          }
>      }
> 
> On 6/7/19 4:56 AM, Nicolas Franck wrote:
>> In that case, hard optimisation like that is out of the question.
>> Resort to automatic merge policies, specifying a maximum
>> number of segments. Solr was created with multiple segments
>> in mind. Hard optimisation seems not worth the trouble.
>>
>> The problem is this: the fewer segments you specify
>> during an optimisation, the longer it will take, because it has to read
>> all of these segments to be merged, and redo the sorting. And a cluster
>> has a lot of housekeeping on top of it.
>>
>> If you really want to issue an optimisation, then you can
>> also do it in steps (max segments parameter):
>>
>> 10 -> 9 -> 8 -> 7 ... -> 1
>>
>> That way fewer segments need to be merged in one go.
>>
>> Testing your index will show you what a good maximum
>> number of segments is for your index.
>>
>>> On 7 Jun 2019, at 07:27, jena <sthita2010@...> wrote:
>>>
>>> Hello guys,
>>>
>>> We have 4 Solr (version 4.4) instances in our production environment,
>>> which are linked/associated with ZooKeeper for replication. We do heavy
>>> delete & add operations. We have around 26 million records and the index
>>> size is around 70 GB. We serve 100k+ requests per day.
>>>
>>>
>>> Because of heavy indexing & deletion, we optimise the Solr instances
>>> every day, and because of that our Solr cloud becomes unstable: every
>>> Solr instance goes into recovery mode and our search is affected & very
>>> slow. Optimisation takes around 1 hour 30 minutes.
>>> We are not able to fix this issue, please help.
>>>
>>> Thanks & Regards
>>>
>>>
>>>
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Urgent help on solr optimisation issue !!

Posted by Michael Joyner <mi...@newsrx.com>.
That is the way we do it here - also helps a lot with not needing x2 or 
x3 disk space to handle the merge:

// Imports this snippet relies on (assuming SolrJ and java.util.logging;
// getSolrClientInstance() and isShowSegmentCounter() are our own helpers):
//   import java.io.IOException;
//   import org.apache.solr.client.solrj.SolrClient;
//   import org.apache.solr.client.solrj.SolrServerException;
//   import org.apache.solr.client.solrj.impl.HttpSolrClient.RemoteSolrException;

public void solrOptimize() {
    int initialMaxSegments = 256;
    int finalMaxSegments = 4;
    if (isShowSegmentCounter()) {
        log.info("Optimizing ...");
    }
    try (SolrClient solrServerInstance = getSolrClientInstance()) {
        // Step down one segment at a time so each pass is a small merge
        // instead of one huge rewrite.
        for (int segments = initialMaxSegments;
                segments >= finalMaxSegments; segments--) {
            if (isShowSegmentCounter()) {
                System.out.println("Optimizing to a max of "
                        + segments + " segments.");
            }
            try {
                // waitFlush=true, waitSearcher=true, maxSegments=segments
                solrServerInstance.optimize(true, true, segments);
            } catch (RemoteSolrException | SolrServerException
                    | IOException e) {
                log.severe(e.getMessage());
            }
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

On 6/7/19 4:56 AM, Nicolas Franck wrote:
> In that case, hard optimisation like that is out of the question.
> Resort to automatic merge policies, specifying a maximum
> number of segments. Solr was created with multiple segments
> in mind. Hard optimisation seems not worth the trouble.
>
> The problem is this: the fewer segments you specify
> during an optimisation, the longer it will take, because it has to read
> all of these segments to be merged, and redo the sorting. And a cluster
> has a lot of housekeeping on top of it.
>
> If you really want to issue an optimisation, then you can
> also do it in steps (max segments parameter):
>
> 10 -> 9 -> 8 -> 7 ... -> 1
>
> That way fewer segments need to be merged in one go.
>
> Testing your index will show you what a good maximum
> number of segments is for your index.
>
>> On 7 Jun 2019, at 07:27, jena <st...@gmail.com> wrote:
>>
>> Hello guys,
>>
>> We have 4 Solr (version 4.4) instances in our production environment, which are
>> linked/associated with ZooKeeper for replication. We do heavy delete & add
>> operations. We have around 26 million records and the index size is around
>> 70 GB. We serve 100k+ requests per day.
>>
>>
>> Because of heavy indexing & deletion, we optimise the Solr instances every day,
>> and because of that our Solr cloud becomes unstable: every Solr instance goes
>> into recovery mode and our search is affected & very slow.
>> Optimisation takes around 1 hour 30 minutes.
>> We are not able to fix this issue, please help.
>>
>> Thanks & Regards
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Urgent help on solr optimisation issue !!

Posted by Nicolas Franck <Ni...@UGent.be>.
In that case, hard optimisation like that is out of the question.
Resort to automatic merge policies, specifying a maximum
number of segments. Solr was created with multiple segments
in mind. Hard optimisation seems not worth the trouble.

The problem is this: the fewer segments you specify
during an optimisation, the longer it will take, because it has to read
all of these segments to be merged, and redo the sorting. And a cluster
has a lot of housekeeping on top of it.

If you really want to issue an optimisation, then you can
also do it in steps (max segments parameter):

10 -> 9 -> 8 -> 7 ... -> 1

That way fewer segments need to be merged in one go.

Testing your index will show you what a good maximum
number of segments is for your index.

> On 7 Jun 2019, at 07:27, jena <st...@gmail.com> wrote:
> 
> Hello guys,
> 
> We have 4 Solr (version 4.4) instances in our production environment, which are
> linked/associated with ZooKeeper for replication. We do heavy delete & add
> operations. We have around 26 million records and the index size is around
> 70 GB. We serve 100k+ requests per day.
> 
> 
> Because of heavy indexing & deletion, we optimise the Solr instances every day,
> and because of that our Solr cloud becomes unstable: every Solr instance goes
> into recovery mode and our search is affected & very slow.
> Optimisation takes around 1 hour 30 minutes.
> We are not able to fix this issue, please help.
> 
> Thanks & Regards
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Urgent help on solr optimisation issue !!

Posted by jena <st...@gmail.com>.
Thanks Shawn for the suggestions. Interesting to know deleteByQuery has some
impact; will try to change it as you have suggested. Thanks



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Urgent help on solr optimisation issue !!

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/6/2019 11:27 PM, jena wrote:
> Because of heavy indexing & deletion, we optimise the Solr instances every
> day, and because of that our Solr cloud becomes unstable: every Solr instance
> goes into recovery mode and our search is affected & very slow.
> Optimisation takes around 1 hour 30 minutes.

Ordinarily, optimizing would just be a transparent operation; even 
though it's slow, it wouldn't be something that would interfere with index 
operation.

But if you add deleteByQuery to the mix, then you WILL have problems. 
These problems can occur even if you don't optimize -- because sometimes 
the normal segment merges will take a very long time like an optimize, 
and the same interference between deleteByQuery and segment merging will 
happen.

The fix for that is to stop doing deleteByQuery.  Replace it with a two-step 
operation where you first run the query to get ID values, and then 
do deleteById.  That kind of delete will not have any bad interaction 
with segment merging.
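
A minimal SolrJ sketch of that two-step replacement (the query, field name, 
and core URL here are hypothetical):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class TwoStepDelete {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {
            // Step 1: run the query, fetching only the unique key field.
            SolrQuery q = new SolrQuery("status:expired");  // hypothetical criterion
            q.setFields("id");
            q.setRows(1000);  // page through with start/cursorMark in real code
            List<String> ids = new ArrayList<>();
            for (SolrDocument doc : client.query(q).getResults()) {
                ids.add((String) doc.getFieldValue("id"));
            }
            // Step 2: delete by ID -- no bad interaction with segment merging.
            if (!ids.isEmpty()) {
                client.deleteById(ids);
                client.commit();
            }
        }
    }
}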

Thanks,
Shawn