Posted to solr-user@lucene.apache.org by "Zimmermann, Thomas" <tz...@techtarget.com> on 2020/02/24 22:42:37 UTC

Reindex Required for Merge Policy Changes?

Hi Folks –

A few questions before I tackle an upgrade here. We're looking to go from 7.4 to 7.7.2 to take advantage of the improved Tiered Merge Policy and segment cleanup – we are dealing with some high (45%) deleted doc counts in a few cores. Would simply upgrading Solr and setting the cores to use Lucene 7.7.2 take advantage of these features? Would I need to reindex to get existing segments merged more efficiently? Does it depend on the size of my current segments vs. the configuration of the merge policy, or would simply upgrading and allowing Solr to do its own thing help mitigate this issue?

Also – I noticed that 7.5+ defaults to Autoscaling for replication, and 8.0 defaults to legacy. Would I potentially need to make changes to my existing configs to ensure they stay on legacy replication?

Thanks much!
TZ




Re: Reindex Required for Merge Policy Changes?

Posted by "Zimmermann, Thomas" <tz...@techtarget.com>.
Thanks so much Erick. Sounds like this should be a perfect approach to helping resolve our current issue.



Re: Reindex Required for Merge Policy Changes?

Posted by Erick Erickson <er...@gmail.com>.
Thomas:
Yes, upgrading to 7.5+ will automagically take advantage of the improvements, eventually... No, you don’t have to reindex.

About the “eventually” part: as you add, and particularly replace, existing documents, TMP will make decisions based on the new policy. If you’ve optimized in the past and have a very large segment (i.e. > 5G), it’ll be rewritten when the number of deleted docs exceeds the threshold; I don’t remember what the exact number is. Point is it’ll recover from having an over-large segment over time and _eventually_ the largest segment will be < 5G.
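A toy Python model of the rule as described in this message may help make it concrete. It is a sketch only, not Lucene's actual TieredMergePolicy logic: the function name and parameters are hypothetical, and the deleted-docs threshold is a required argument because the exact number is not given here.

```python
def oversized_segment_needs_rewrite(segment_gb, deleted_pct,
                                    max_segment_gb, deleted_threshold_pct):
    """Toy model of the behaviour described above (not Lucene's real code).

    A segment larger than the max segment size is left alone until its
    share of deleted docs exceeds the threshold, then it gets rewritten.
    """
    is_oversized = segment_gb > max_segment_gb
    return is_oversized and deleted_pct > deleted_threshold_pct


# 30.0 is a placeholder threshold for illustration only; the real value
# is not stated in this thread.
print(oversized_segment_needs_rewrite(20.0, 45.0,
                                      max_segment_gb=5.0,
                                      deleted_threshold_pct=30.0))
```

Segments at or below the max size are handled by the ordinary merge selection, which this sketch does not model.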

Absent a previous optimize making a large segment, I’d just consider optimizing after you’ve upgraded. The TMP revisions respect the max segment size, so that should purge all deleted documents from your index without creating an over-large segment. Thereafter the number of deleted docs should remain below about 33%. It only really approaches that percentage when you’re updating lots of existing docs.
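To check where a core stands against the 45% and 33% figures mentioned in this thread, the deleted-doc percentage can be computed from the index's maxDoc (live plus deleted docs) and numDocs (live docs only). A minimal Python sketch; the function name is hypothetical, and how you obtain the two counts (for example from the core's admin statistics) is left as an assumption.

```python
def deleted_doc_percentage(num_docs, max_doc):
    """Percentage of documents in the index that are marked deleted.

    num_docs: live (searchable) documents.
    max_doc:  live plus deleted documents still occupying segments.
    """
    if max_doc <= 0:
        return 0.0  # empty index: nothing deleted
    return 100.0 * (max_doc - num_docs) / max_doc


# Example: 1,000,000 docs in segments, 550,000 of them live.
print(deleted_doc_percentage(num_docs=550_000, max_doc=1_000_000))  # 45.0
```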

Finally, expungeDeletes is less expensive than optimize because it doesn’t rewrite segments with less than 10% deleted docs, so that’s an alternative to optimizing after upgrading.
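Both options can be triggered through a core's update handler. The Python sketch below only builds the request URLs and sends nothing; the host, port, and core name are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Request parameters for the two alternatives discussed above.
ACTIONS = {
    "optimize": {"optimize": "true"},
    "expungeDeletes": {"commit": "true", "expungeDeletes": "true"},
}


def update_url(base_url, core, action):
    """Build the update-handler URL for an optimize or expungeDeletes call."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    query = urlencode(ACTIONS[action])
    return f"{base_url.rstrip('/')}/{core}/update?{query}"


# "localhost:8983" and "mycore" are placeholder values.
print(update_url("http://localhost:8983/solr", "mycore", "optimize"))
print(update_url("http://localhost:8983/solr", "mycore", "expungeDeletes"))
```

Either URL can then be requested with any HTTP client. Both operations rewrite segments and can be I/O heavy, so it is probably worth trying them on a non-production core first.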


Best,
Erick
