Posted to users@solr.apache.org by Michael Conrad <mi...@newsrx.com> on 2021/10/13 12:51:44 UTC

UpgradeIndexMergePolicy | This policy no longer works as described | This is apparently by design

This is apparently by design. This policy no longer seems to work as 
described, and its documentation is misleading. Perhaps it should be 
removed, or, when this policy is in effect, it should actually stamp the 
index with the new version number. [As this is an intentional 
non-default configuration, my vote would be to *stamp* with the updated 
version number, with appropriate warnings added to the javadocs of both 
the policy class and the policy factory class.]

Based on this, our systems will be stuck at 7.7.3 for a very long time.

See the following post on Stack Overflow: upgraded segments remain 
*stamped* as having been created in an older version even after they are 
rewritten into newer-version segments.

https://stackoverflow.com/a/58619742/1341731

<rant>
Not everyone has the luxury of being able to reindex from scratch. The 
source data might no longer be available, say, for copyright reasons; 
there may be space constraints that can't be alleviated; or the size of 
the collection may make a reindex unrealistic in terms of time...
</rant>


On 10/12/21 9:44 AM, Michael Conrad wrote:
> Attempting to use the merge policy "UpgradeIndexMergePolicy" via the 
> "UpgradeIndexMergePolicyFactory" seems to result in no merging.
>
> Here is my config fragment:
>
>
> <mergePolicyFactory class="org.apache.solr.index.UpgradeIndexMergePolicyFactory">
>         <str name="wrapped.prefix">mergePolicy</str>
>         <str name="mergePolicy.class">org.apache.solr.index.TieredMergePolicyFactory</str>
>         <int name="mergePolicy.maxMergeAtOnce">2</int>
>         <int name="mergePolicy.segmentsPerTier">2</int>
>         <double name="mergePolicy.noCFSRatio">0.0</double>
> </mergePolicyFactory>
>
>
> The description of the policy says:
>
> > This MergePolicy is used for upgrading all existing segments of an
> > index when calling IndexWriter.forceMerge(int).
> > All other methods delegate to the base MergePolicy given to the
> > constructor. This allows for an as-cheap-as possible
> > upgrade of an older index by only upgrading segments that are
> > created by previous Lucene versions.
> > forceMerge does no longer really merge; it is just used to
> > "forceMerge" older segment versions away.
>
> Based on this description, when I run "forceMerge" via the curl command:
>
> http://solr-0001:8983/solr/collection/update?optimize=true&maxSegments=1
>
> The non-current version segments should get rewritten to new 
> replacement segments using the current Lucene version without 
> reindexing being needed.
>
> The documentation does not provide an example of how to actually use 
> this policy in a collection's config file. My fragment is a best guess 
> based on Google search results.
>
>
> Links:
>
> https://lucene.apache.org/core/7_7_3/core/org/apache/lucene/index/UpgradeIndexMergePolicy.html?is-external=true
>
> https://solr.apache.org/docs/7_7_3/solr-core/org/apache/solr/index/UpgradeIndexMergePolicyFactory.html
>
> Help would be appreciated.
>
> -Michael/NewSRX Tech
>


Re: UpgradeIndexMergePolicy | This policy no longer works as described | This is apparently by design

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/13/21 6:51 AM, Michael Conrad wrote:
> <rant>
> Not everyone has the luxury of being able to reindex from scratch with 
> data they might not have copies of anymore, say, for copyright 
> reasons, or because of space constraints that can't be alleviated, or 
> the size of the collection makes it unrealistic in time...
> </rant> 


Completely understandable, but also problematic.

With any Lucene-based software, including Solr, reindexing is REQUIRED 
after many config changes, and it is highly recommended on ANY upgrade, 
even to a new minor version in the same major release.  Because of this, 
it is strongly recommended that the source data is always accessible for 
building the Solr index from scratch.

I once wrote a page on the Solr wiki about reindexing.  Some of that 
information, plus more that I didn't get written down, has been 
incorporated into the Solr Reference Guide:

https://solr.apache.org/guide/8_9/reindexing.html

One thing from my wiki page that did NOT make it into the reference 
guide is the idea of using a separate Solr install to act as an 
intermediary that just stores the data, doesn't make it searchable -- 
and using that Solr install as a source for reindexing.  This paradigm 
is being used successfully in the wild.
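
As a rough illustration of the store-only idea, a schema fragment for 
such an intermediary core might look like the sketch below. The field 
names (id, title, body) are hypothetical and do not come from the wiki 
page, and the fragment assumes a "string" fieldType is already defined, 
as it is in the stock configsets. Only the unique key is indexed; the 
other fields are stored so they can be read back, but they are not 
indexed and therefore cannot be searched.

```xml
<!-- Hypothetical managed-schema fragment for a store-only Solr core.
     Field names are illustrative assumptions, not from the original post. -->

<!-- The unique key stays indexed so documents can be updated and sorted by id. -->
<field name="id"    type="string" indexed="true"  stored="true" required="true"/>

<!-- Source fields: kept retrievable (stored) but not indexed,
     and without docValues, so they add no searchable structures. -->
<field name="title" type="string" indexed="false" stored="true" docValues="false"/>
<field name="body"  type="string" indexed="false" stored="true" docValues="false"/>

<uniqueKey>id</uniqueKey>
```

With a layout like this, a reindex would page through the store-only 
core (for example with a cursorMark query sorted on id) and send each 
stored document to the searchable install.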

Indexing speed is another reason to avoid reindexes.  Indexing hundreds 
of millions of documents (or more) is going to take a while even when 
indexing speed is highly optimized.

Here is that wiki page that I wrote quite a while ago:

https://cwiki.apache.org/confluence/display/SOLR/HowToReindex

Thanks,
Shawn