Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2018/05/10 04:37:00 UTC

[jira] [Comment Edited] (LUCENE-8264) Allow an option to rewrite all segments

    [ https://issues.apache.org/jira/browse/LUCENE-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469894#comment-16469894 ] 

Erick Erickson edited comment on LUCENE-8264 at 5/10/18 4:36 AM:
-----------------------------------------------------------------

OK, we've pretty well disposed of the whole N-2 -> N upgrade issue; it ain't gonna happen. There are still two other cases where this would be useful:

1> N-1 -> N
2> adding DocValues without re-indexing

Of the two, <2> is probably the more immediately useful. I've seen a lot of clients in the field get hurt when they realize that they'd have been better off with docValues but didn't have them turned on.

Since I'm working on TMP (TieredMergePolicy), that's where I'm focusing. How to implement? A new method on MergePolicy that's a no-op for everything except TMP? See the discussion at LUCENE-8004, but the gist is:

1> some new methods on MergePolicy that return information from the concrete policy, like the default max merge segments (I don't particularly like that). Callers would have to "do the right thing", which is trappy.

OR 

2> a new method on MergePolicy like {{findRewriteAllSegments}} that is essentially {{findForcedMerges}} with some extra decisions. It would be a pass-through for everything except TMP currently.
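
A minimal sketch of option <2>, to make the pass-through idea concrete. Everything here is a hypothetical stand-in: the classes are not Lucene's, segments are modelled as plain names, and the method signatures are simplified assumptions rather than the real {{MergePolicy}} API. Only the method names {{findRewriteAllSegments}} and {{findForcedMerges}} come from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT Lucene's classes or signatures.
abstract class SketchMergePolicy {
    // Stand-in for the forced-merge selection: returns the names of segments to merge.
    abstract List<String> findForcedMerges(List<String> segments, int maxSegmentCount);

    // The proposed hook. By default it is a pass-through to the forced-merge selection.
    List<String> findRewriteAllSegments(List<String> segments, int maxSegmentCount) {
        return findForcedMerges(segments, maxSegmentCount);
    }
}

// A policy that ignores the new hook and simply inherits the pass-through.
class SketchSimplePolicy extends SketchMergePolicy {
    @Override
    List<String> findForcedMerges(List<String> segments, int maxSegmentCount) {
        // Assumed behaviour: nothing to do when already at or below the target count.
        if (segments.size() <= maxSegmentCount) {
            return new ArrayList<>();
        }
        return new ArrayList<>(segments);
    }
}

// Stand-in for TMP: the one policy that makes the extra decision to select every segment.
class SketchTieredPolicy extends SketchSimplePolicy {
    @Override
    List<String> findRewriteAllSegments(List<String> segments, int maxSegmentCount) {
        return new ArrayList<>(segments);
    }
}

class RewriteAllSketch {
    public static void main(String[] args) {
        List<String> segments = Arrays.asList("_0", "_1", "_2");
        // Pass-through: three segments is under the limit of five, so nothing is selected.
        System.out.println(new SketchSimplePolicy().findRewriteAllSegments(segments, 5));
        // TMP stand-in: every segment is selected for rewriting regardless of the limit.
        System.out.println(new SketchTieredPolicy().findRewriteAllSegments(segments, 5));
    }
}
```

The point of the default method is that callers never need to know which concrete policy they hold, which avoids the trappy "do the right thing" burden of option <1>.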

Or is the right thing to do here to create, say, a new MergePolicy {{AddDocValuesBecaseYouDidntReadTheManualAboutWhyDocValuesWereAGoodThingMergePolicy}}?

Off the top of my head it would take (somehow) a list of fields to add DocValues to and then "do the right thing". I don't have any details worked out yet; I want to discuss before diving in.

The requirement is that in a distributed system I can issue one command that'll fix this everywhere I care about. I don't really have a clue how it'd deal with being applied twice in a row, merging some segments with DocValues and some without, etc.
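
One possible way to reason about the "applied twice in a row" question is to have the policy select only segments that still lack DocValues for a requested field, so a second pass finds nothing to do. The sketch below illustrates that idea only. It is an assumption, not a worked-out design: {{SketchSegment}}, {{selectForRewrite}} and {{applyOnce}} are hypothetical names, and the "rewrite" is simulated by updating a set of field names rather than touching any index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a segment: a name plus the fields that already have DocValues.
class SketchSegment {
    final String name;
    final Set<String> docValuesFields;

    SketchSegment(String name, Set<String> docValuesFields) {
        this.name = name;
        this.docValuesFields = docValuesFields;
    }
}

class AddDocValuesSketch {
    // Select only segments missing DocValues for at least one requested field.
    static List<SketchSegment> selectForRewrite(List<SketchSegment> segments, Set<String> fields) {
        List<SketchSegment> selected = new ArrayList<>();
        for (SketchSegment s : segments) {
            if (!s.docValuesFields.containsAll(fields)) {
                selected.add(s);
            }
        }
        return selected;
    }

    // Simulate one pass: selected segments are replaced by copies that have the
    // requested fields; segments that already have them are left untouched.
    static List<SketchSegment> applyOnce(List<SketchSegment> segments, Set<String> fields) {
        List<SketchSegment> result = new ArrayList<>();
        for (SketchSegment s : segments) {
            if (s.docValuesFields.containsAll(fields)) {
                result.add(s);
            } else {
                Set<String> updated = new HashSet<>(s.docValuesFields);
                updated.addAll(fields);
                result.add(new SketchSegment(s.name, updated));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("price"));
        List<SketchSegment> segments = Arrays.asList(
                new SketchSegment("_0", new HashSet<>()),                        // no DocValues yet
                new SketchSegment("_1", new HashSet<>(Arrays.asList("price")))); // already has them

        System.out.println(selectForRewrite(segments, wanted).size());  // first pass selects 1 segment
        List<SketchSegment> afterFirstPass = applyOnce(segments, wanted);
        System.out.println(selectForRewrite(afterFirstPass, wanted).size()); // second pass selects 0
    }
}
```

Under this assumption a repeated command is harmless, and a mixed index (some segments with, some without) only pays the rewrite cost for the segments that need it. How a real merge of mixed segments would behave is still the open question raised above.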


was (Author: erickerickson):
See comment 9-May.

> Allow an option to rewrite all segments
> ---------------------------------------
>
>                 Key: LUCENE-8264
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8264
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> For the background, see SOLR-12259.
> There are several use-cases that would be much easier, especially during upgrades, if we could specify that all segments get rewritten. 
> One example: Upgrading 5x->6x->7x. When segments are merged, they're rewritten into the current format. However, there's no guarantee that a particular segment _ever_ gets merged, so the 6x->7x upgrade won't necessarily be successful.
> How many merge policies support this is an open question. I propose to start with TMP and raise other JIRAs as necessary for other merge policies.
> So far the usual response has been "re-index from scratch", but that's increasingly difficult as systems get larger.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
