Posted to solr-user@lucene.apache.org by "Christine Poerschke (BLOOMBERG/ LONDON)" <cp...@bloomberg.net> on 2015/06/25 17:08:29 UTC

solr 4.8.1 to 4.10.4 upgrade / luceneMatchVersion

Hello.

We would like to upgrade from solr 4.8.1 to 4.10.4 and for existing collections (at least initially) continue to use the 4.8 lucene format rather than the latest 4.10 format.

Two main reasons for the preference:
(a) no need to worry about a not-yet-upgraded 4.8 replica recovering from an already-upgraded 4.10 replica.
(b) more options if it were to be necessary to downgrade from 4.10 back to 4.8 in a hurry.

[As an aside, we have seen that 4.10.4 creates a /configs/<configName>/_rest_managed.json znode which 4.8 does not like, so any downgrade would need to take that into account as well. On dev, simply re-uploading the <configName> config to ZooKeeper appears to work.]

Questions:

Q1a: Is it possible (for a 4.8.1 to 4.10.4 upgrade) to have this "stay with lucene-4.8" behaviour, out of the box or with a recommended patch?

Q1b: It appears making LiveIndexWriterConfig.matchVersion public and having DocumentsWriterPerThread use it instead of Version.LATEST could work. Has anyone used such a patch approach and/or can think of reasons why it would not work or not be a good idea?

Q2: Has anyone encountered something similar more generally i.e. with other/later versions of solr-lucene? (LUCENE-5871 removed the use of Version in IndexWriterConfig from version 5.0 onwards.)

Regards,

Christine

---------------------------------------------

The potential patch referred to in Q1b above:

--- a/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
+++ b/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
@@ -178,7 +178,7 @@ class DocumentsWriterPerThread {
     pendingUpdates.clear();
     deleteSlice = deleteQueue.newSlice();
    
-    segmentInfo = new SegmentInfo(directoryOrig, Version.LATEST, segmentName, -1, false, codec, null);
+    segmentInfo = new SegmentInfo(directoryOrig, this.indexWriterConfig.matchVersion, segmentName, -1, false, codec, null);
     assert numDocsInRAM == 0;
     if (INFO_VERBOSE && infoStream.isEnabled("DWPT")) {
       infoStream.message("DWPT", Thread.currentThread().getName() + " init seg=" + segmentName + " delQueue=" + deleteQueue);  

--- a/lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java
+++ b/lucene/core/src/java/org/apache/lucene/index/LiveIndexWriterConfig.java
@@ -96,7 +96,7 @@ public class LiveIndexWriterConfig {
   protected volatile int perThreadHardLimitMB;
 
   /** {@link Version} that {@link IndexWriter} should emulate. */
-  protected final Version matchVersion;
+  public final Version matchVersion;
 
   /** True if segment flushes should use compound file format */
   protected volatile boolean useCompoundFile = IndexWriterConfig.DEFAULT_USE_COMPOUND_FILE_SYSTEM;

Re: solr 4.8.1 to 4.10.4 upgrade / luceneMatchVersion

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/25/2015 9:08 AM, Christine Poerschke (BLOOMBERG/ LONDON) wrote:
> We would like to upgrade from solr 4.8.1 to 4.10.4 and for existing collections (at least initially) continue to use the 4.8 lucene format rather than the latest 4.10 format.
>
> Two main reasons for the preference:
> (a) no need to worry about a not-yet-upgraded 4.8 replica recovering from an already-upgraded 4.10 replica.
> (b) more options if it were to be necessary to downgrade from 4.10 back to 4.8 in a hurry.
>
> [As an aside, we have seen that 4.10.4 creates a /configs/<configName>/_rest_managed.json znode which 4.8 does not like, so any downgrade would need to take that into account as well. On dev, simply re-uploading the <configName> config to ZooKeeper appears to work.]
>
> Questions:
>
> Q1a: Is it possible (for a 4.8.1 to 4.10.4 upgrade) to have this "stay with lucene-4.8" behaviour, out of the box or with a recommended patch?
>
> Q1b: It appears making LiveIndexWriterConfig.matchVersion public and having DocumentsWriterPerThread use it instead of Version.LATEST could work. Has anyone used such a patch approach and/or can think of reasons why it would not work or not be a good idea?
>
> Q2: Has anyone encountered something similar more generally i.e. with other/later versions of solr-lucene? (LUCENE-5871 removed the use of Version in IndexWriterConfig from version 5.0 onwards.)

Even though luceneMatchVersion is exposed in the Solr config, it does
NOT affect the index format on disk.  As far as I'm aware, it only
affects the internal behavior of analysis components and similar code.
Unless you change the postings format in your schema, you'll always get
the latest defaults from Lucene.

In the Lucene CHANGES.txt, I see that in version 4.10, there is a new
DocValues format, which would not be compatible with 4.8.  Looking at
class names, I see a number of Lucene49 format classes, including
Lucene49Codec ... which suggests that large parts of the Solr 4.10 index
format will not be compatible with Solr 4.8.  I haven't dug deeper into
the code to be sure about that, but from what I've seen in the past,
new version-specific format classes are only created when the index
format changes.  The index format does frequently change from one minor
version of Solr/Lucene to the next, and there is no backwards
compatibility when that happens.

I *think* that if you specify the postings and docValues formats for
every field in your schema.xml, you might be able to achieve backwards
version compatibility.  These values will not be Lucene48xxxxx classes,
because there are no classes specific to version 4.8.  Be aware that if
you do this, you may encounter problems trying to *upgrade* the index
format later.  When toying with the formats, it is better to completely
wipe out the index and rebuild it.
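
For illustration, pinning the formats per field type might look roughly
like the fragment below.  This is an unverified sketch: the field type
names are made up, and the format names Lucene41 and Lucene45 are an
assumption about what 4.8 writes by default, so check them against the
4.8 codec before relying on them.  Per-field formats in schema.xml are
only honored when solrconfig.xml uses solr.SchemaCodecFactory.

```xml
<!-- solrconfig.xml: per-field formats in schema.xml are only honored
     when the schema-aware codec factory is configured -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: hypothetical field types that pin the formats.
     "Lucene41" and "Lucene45" are assumed to be the 4.8 defaults;
     verify before use -->
<fieldType name="string_pinned" class="solr.StrField"
           postingsFormat="Lucene41" docValuesFormat="Lucene45"/>
<fieldType name="text_pinned" class="solr.TextField"
           postingsFormat="Lucene41">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

These attributes cover only the postings and docValues formats.  Other
parts of the codec are not set this way, so on its own this may not be
enough to produce segments that 4.8 can read.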

Additional note: All this is speculation based on the tidbits I've
picked up from residence on this list and the dev list.  None of it has
been attempted or verified, so what I'm telling you may be wildly incorrect.

Thanks,
Shawn