You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "tflobbe (via GitHub)" <gi...@apache.org> on 2023/05/24 19:04:42 UTC

[GitHub] [lucene] tflobbe opened a new pull request, #12326: Lucene 9.x fails to merge 8.x segments with a field that changed IndexOptions NONE -> DOCS

tflobbe opened a new pull request, #12326:
URL: https://github.com/apache/lucene/pull/12326

   This was reported in LUCENE-10314/#11350 and I'm surprised it didn't get more attention since it can get indices in a very bad state. 
   This is my understanding, feel free to correct me if I'm wrong:
   In Lucene 8, when a field was converted from having `IndexOptions.NONE` to `IndexOptions.DOCS`, the merging logic would make the new segment have `IndexOptions.DOCS` and things would continue to work.  Of course, docs that had been added before the change would not be searchable on this field, but it allowed the change to happen in-place for an existing index, users would probably want to re-submit all their docs to have the field be really searchable. Lucene 9 no longer supports this for various reasons, but it allows older indices to have this for backwards compatibility. However, the merging logic doesn't seem to handle this well. Instead of turning the `IndexOptions.NONE` into a `IndexOptions.DOCS` for the merged segment, it just takes the `IndexOptions` of whatever the first segment is which causes an invalid merge that fails, assertion error on tests, but an exception like this when running without assertions (from Lucene 9.4.2):
   
   ```
   java.lang.IllegalArgumentException: cannot write negative vLong (got: -368)
   	at org.apache.lucene.store.DataOutput.writeVLong(DataOutput.java:238)
   	at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$StatsWriter.add(Lucene90BlockTreeTermsWriter.java:571)
   	at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.writeBlock(Lucene90BlockTreeTermsWriter.java:832)
   	at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.writeBlocks(Lucene90BlockTreeTermsWriter.java:709)
   	at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.finish(Lucene90BlockTreeTermsWriter.java:1105)
   	at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter.write(Lucene90BlockTreeTermsWriter.java:370)
   	at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:95)
   	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:204)
   	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:208)
   	at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:293)
   	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:136)
   	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5141)
   	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4681)
   	at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:274)
   	at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6430)
   	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:638)
   	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:699)
   
   ```
   
   This makes upgrading pretty risky, because since it happens on merge, Lucene 9.x would be running fine until the merge policy decides to merge segments with different `IndexOptions` which could be weeks or more after the upgrade happened, and at that point, the index can't be rolled back (because it already has 9.x segments) and any backups would be very old. Even more, the outcome of the merge (successful or not) depends on the order in which the segments are selected.
   
   As of now, this PR only includes a failing test. This passes in Lucene 8 but fails in 9. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on a diff in pull request #12326: Lucene 9.x fails to merge 8.x segments with a field that changed IndexOptions NONE -> DOCS

Posted by "dnhatn (via GitHub)" <gi...@apache.org>.
dnhatn commented on code in PR #12326:
URL: https://github.com/apache/lucene/pull/12326#discussion_r1213908644


##########
lucene/core/src/java/org/apache/lucene/index/FieldInfos.java:
##########
@@ -708,6 +750,23 @@ FieldInfo add(FieldInfo fi, long dvGen) {
       final FieldInfo curFi = fieldInfo(fi.getName());
       if (curFi != null) {
         curFi.verifySameSchema(fi, globalFieldNumbers.strictlyConsistent);
+
+        if (!globalFieldNumbers.strictlyConsistent) {
+          // For the not strictly consistent case (legacy index), we may need to merge the
+          // FieldInfo instances
+          FieldInfo updatedFieldInfo = curFi.handleLegacySupportedUpdates(fi);
+          if (updatedFieldInfo != null) {
+            if (fi.getDocValuesType() != DocValuesType.NONE
+                && curFi.getDocValuesType() == fi.getDocValuesType()) {
+              // Must also update docValuesType map so it's
+              // aware of this field's DocValuesType.  This will throw IllegalArgumentException if
+              // an illegal type change was attempted.
+              globalFieldNumbers.setDocValuesType(fi.number, fi.getName(), fi.getDocValuesType());

Review Comment:
   I think we should only call this when the doc_values type changed from None to something else?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] tflobbe merged pull request #12326: Lucene 9.x fails to merge 8.x segments with a field that changed IndexOptions NONE -> DOCS

Posted by "tflobbe (via GitHub)" <gi...@apache.org>.
tflobbe merged PR #12326:
URL: https://github.com/apache/lucene/pull/12326


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] tflobbe commented on pull request #12326: Lucene 9.x fails to merge 8.x segments with a field that changed IndexOptions NONE -> DOCS

Posted by "tflobbe (via GitHub)" <gi...@apache.org>.
tflobbe commented on PR #12326:
URL: https://github.com/apache/lucene/pull/12326#issuecomment-1574667851

   Thanks for the review @dnhatn. I plan to merge early next week


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] tflobbe commented on a diff in pull request #12326: Lucene 9.x fails to merge 8.x segments with a field that changed IndexOptions NONE -> DOCS

Posted by "tflobbe (via GitHub)" <gi...@apache.org>.
tflobbe commented on code in PR #12326:
URL: https://github.com/apache/lucene/pull/12326#discussion_r1214984687


##########
lucene/core/src/java/org/apache/lucene/index/FieldInfos.java:
##########
@@ -708,6 +750,23 @@ FieldInfo add(FieldInfo fi, long dvGen) {
       final FieldInfo curFi = fieldInfo(fi.getName());
       if (curFi != null) {
         curFi.verifySameSchema(fi, globalFieldNumbers.strictlyConsistent);
+
+        if (!globalFieldNumbers.strictlyConsistent) {
+          // For the not strictly consistent case (legacy index), we may need to merge the
+          // FieldInfo instances
+          FieldInfo updatedFieldInfo = curFi.handleLegacySupportedUpdates(fi);
+          if (updatedFieldInfo != null) {
+            if (fi.getDocValuesType() != DocValuesType.NONE
+                && curFi.getDocValuesType() == fi.getDocValuesType()) {
+              // Must also update docValuesType map so it's
+              // aware of this field's DocValuesType.  This will throw IllegalArgumentException if
+              // an illegal type change was attempted.
+              globalFieldNumbers.setDocValuesType(fi.number, fi.getName(), fi.getDocValuesType());

Review Comment:
   Ah, yes, good catch! submitting a fix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dnhatn commented on pull request #12326: Lucene 9.x fails to merge 8.x segments with a field that changed IndexOptions NONE -> DOCS

Posted by "dnhatn (via GitHub)" <gi...@apache.org>.
dnhatn commented on PR #12326:
URL: https://github.com/apache/lucene/pull/12326#issuecomment-1572574391

   @tflobbe Thanks for working on this. I am looking at the PR now. Sorry for the delay.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] tflobbe commented on a diff in pull request #12326: Lucene 9.x fails to merge 8.x segments with a field that changed IndexOptions NONE -> DOCS

Posted by "tflobbe (via GitHub)" <gi...@apache.org>.
tflobbe commented on code in PR #12326:
URL: https://github.com/apache/lucene/pull/12326#discussion_r1206055144


##########
lucene/core/src/java/org/apache/lucene/index/FieldInfos.java:
##########
@@ -628,6 +628,48 @@ FieldInfo constructFieldInfo(String fieldName, DocValuesType dvType, int newFiel
           isSoftDeletesField);
     }
 
+    synchronized void setDocValuesType(int number, String name, DocValuesType dvType) {
+      verifyConsistent(number, name, dvType);
+      docValuesType.put(name, dvType);
+    }
+
+    synchronized void verifyConsistent(Integer number, String name, DocValuesType dvType) {
+      if (name.equals(numberToName.get(number)) == false) {
+        throw new IllegalArgumentException(
+            "field number "
+                + number
+                + " is already mapped to field name \""
+                + numberToName.get(number)
+                + "\", not \""
+                + name
+                + "\"");
+      }
+      if (number.equals(nameToNumber.get(name)) == false) {
+        throw new IllegalArgumentException(
+            "field name \""
+                + name
+                + "\" is already mapped to field number \""
+                + nameToNumber.get(name)
+                + "\", not \""
+                + number
+                + "\"");
+      }
+      DocValuesType currentDVType = docValuesType.get(name);
+      if (dvType != DocValuesType.NONE
+          && currentDVType != null
+          && currentDVType != DocValuesType.NONE
+          && dvType != currentDVType) {
+        throw new IllegalArgumentException(
+            "cannot change DocValues type from "
+                + currentDVType
+                + " to "
+                + dvType
+                + " for field \""
+                + name
+                + "\"");
+      }
+    }
+

Review Comment:
   These I brought back from the 8.x code



##########
lucene/core/src/java/org/apache/lucene/index/FieldInfo.java:
##########
@@ -400,6 +400,150 @@ static void verifySameVectorOptions(
     }
   }
 
+  /*
+  This method will create a new instance of FieldInfo if any attribute changes (and it changes in a compatible way).
+  It is intended only to be used in indices where schema validation is not strict (legacy indices). It will return null
+  if no changes are done on this FieldInfo
+   */
+  FieldInfo handleLegacySupportedUpdates(FieldInfo otherFi) {

Review Comment:
   The base of this method is coming from 8.x's [FieldInfo#update](https://github.com/apache/lucene-solr/blob/branch_8_11/lucene/core/src/java/org/apache/lucene/index/FieldInfo.java#L144)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org