Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/29 22:04:55 UTC

[GitHub] [lucene-solr] jtibshirani opened a new pull request #2274: LUCENE-9705: Create Lucene90LiveDocsFormat

jtibshirani opened a new pull request #2274:
URL: https://github.com/apache/lucene-solr/pull/2274


   For now, Lucene90LiveDocsFormat is just a copy of Lucene50LiveDocsFormat.
   The existing Lucene50LiveDocsFormat was moved to the backward-codecs module.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] jtibshirani merged pull request #2274: LUCENE-9705: Create Lucene90LiveDocsFormat

Posted by GitBox <gi...@apache.org>.
jtibshirani merged pull request #2274:
URL: https://github.com/apache/lucene-solr/pull/2274




[GitHub] [lucene-solr] jpountz commented on a change in pull request #2274: LUCENE-9705: Create Lucene90LiveDocsFormat

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #2274:
URL: https://github.com/apache/lucene-solr/pull/2274#discussion_r570062574



##########
File path: lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50LiveDocsFormat.java
##########
@@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_codecs.lucene50;
+
+import java.io.IOException;
+import java.util.Collection;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.LiveDocsFormat;
+import org.apache.lucene.index.CorruptIndexException;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.SegmentCommitInfo;
+import org.apache.lucene.store.ChecksumIndexInput;
+import org.apache.lucene.store.DataOutput;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * Lucene 5.0 live docs format
+ *
+ * <p>The .liv file is optional, and only exists when a segment contains deletions.
+ *
+ * <p>Although per-segment, this file is maintained exterior to compound segment files.
+ *
+ * <p>Deletions (.liv) --&gt; IndexHeader,Generation,Bits
+ *
+ * <ul>
+ *   <li>SegmentHeader --&gt; {@link CodecUtil#writeIndexHeader IndexHeader}
+ *   <li>Bits --&gt; &lt;{@link DataOutput#writeLong Int64}&gt; <sup>LongCount</sup>
+ * </ul>
+ */
+public final class Lucene50LiveDocsFormat extends LiveDocsFormat {
+
+  /** extension of live docs */
+  private static final String EXTENSION = "liv";
+
+  /** codec of live docs */
+  private static final String CODEC_NAME = "Lucene50LiveDocs";
+
+  /** supported version range */
+  private static final int VERSION_START = 0;
+
+  private static final int VERSION_CURRENT = VERSION_START;
+
+  /** Sole constructor. */
+  public Lucene50LiveDocsFormat() {}
+
+  @Override
+  public Bits readLiveDocs(Directory dir, SegmentCommitInfo info, IOContext context)
+      throws IOException {
+    long gen = info.getDelGen();
+    String name = IndexFileNames.fileNameFromGeneration(info.info.name, EXTENSION, gen);
+    final int length = info.info.maxDoc();
+    try (ChecksumIndexInput input = dir.openChecksumInput(name, context)) {
+      Throwable priorE = null;
+      try {
+        CodecUtil.checkIndexHeader(
+            input,
+            CODEC_NAME,
+            VERSION_START,
+            VERSION_CURRENT,
+            info.info.getId(),
+            Long.toString(gen, Character.MAX_RADIX));
+
+        FixedBitSet fbs = readFixedBitSet(input, length);
+
+        if (fbs.length() - fbs.cardinality() != info.getDelCount()) {
+          throw new CorruptIndexException(
+              "bits.deleted="
+                  + (fbs.length() - fbs.cardinality())
+                  + " info.delcount="
+                  + info.getDelCount(),
+              input);
+        }
+        return fbs.asReadOnlyBits();
+      } catch (Throwable exception) {
+        priorE = exception;
+      } finally {
+        CodecUtil.checkFooter(input, priorE);
+      }
+    }
+    throw new AssertionError();
+  }
+
+  private FixedBitSet readFixedBitSet(IndexInput input, int length) throws IOException {
+    long data[] = new long[FixedBitSet.bits2words(length)];
+    for (int i = 0; i < data.length; i++) {
+      data[i] = input.readLong();
+    }
+    return new FixedBitSet(data, length);
+  }
+
+  /**
+   * Note: although this format is only used on older versions, we need to keep the write logic in
+   * addition to the read logic. When we delete documents that live in an older segment, we write to
+   * the live docs for that segment.
+   */

Review comment:
       Thanks for documenting this.
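       The Javadoc in this file describes the Bits section as Int64 values
       repeated LongCount times. readFixedBitSet reads
       FixedBitSet.bits2words(maxDoc) longs, and readLiveDocs rejects the
       file when the number of cleared bits (length - cardinality) differs
       from info.getDelCount(). The sketch below is a rough, dependency-free
       illustration of that layout and check, not Lucene code: the class
       LiveDocsSketch and its methods are hypothetical, java.util.BitSet
       stands in for FixedBitSet, and only the in-memory word layout is
       modeled (no header, footer, or on-disk byte order).

```java
import java.util.BitSet;

/**
 * Hypothetical illustration (not part of Lucene) of how live docs are packed
 * 64 per long and how the deleted count is derived from the packed words.
 */
final class LiveDocsSketch {

  // Same arithmetic as FixedBitSet.bits2words: longs needed to hold numBits bits.
  // The arithmetic shift makes bits2words(0) == 0.
  static int bits2words(int numBits) {
    return ((numBits - 1) >> 6) + 1;
  }

  // Packs live docs (bit set = document is live) into longs, one bit per doc.
  // Bits past maxDoc in the last word stay zero.
  static long[] pack(BitSet liveDocs, int maxDoc) {
    long[] words = new long[bits2words(maxDoc)];
    for (int doc = liveDocs.nextSetBit(0);
        doc >= 0 && doc < maxDoc;
        doc = liveDocs.nextSetBit(doc + 1)) {
      // Java masks a long shift count to its low 6 bits, so this sets bit (doc % 64).
      words[doc >> 6] |= 1L << doc;
    }
    return words;
  }

  // Mirrors the consistency check in readLiveDocs: deleted = length - cardinality.
  static int deletedCount(long[] words, int maxDoc) {
    int live = 0;
    for (long word : words) {
      live += Long.bitCount(word);
    }
    return maxDoc - live;
  }

  public static void main(String[] args) {
    int maxDoc = 70;
    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc); // every document starts out live
    live.clear(3);       // delete doc 3 (first word)
    live.clear(65);      // delete doc 65 (second word)

    long[] words = pack(live, maxDoc);
    System.out.println("words = " + words.length);                   // words = 2
    System.out.println("deleted = " + deletedCount(words, maxDoc));  // deleted = 2
  }
}
```

       With maxDoc = 70, two longs are needed, and the two cleared bits are
       what readLiveDocs would compare against the segment's delete count.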
