Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/04/06 09:26:58 UTC

[GitHub] [lucene] iverase opened a new pull request #64: LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat

iverase opened a new pull request #64:
URL: https://github.com/apache/lucene/pull/64


   This PR removes the dependency of the StoredFieldsFormat on PackedInts in favour of DirectReader / DirectWriter.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #64: LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #64:
URL: https://github.com/apache/lucene/pull/64#discussion_r607865192



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java
##########
@@ -374,8 +371,8 @@ private SerializedDocument(DataInput in, int length, int numStoredFields) {
     // whether the block has been sliced, this happens for large documents
     private boolean sliced;
 
-    private long[] offsets = LongsRef.EMPTY_LONGS;
-    private long[] numStoredFields = LongsRef.EMPTY_LONGS;
+    private int[] offsets = IntsRef.EMPTY_INTS; // EMPTY_LONGS;
+    private int[] numStoredFields = IntsRef.EMPTY_INTS; // LongsRef.EMPTY_LONGS;

Review comment:
       remove commented out code?

##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/StoredFieldsInts.java
##########
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.lucene90.compressing;
+
+import java.io.IOException;
+import java.util.Arrays;
+import org.apache.lucene.store.DataOutput;
+import org.apache.lucene.store.IndexInput;
+
+class StoredFieldsInts {
+
+  private StoredFieldsInts() {}
+
+  static void writeInts(int[] values, int start, int count, DataOutput out) throws IOException {
+    boolean sorted = true;

Review comment:
       Why do we need this? It looks like delta encoding is already handled on top of this class.






[GitHub] [lucene] iverase merged pull request #64: LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat

Posted by GitBox <gi...@apache.org>.
iverase merged pull request #64:
URL: https://github.com/apache/lucene/pull/64


   




[GitHub] [lucene] iverase commented on a change in pull request #64: LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat

Posted by GitBox <gi...@apache.org>.
iverase commented on a change in pull request #64:
URL: https://github.com/apache/lucene/pull/64#discussion_r608418565



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java
##########
@@ -373,6 +373,7 @@ private SerializedDocument(DataInput in, int length, int numStoredFields) {
 
     private int[] offsets = IntsRef.EMPTY_INTS;
     private int[] numStoredFields = IntsRef.EMPTY_INTS;
+    private StoredFieldsInts intsReader = new StoredFieldsInts();

Review comment:
       I went back to making the method static, since we no longer need a helper array.






[GitHub] [lucene] jpountz commented on a change in pull request #64: LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #64:
URL: https://github.com/apache/lucene/pull/64#discussion_r608462230



##########
File path: lucene/core/src/test/org/apache/lucene/codecs/lucene90/compressing/TestStoredFieldsInt.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.lucene90.compressing;
+
+import java.util.Arrays;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.TestUtil;
+
+public class TestStoredFieldsInt extends LuceneTestCase {
+
+  public void testRandom() throws Exception {
+    int numIters = atLeast(100);
+    try (Directory dir = newDirectory()) {
+      for (int iter = 0; iter < numIters; ++iter) {
+        int[] values = new int[random().nextInt(5000) + 1];
+        final int bpv = TestUtil.nextInt(random(), 1, 32);

Review comment:
       With bpv == 32, `TestUtil.nextInt(random(), 0, (1 << bpv) - 1)` always returns 0.
   
   ```suggestion
           final int bpv = TestUtil.nextInt(random(), 1, 31);
   ```
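
The reason is that Java masks the shift distance of an `int` shift to its low five bits, so `1 << 32` evaluates to `1 << 0`, i.e. `1`, and the upper bound becomes `0`. A minimal standalone sketch of that behaviour follows; the class and method names are hypothetical and only serve the illustration, and the `long` variant is one possible alternative, not what this PR does.

```java
public class ShiftWrapDemo {

  // Mirrors the bound expression from the test. For an int left operand,
  // only the low five bits of the shift distance are used, so 1 << 32 == 1 << 0.
  static int maxValueInt(int bpv) {
    return (1 << bpv) - 1;
  }

  // Hypothetical alternative: compute the bound as a long, where the shift
  // distance is masked to six bits and bpv == 32 behaves as expected.
  static long maxValueLong(int bpv) {
    return (1L << bpv) - 1;
  }

  public static void main(String[] args) {
    System.out.println(maxValueInt(31));  // 2147483647
    System.out.println(maxValueInt(32));  // 0
    System.out.println(maxValueLong(32)); // 4294967295
  }
}
```

Capping `bpv` at 31, as suggested above, keeps the bound within the positive `int` range without changing the type of the expression.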

##########
File path: lucene/core/src/test/org/apache/lucene/codecs/lucene90/compressing/TestStoredFieldsInt.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.lucene90.compressing;
+
+import java.util.Arrays;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util.TestUtil;
+
+public class TestStoredFieldsInt extends LuceneTestCase {
+
+  public void testRandom() throws Exception {
+    int numIters = atLeast(100);
+    try (Directory dir = newDirectory()) {
+      for (int iter = 0; iter < numIters; ++iter) {
+        int[] values = new int[random().nextInt(5000) + 1];
+        final int bpv = TestUtil.nextInt(random(), 1, 32);
+        for (int i = 0; i < values.length; ++i) {
+          values[i] = TestUtil.nextInt(random(), 0, (1 << bpv) - 1);
+        }
+        test(dir, values);
+      }
+    }
+  }
+
+  public void testAllEquals() throws Exception {
+    try (Directory dir = newDirectory()) {
+      int[] docIDs = new int[random().nextInt(5000) + 1];
+      final int bpv = TestUtil.nextInt(random(), 1, 32);

Review comment:
       ```suggestion
         final int bpv = TestUtil.nextInt(random(), 1, 31);
   ```






[GitHub] [lucene] jpountz commented on a change in pull request #64: LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #64:
URL: https://github.com/apache/lucene/pull/64#discussion_r608399324



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java
##########
@@ -373,6 +373,7 @@ private SerializedDocument(DataInput in, int length, int numStoredFields) {
 
     private int[] offsets = IntsRef.EMPTY_INTS;
     private int[] numStoredFields = IntsRef.EMPTY_INTS;
+    private StoredFieldsInts intsReader = new StoredFieldsInts();

Review comment:
       let's make it final?

##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/StoredFieldsInts.java
##########
@@ -73,17 +136,59 @@ static void readInts(IndexInput in, int count, int[] values, int offset) throws
     }
   }
 
-  private static void readInts8(IndexInput in, int count, int[] values, int offset)
-      throws IOException {
-    for (int i = 0; i < count; i++) {
-      values[offset + i] = Byte.toUnsignedInt(in.readByte());
+  private void readInts8(IndexInput in, int count, int[] values, int offset) throws IOException {

Review comment:
       Unfortunately, for the JVM to auto-vectorize this, I think we'd need `values` to be a `long[]`.
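
To illustrate what a `long[]`-based layout could look like, here is a rough standalone sketch. It is an assumption made for illustration, not the code in this PR: the class name, the block size of 8 longs, and the byte layout are all hypothetical, and whether HotSpot actually vectorizes the inner loop depends on the JVM version and the hardware.

```java
public class LongLaneUnpackSketch {

  // Hypothetical block size: 8 longs hold 64 one-byte values.
  static final int LANES = 8;

  // Packs 64 values in [0, 255] into 8 longs. The value at index s * LANES + j
  // is stored in long j, in the byte at bit offset (56 - 8 * s).
  static long[] pack(int[] values) {
    long[] packed = new long[LANES];
    for (int s = 0; s < 8; s++) {
      for (int j = 0; j < LANES; j++) {
        packed[j] |= ((long) (values[s * LANES + j] & 0xFF)) << (56 - 8 * s);
      }
    }
    return packed;
  }

  // Unpacks into a long[]: the inner loop is a counted loop that reads longs,
  // applies a loop-invariant shift and mask, and writes longs of the same width.
  static long[] unpack(long[] packed) {
    long[] out = new long[8 * LANES];
    for (int s = 0; s < 8; s++) {
      final int shift = 56 - 8 * s;
      for (int j = 0; j < LANES; j++) {
        out[s * LANES + j] = (packed[j] >>> shift) & 0xFFL;
      }
    }
    return out;
  }
}
```

The point of the sketch is only that input and output share the same element width, which is the property the comment above refers to; a caller that needs an `int[]` would still have to narrow the result afterwards.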






[GitHub] [lucene] iverase commented on a change in pull request #64: LUCENE-9907: Remove packedInts dependency on StoredFieldsFormat

Posted by GitBox <gi...@apache.org>.
iverase commented on a change in pull request #64:
URL: https://github.com/apache/lucene/pull/64#discussion_r608409180



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/StoredFieldsInts.java
##########
@@ -73,17 +136,59 @@ static void readInts(IndexInput in, int count, int[] values, int offset) throws
     }
   }
 
-  private static void readInts8(IndexInput in, int count, int[] values, int offset)
-      throws IOException {
-    for (int i = 0; i < count; i++) {
-      values[offset + i] = Byte.toUnsignedInt(in.readByte());
+  private void readInts8(IndexInput in, int count, int[] values, int offset) throws IOException {

Review comment:
       Yes, I was thinking the same.



