Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/17 18:23:10 UTC

[GitHub] [druid] gianm commented on a change in pull request #11004: vectorize 'auto' long decoding

gianm commented on a change in pull request #11004:
URL: https://github.com/apache/druid/pull/11004#discussion_r596269223



##########
File path: processing/src/main/java/org/apache/druid/segment/data/VSizeLongSerde.java
##########
@@ -330,7 +329,7 @@ public void write(long value) throws IOException
         curByte = (byte) value;
         first = false;
       } else {
-        curByte = (byte) ((curByte << 4) | ((value >> (numBytes << 3)) & 0xF));
+        curByte = (byte) ((curByte << 4) | ((value >>> (numBytes << 3)) & 0xF));

Review comment:
       Was this a bug fix? If so, it's on the write side: does that mean there might be bad segments out there, or is there some reason this line wouldn't have affected data that people have already written? (Maybe negative numbers were never fed to this method.)
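One way to approach the question is to look only at the changed expression. The shifted value is immediately masked with `0xF`, so `>>` and `>>>` can only produce different nibbles when the shift leaves fewer than four original bits in the low positions, i.e. a shift distance of 61 to 63. Java reduces long shift distances modulo 64, and `numBytes << 3` is always a multiple of 8, so the effective distance is at most 56. If that reasoning holds, the old and new lines write the same nibble for every input, negative or not. The class below is a hypothetical check of just that expression; it says nothing about other parts of the write path.

```java
// Hypothetical helper: compares the nibble extracted by the old line (>>)
// with the nibble extracted by the new line (>>>).
final class NibbleShiftCheck {
  private NibbleShiftCheck() {}

  /** Old write path: arithmetic shift (sign-extending), then keep the low 4 bits. */
  static long nibbleArithmetic(long value, int numBytes) {
    return (value >> (numBytes << 3)) & 0xF;
  }

  /** New write path: logical shift (zero-filling), then keep the low 4 bits. */
  static long nibbleLogical(long value, int numBytes) {
    return (value >>> (numBytes << 3)) & 0xF;
  }

  /** True if both forms agree for every numBytes in [0, maxNumBytes] on all given values. */
  static boolean agree(long[] values, int maxNumBytes) {
    for (long v : values) {
      for (int n = 0; n <= maxNumBytes; n++) {
        if (nibbleArithmetic(v, n) != nibbleLogical(v, n)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

For contrast, a shift of 62 (not reachable via `numBytes << 3`) does expose the difference: `(Long.MIN_VALUE >> 62) & 0xF` is 14, while `(Long.MIN_VALUE >>> 62) & 0xF` is 2.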

##########
File path: processing/src/test/java/org/apache/druid/segment/data/VSizeLongSerdeTest.java
##########
@@ -20,132 +20,352 @@
 package org.apache.druid.segment.data;
 
 
+import com.google.common.primitives.Ints;
+import org.apache.druid.java.util.common.StringUtils;
 import org.junit.Assert;
-import org.junit.Before;
 import org.junit.Test;
+import org.junit.experimental.runners.Enclosed;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
 
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.stream.Collectors;
 
+@RunWith(Enclosed.class)
 public class VSizeLongSerdeTest

Review comment:
       The change in Mult4Ser suggests that we care about handling negative numbers, but this test class doesn't exercise negative numbers very much. (I think it only tests Long.MIN_VALUE, in testEveryPowerOfTwo.)
   
   If negative numbers matter, we should extend the test cases in this file to cover them better. I'd suggest adding tests to EveryLittleBitTest that are similar to testEveryPowerOfTwo and testEveryPowerOfTwoMinusOne, but have the sign bit set (i.e., bitwise OR with `Long.MIN_VALUE`).
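The suggested inputs could be generated roughly as follows. `SignedPowerOfTwoCases` is a hypothetical name; the idea is that each value would then go through the same round-trip assertions `EveryLittleBitTest` already uses (not reproduced here).

```java
// Hypothetical generator for the extra cases: every power of two and every
// power of two minus one, each OR'ed with Long.MIN_VALUE to set the sign bit.
final class SignedPowerOfTwoCases {
  private SignedPowerOfTwoCases() {}

  static long[] values() {
    long[] out = new long[128];
    for (int i = 0; i < 64; i++) {
      out[2 * i] = (1L << i) | Long.MIN_VALUE;           // power of two, sign bit set
      out[2 * i + 1] = ((1L << i) - 1) | Long.MIN_VALUE; // power of two minus one, sign bit set
    }
    return out;
  }
}
```

Note that a few values repeat (for example `Long.MIN_VALUE` itself appears for both i = 0 and i = 63), which is harmless for a round-trip test.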
   
   If negative numbers aren't important, I'd suggest blocking them on the write side, i.e. have all the LongSerializers throw errors if they are fed negative numbers.
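A write-side guard might look something like this sketch. `NonNegativeLongWriter` and its `Sink` interface are hypothetical stand-ins for the `write(long)` side of a serializer, not Druid's actual `LongSerializer` API.

```java
import java.io.IOException;

// Hypothetical wrapper that rejects negative input instead of silently encoding it.
final class NonNegativeLongWriter {
  /** Hypothetical stand-in for a serializer's write(long) method. */
  interface Sink {
    void write(long value) throws IOException;
  }

  private final Sink delegate;

  NonNegativeLongWriter(Sink delegate) {
    this.delegate = delegate;
  }

  void write(long value) throws IOException {
    if (value < 0) {
      throw new IllegalArgumentException("Negative values are not supported: " + value);
    }
    delegate.write(value);
  }
}
```

Failing fast here would turn a silent encoding question into an explicit error at ingestion time, at the cost of rejecting data that may currently be accepted.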

##########
File path: processing/src/main/java/org/apache/druid/segment/data/VSizeLongSerde.java
##########
@@ -413,9 +412,81 @@ public void close() throws IOException
     }
   }
 
+  /**
+   * Unpack bitpacked long values from an underlying contiguous memory block
+   */
   public interface LongDeserializer
   {
+    /**
+     * Unpack long value at the specified row index
+     */
     long get(int index);
+
+    /**
+     * Unpack a contiguous vector of long values at the specified start index of length and adjust them by the supplied
+     * delta base value.
+     */
+    default void getDelta(long[] out, int outPosition, int startIndex, int length, long base)

Review comment:
       Are the default implementations ever used? If not, we could remove them.

##########
File path: processing/src/main/java/org/apache/druid/segment/data/VSizeLongSerde.java
##########
@@ -413,9 +412,81 @@ public void close() throws IOException
     }
   }
 
+  /**
+   * Unpack bitpacked long values from an underlying contiguous memory block
+   */
   public interface LongDeserializer
   {
+    /**
+     * Unpack long value at the specified row index
+     */
     long get(int index);
+
+    /**
+     * Unpack a contiguous vector of long values at the specified start index of length and adjust them by the supplied
+     * delta base value.
+     */
+    default void getDelta(long[] out, int outPosition, int startIndex, int length, long base)
+    {
+      for (int i = 0; i < length; i++) {
+        out[outPosition + i] = base + get(startIndex + i);
+      }
+    }
+
+    /**
+     * Unpack a non-contiguous vector of long values at the specified indexes and adjust them by the supplied delta base
+     * value.
+     */
+    default int getDelta(long[] out, int outPosition, int[] indexes, int length, int indexOffset, int limit, long base)

Review comment:
       Do you have evidence that the `getDelta` and `getTable` methods are helpful? (vs. the alternative: first calling a regular bulk `get` method, then applying the delta or table adjustment in a loop over the returned arrays)
   
   They complicate the code quite a bit, so we should only include them if they perform meaningfully better.
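To make the comparison concrete, here is a sketch of the two shapes being weighed for the contiguous delta case. The fused version mirrors the default `getDelta` in the diff; the two-pass version stands in for "bulk `get`, then adjust". Here `get` is modeled as a plain `IntToLongFunction` (an assumption, not the real deserializer). Both produce identical output; which is faster is exactly what a benchmark would need to show, and this sketch makes no performance claim.

```java
import java.util.function.IntToLongFunction;

final class DeltaDecodeSketch {
  private DeltaDecodeSketch() {}

  /** Fused: add the delta base while unpacking, as in the default getDelta. */
  static void getDeltaFused(IntToLongFunction get, long[] out, int outPosition,
                            int startIndex, int length, long base) {
    for (int i = 0; i < length; i++) {
      out[outPosition + i] = base + get.applyAsLong(startIndex + i);
    }
  }

  /** Two-pass: a plain bulk get, then a separate loop applying the delta base. */
  static void getThenAdjust(IntToLongFunction get, long[] out, int outPosition,
                            int startIndex, int length, long base) {
    // Pass 1: stand-in for a regular bulk get.
    for (int i = 0; i < length; i++) {
      out[outPosition + i] = get.applyAsLong(startIndex + i);
    }
    // Pass 2: adjust the returned values by the base.
    for (int i = 0; i < length; i++) {
      out[outPosition + i] += base;
    }
  }
}
```

The table variant would follow the same pattern, with the second pass replacing each value by a table lookup instead of adding `base`.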




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


