You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/11/08 13:38:19 UTC

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9749: Deserialize Hyperloglog objects more optimally

Jackie-Jiang commented on code in PR #9749:
URL: https://github.com/apache/pinot/pull/9749#discussion_r1016628818


##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)

Review Comment:
   Since we need to wrap the bytes, let's implement `deserialize(ByteBuffer byteBuffer)` to avoid the unnecessary bytes copying



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)
+                        .order(ByteOrder.BIG_ENDIAN)
+                        .asIntBuffer();
+        // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.

Review Comment:
   (minor) The first 2 integers are ... need the first integer...



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)
+                        .order(ByteOrder.BIG_ENDIAN)
+                        .asIntBuffer();
+        // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.
+        // We skip the second byte, array size, as we compare the buffer size to the remaining bytes size.
+        int log2m = intBuf.get(0);
+        intBuf.position(2);
+
+        if (intBuf.remaining() != RegisterSet.getSizeForCount(1 << log2m)) {

Review Comment:
   Per pure performance concern, this check can be skipped



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org