You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "AlexanderSaydakov (via GitHub)" <gi...@apache.org> on 2023/10/25 17:17:01 UTC

[PR] use datasketches-java 4.2.0 (druid)

AlexanderSaydakov opened a new pull request, #15257:
URL: https://github.com/apache/druid/pull/15257

   This is to use the latest datasketches-java version 4.2.0.
   This was supposed to be a minor version change, but inadvertently some API changes were introduced. Therefore I had to implement a few new required methods in the custom ArrayOfStringTuplesSerDe. They will need a careful review since I am not entirely sure I understood the serial format correctly.
   Also one test is currently failing. I don’t understand the purpose of this test. It is called preservesMinAndMaxWhenAssumeGroupedFalse. I have no idea what does this mean. However it asks a quantile sketch to partition 66 items into 66 partitions and expects exactly one item in each. If we allow even slightest error (and sketches are approximate) we can get some partitions with 2 items and some empty ones. So with deduplication it leads to fewer partitions.
   This change in behavior from 4.1.0 to 4.2.0 is unfortunate, but not incorrect. This is a degenerate use case. I would think that a better test could generate, say, 1000 items, ask for 10 partitions and assert that partitions have 100+-2 items or something like that. Perhaps this behavior with very small partitions can be improved in the next version, but for now I would suggest using 4.2.0 and changing this test somehow.
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] use datasketches-java 4.2.0 (druid)

Posted by "AlexanderSaydakov (via GitHub)" <gi...@apache.org>.
AlexanderSaydakov merged PR #15257:
URL: https://github.com/apache/druid/pull/15257


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] use datasketches-java 4.2.0 (druid)

Posted by "github-advanced-security[bot] (via GitHub)" <gi...@apache.org>.
github-advanced-security[bot] commented on code in PR #15257:
URL: https://github.com/apache/druid/pull/15257#discussion_r1372127465


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/statistics/QuantilesSketchKeyCollectorFactory.java:
##########
@@ -126,22 +127,66 @@
     }
 
     @Override
-    public byte[][] deserializeFromMemory(final Memory mem, final int numItems)
+    public byte[][] deserializeFromMemory(final Memory mem, long offsetBytes, final int numItems)
     {
       final byte[][] keys = new byte[numItems][];
-      long keyPosition = (long) Integer.BYTES * numItems;
+      final long start = offsetBytes;
+      offsetBytes += (long) Integer.BYTES * numItems;
 
       for (int i = 0; i < numItems; i++) {
-        final int keyLength = mem.getInt((long) Integer.BYTES * i);
+        final int keyLength = mem.getInt(start + Integer.BYTES * i);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in a numeric context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5930)



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/statistics/QuantilesSketchKeyCollectorFactory.java:
##########
@@ -126,22 +127,66 @@
     }
 
     @Override
-    public byte[][] deserializeFromMemory(final Memory mem, final int numItems)
+    public byte[][] deserializeFromMemory(final Memory mem, long offsetBytes, final int numItems)
     {
       final byte[][] keys = new byte[numItems][];
-      long keyPosition = (long) Integer.BYTES * numItems;
+      final long start = offsetBytes;
+      offsetBytes += (long) Integer.BYTES * numItems;
 
       for (int i = 0; i < numItems; i++) {
-        final int keyLength = mem.getInt((long) Integer.BYTES * i);
+        final int keyLength = mem.getInt(start + Integer.BYTES * i);
         final byte[] keyBytes = new byte[keyLength];
 
-        mem.getByteArray(keyPosition, keyBytes, 0, keyLength);
+        mem.getByteArray(offsetBytes, keyBytes, 0, keyLength);
         keys[i] = keyBytes;
 
-        keyPosition += keyLength;
+        offsetBytes += keyLength;
       }
 
       return keys;
     }
+
+    @Override
+    public byte[] serializeToByteArray(final byte[] item)
+    {
+      final byte[] bytes = new byte[Integer.BYTES + item.length];
+      ByteArrayUtil.putIntLE(bytes, 0, item.length);
+      ByteArrayUtil.copyBytes(item, 0, bytes, Integer.BYTES, item.length);
+      return bytes;
+    }
+
+    @Override
+    public byte[][] deserializeFromMemory(final Memory mem, final int numItems)
+    {
+      return deserializeFromMemory(mem, 0, numItems);
+    }
+
+    @Override
+    public int sizeOf(final byte[] item)
+    {
+      return Integer.BYTES + item.length;
+    }
+
+    @Override
+    public int sizeOf(final Memory mem, long offsetBytes, final int numItems)
+    {
+      int length = Integer.BYTES * numItems;
+      for (int i = 0; i < numItems; i++) {
+        length = mem.getInt(offsetBytes + Integer.BYTES * i);

Review Comment:
   ## Result of multiplication cast to wider type
   
   Potential overflow in [int multiplication](1) before it is converted to long by use in a numeric context.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/5931)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org