You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/24 04:13:33 UTC

[GitHub] [druid] cheddar commented on a change in pull request #11973: Handle null values in Range Partition dimension distribution

cheddar commented on a change in pull request #11973:
URL: https://github.com/apache/druid/pull/11973#discussion_r755686465



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/distribution/ArrayOfStringsNullSafeSerde.java
##########
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.common.task.batch.parallel.distribution;
+
+import org.apache.datasketches.ArrayOfItemsSerDe;
+import org.apache.datasketches.ArrayOfStringsSerDe;
+import org.apache.datasketches.Util;
+import org.apache.datasketches.memory.Memory;
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.data.input.StringTuple;
+
+import java.nio.charset.StandardCharsets;
+
+/**
+ * Serde for {@link StringTuple}.
+ * <p>
+ * The implementation is the same as {@link ArrayOfStringsSerDe}, except this
+ * class handles null String values as well.
+ */
+public class ArrayOfStringsNullSafeSerde extends ArrayOfItemsSerDe<String>
+{
+
+  @Override
+  public byte[] serializeToByteArray(final String[] items)
+  {
+    int length = 0;
+    final byte[][] itemsBytes = new byte[items.length][];
+    for (int i = 0; i < items.length; i++) {
+      // If the String is null, make the byte array also null

Review comment:
       Nit on the code, but instead of ternary operators, I think it would be easier to read with
   
   ```
   // One int for the string length (or -1 for null)
   length += Integer.BYTES;
   if (items[i] != null) {
     itemsBytes[i] = items[i].getBytes(...)
     length += itemsBytes[i].length
   }
   ```

##########
File path: indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/distribution/ArrayOfStringsNullSafeSerdeTest.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.common.task.batch.parallel.distribution;
+
+import org.apache.datasketches.memory.Memory;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class ArrayOfStringsNullSafeSerdeTest
+{
+
+  private final ArrayOfStringsNullSafeSerde serde = new ArrayOfStringsNullSafeSerde();
+
+  @Test
+  public void testStringArray()
+  {
+    testSerde("abc", "def", "xyz");
+    testSerde("abc", "123", "456.0");
+  }
+
+  @Test
+  public void testSingletonArray()
+  {
+    testSerde("abc");
+    testSerde("xyz");
+  }
+
+  @Test
+  public void testEmptyArray()
+  {
+    testSerde();
+  }
+
+  @Test
+  public void testArrayWithNullString()
+  {
+    testSerde((String) null);
+    testSerde("abc", null, "def");
+    testSerde(null, null, null);
+  }
+
+  @Test
+  public void testArrayWithEmptyString()

Review comment:
       Maybe add a test that has both `null` and empty string in it to show that both can co-exist and will be considered as different things.

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/distribution/ArrayOfStringsNullSafeSerde.java
##########
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.common.task.batch.parallel.distribution;
+
+import org.apache.datasketches.ArrayOfItemsSerDe;
+import org.apache.datasketches.ArrayOfStringsSerDe;
+import org.apache.datasketches.Util;
+import org.apache.datasketches.memory.Memory;
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.data.input.StringTuple;
+
+import java.nio.charset.StandardCharsets;
+
+/**
+ * Serde for {@link StringTuple}.
+ * <p>
+ * The implementation is the same as {@link ArrayOfStringsSerDe}, except this
+ * class handles null String values as well.
+ */
+public class ArrayOfStringsNullSafeSerde extends ArrayOfItemsSerDe<String>
+{
+
+  @Override
+  public byte[] serializeToByteArray(final String[] items)
+  {
+    int length = 0;
+    final byte[][] itemsBytes = new byte[items.length][];
+    for (int i = 0; i < items.length; i++) {
+      // If the String is null, make the byte array also null
+      itemsBytes[i] = items[i] == null ? null : items[i].getBytes(StandardCharsets.UTF_8);
+      length += (itemsBytes[i] == null ? 0 : itemsBytes[i].length) + Integer.BYTES;
+    }
+    final byte[] bytes = new byte[length];
+    final WritableMemory mem = WritableMemory.writableWrap(bytes);
+    long offsetBytes = 0;
+    for (int i = 0; i < items.length; i++) {
+      if (itemsBytes[i] != null) {
+        // Write the length of the array and the array itself
+        mem.putInt(offsetBytes, itemsBytes[i].length);
+        offsetBytes += Integer.BYTES;
+        mem.putByteArray(offsetBytes, itemsBytes[i], 0, itemsBytes[i].length);
+        offsetBytes += itemsBytes[i].length;
+      } else {
+        // If the byte array is null, write the length as -1
+        mem.putInt(offsetBytes, -1);
+        offsetBytes += Integer.BYTES;
+      }
+    }
+    return bytes;
+  }
+
+  @Override
+  public String[] deserializeFromMemory(final Memory mem, final int numItems)
+  {
+    final String[] array = new String[numItems];
+    long offsetBytes = 0;
+    for (int i = 0; i < numItems; i++) {
+      // Read the length of the byte array
+      Util.checkBounds(offsetBytes, Integer.BYTES, mem.getCapacity());
+      final int arrayLength = mem.getInt(offsetBytes);
+      offsetBytes += Integer.BYTES;
+
+      // Negative strLength represents a null byte array and a null String
+      if (arrayLength >= 0) {

Review comment:
       This is maybe aesthetic, but `-1` has special meaning, `-2` does not.  But this code acts as if all negative values have special meaning.  I think it is better to check for `!= -1`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org