You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/10/06 19:45:39 UTC

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9538: Improve primary key serialization performance

Jackie-Jiang commented on code in PR #9538:
URL: https://github.com/apache/pinot/pull/9538#discussion_r989416345


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/PrimaryKey.java:
##########
@@ -37,7 +40,56 @@ public Object[] getValues() {
   }
 
   public byte[] asBytes() {
-    return SerializationUtils.serialize(_values);
+    int arraySize = 0;
+    byte[][] cache = new byte[_values.length][];
+    for (int i = 0; i < _values.length; i++) {
+      Object value = _values[i];
+      if (value instanceof Integer) {
+        arraySize += Integer.BYTES;
+      } else if (value instanceof Double) {
+        arraySize += Double.BYTES;
+      } else if (value instanceof Float) {
+        arraySize += Float.BYTES;
+      } else if (value instanceof Long) {
+        arraySize += Long.BYTES;
+      } else if (value instanceof String) {
+        cache[i] = ((String) value).getBytes(StandardCharsets.UTF_8);
+        arraySize += cache[i].length + Integer.BYTES;
+      } else if (value instanceof BigDecimal) {
+        cache[i] = BigDecimalUtils.serialize((BigDecimal) value);
+        arraySize += cache[i].length + Integer.BYTES;
+      } else if (value instanceof byte[]) {
+        arraySize += ((byte[]) value).length + Integer.BYTES;
+      } else {
+        throw new IllegalStateException("Data type not supported for serializing Primary Key");
+      }
+    }
+
+    ByteBuffer byteBuffer = ByteBuffer.allocate(arraySize);
+    for (int i = 0; i < _values.length; i++) {
+      Object value = _values[i];
+      if (value instanceof Integer) {
+        byteBuffer.putInt((Integer) value);
+      } else if (value instanceof Double) {
+        byteBuffer.putDouble((Double) value);
+      } else if (value instanceof Float) {
+        byteBuffer.putFloat((Float) value);
+      } else if (value instanceof Long) {
+        byteBuffer.putLong((Long) value);
+      } else if (value instanceof String) {
+        byteBuffer.putInt(cache[i].length);
+        byteBuffer.put(cache[i]);
+      } else if (value instanceof BigDecimal) {
+        byteBuffer.putInt(cache[i].length);
+        byteBuffer.put(cache[i]);
+      } else if (value instanceof byte[]) {
+        byteBuffer.putInt(cache[i].length);
+        byteBuffer.put(cache[i]);
+      } else {
+        throw new IllegalStateException("Data type not supported for serializing Primary Key");
+      }

Review Comment:
   This part can be simplified
   ```suggestion
         } else {
           byteBuffer.putInt(cache[i].length);
           byteBuffer.put(cache[i]);
         }
   ```



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/PrimaryKey.java:
##########
@@ -37,7 +40,56 @@ public Object[] getValues() {
   }
 
   public byte[] asBytes() {
-    return SerializationUtils.serialize(_values);
+    int arraySize = 0;

Review Comment:
   (optional) `sizeInBytes`



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/PrimaryKey.java:
##########
@@ -37,7 +40,56 @@ public Object[] getValues() {
   }
 
   public byte[] asBytes() {
-    return SerializationUtils.serialize(_values);
+    int arraySize = 0;
+    byte[][] cache = new byte[_values.length][];
+    for (int i = 0; i < _values.length; i++) {
+      Object value = _values[i];
+      if (value instanceof Integer) {
+        arraySize += Integer.BYTES;
+      } else if (value instanceof Double) {
+        arraySize += Double.BYTES;
+      } else if (value instanceof Float) {
+        arraySize += Float.BYTES;
+      } else if (value instanceof Long) {
+        arraySize += Long.BYTES;
+      } else if (value instanceof String) {
+        cache[i] = ((String) value).getBytes(StandardCharsets.UTF_8);
+        arraySize += cache[i].length + Integer.BYTES;
+      } else if (value instanceof BigDecimal) {
+        cache[i] = BigDecimalUtils.serialize((BigDecimal) value);
+        arraySize += cache[i].length + Integer.BYTES;
+      } else if (value instanceof byte[]) {

Review Comment:
   (MAJOR) For `BYTES` type, value will be `ByteArray`



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/PrimaryKey.java:
##########
@@ -37,7 +40,56 @@ public Object[] getValues() {
   }
 
   public byte[] asBytes() {
-    return SerializationUtils.serialize(_values);
+    int arraySize = 0;
+    byte[][] cache = new byte[_values.length][];
+    for (int i = 0; i < _values.length; i++) {
+      Object value = _values[i];
+      if (value instanceof Integer) {

Review Comment:
   (minor) Consider reordering based on the how common it is used as primary key: INT, LONG, STRING, BYTES, FLOAT, DOUBLE, BIG_DECIMAL



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/PrimaryKey.java:
##########
@@ -37,7 +40,56 @@ public Object[] getValues() {
   }
 
   public byte[] asBytes() {
-    return SerializationUtils.serialize(_values);
+    int arraySize = 0;
+    byte[][] cache = new byte[_values.length][];
+    for (int i = 0; i < _values.length; i++) {
+      Object value = _values[i];
+      if (value instanceof Integer) {
+        arraySize += Integer.BYTES;
+      } else if (value instanceof Double) {
+        arraySize += Double.BYTES;
+      } else if (value instanceof Float) {
+        arraySize += Float.BYTES;
+      } else if (value instanceof Long) {
+        arraySize += Long.BYTES;
+      } else if (value instanceof String) {
+        cache[i] = ((String) value).getBytes(StandardCharsets.UTF_8);
+        arraySize += cache[i].length + Integer.BYTES;
+      } else if (value instanceof BigDecimal) {
+        cache[i] = BigDecimalUtils.serialize((BigDecimal) value);
+        arraySize += cache[i].length + Integer.BYTES;
+      } else if (value instanceof byte[]) {
+        arraySize += ((byte[]) value).length + Integer.BYTES;
+      } else {
+        throw new IllegalStateException("Data type not supported for serializing Primary Key");

Review Comment:
   Put the value class in the exception message (also handle `null` properly)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org