You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/04/09 17:19:18 UTC

[GitHub] [pinot] richardstartin opened a new pull request, #8499: DataTable deserialization improvements

richardstartin opened a new pull request, #8499:
URL: https://github.com/apache/pinot/pull/8499

   Calling `Enum.values()` allocates an array every time it is called, and this method gets called a lot during `DataTable` deserialization, which shows up in allocation profiles.
   
   Also deserializes table directly from `ByteBuffer` to avoid allocating lots of `byte[]` during deserialization. This isn't done when the `DataTable` retains a copy of the `ByteBuffer` because of ambiguity of ownership, though it looks like it could be done safely.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #8499:
URL: https://github.com/apache/pinot/pull/8499#issuecomment-1096331971

   Actually these changes are covered by existing tests, nothing could be added without redundancy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on a diff in pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r847771725


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
    * This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
    * by internally converting it.
    */
-  private Map<String, String> deserializeMetadata(byte[] bytes)
+  private Map<String, String> deserializeMetadata(ByteBuffer buffer)

Review Comment:
   Can we add a comment / note stating that the caller of this method assumes that ByteBuffer position is moved accordingly by reads done in this method and so we should use relative get/read methods ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8499:
URL: https://github.com/apache/pinot/pull/8499#issuecomment-1094096626

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8499](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d108c20) into [master](https://codecov.io/gh/apache/pinot/commit/9314e600808c378663de99a2cdce4d5542ea99df?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9314e60) will **decrease** coverage by `15.13%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8499       +/-   ##
   =============================================
   - Coverage     29.25%   14.12%   -15.14%     
   - Complexity        0       84       +84     
   =============================================
     Files          1660     1627       -33     
     Lines         87126    85566     -1560     
     Branches      13203    13039      -164     
   =============================================
   - Hits          25492    12086    -13406     
   - Misses        59316    72574    +13258     
   + Partials       2318      906     -1412     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests2 | `14.12% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-48.11%)` | :arrow_down: |
   | [.../java/org/apache/pinot/common/utils/DataTable.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVRhYmxlLmphdmE=) | `0.00% <0.00%> (-89.48%)` | :arrow_down: |
   | [...che/pinot/core/common/datatable/BaseDataTable.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0Jhc2VEYXRhVGFibGUuamF2YQ==) | `0.00% <0.00%> (-81.68%)` | :arrow_down: |
   | [...e/pinot/core/common/datatable/DataTableImplV2.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWMi5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...e/pinot/core/common/datatable/DataTableImplV3.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWMy5qYXZh) | `0.00% <0.00%> (-91.31%)` | :arrow_down: |
   | [...he/pinot/core/common/datatable/DataTableUtils.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-70.09%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/data/table/Record.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1JlY29yZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/util/GroupByUtils.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS91dGlsL0dyb3VwQnlVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/core/routing/RoutingTable.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9yb3V0aW5nL1JvdXRpbmdUYWJsZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/core/operator/BaseOperator.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9CYXNlT3BlcmF0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [809 more](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [9314e60...d108c20](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on a diff in pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r847771725


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
    * This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
    * by internally converting it.
    */
-  private Map<String, String> deserializeMetadata(byte[] bytes)
+  private Map<String, String> deserializeMetadata(ByteBuffer buffer)

Review Comment:
   Can we add a comment / note stating that the caller of this method assumes that ByteBuffer position is moved accordingly by reads done in this method and so we should use relative get/read calls on ByteBuffer ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a diff in pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r848057124


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
    * This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
    * by internally converting it.
    */
-  private Map<String, String> deserializeMetadata(byte[] bytes)
+  private Map<String, String> deserializeMetadata(ByteBuffer buffer)
       throws IOException {
-    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
-        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
-      int numEntries = dataInputStream.readInt();
-      Map<String, String> metadata = new HashMap<>();
-      for (int i = 0; i < numEntries; i++) {
-        int keyId = dataInputStream.readInt();
-        MetadataKey key = MetadataKey.getByOrdinal(keyId);
-        // Ignore unknown keys.
-        if (key == null) {
-          continue;
-        }
-        if (key.getValueType() == MetadataValueType.INT) {
-          String value = String.valueOf(DataTableUtils.decodeInt(dataInputStream));
-          metadata.put(key.getName(), value);
-        } else if (key.getValueType() == MetadataValueType.LONG) {
-          String value = String.valueOf(DataTableUtils.decodeLong(dataInputStream));
-          metadata.put(key.getName(), value);
-        } else {
-          String value = String.valueOf(DataTableUtils.decodeString(dataInputStream));
-          metadata.put(key.getName(), value);
-        }
+    int numEntries = buffer.getInt();
+    Map<String, String> metadata = new HashMap<>();
+    for (int i = 0; i < numEntries; i++) {
+      int keyId = buffer.getInt();
+      MetadataKey key = MetadataKey.getByOrdinal(keyId);
+      // Ignore unknown keys.
+      if (key == null) {
+        continue;
+      }
+      if (key.getValueType() == MetadataValueType.INT) {
+        String value = "" + buffer.getInt();

Review Comment:
   it's both faster and shorter



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on a diff in pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r847772108


##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
    * This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
    * by internally converting it.
    */
-  private Map<String, String> deserializeMetadata(byte[] bytes)
+  private Map<String, String> deserializeMetadata(ByteBuffer buffer)
       throws IOException {
-    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
-        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
-      int numEntries = dataInputStream.readInt();
-      Map<String, String> metadata = new HashMap<>();
-      for (int i = 0; i < numEntries; i++) {
-        int keyId = dataInputStream.readInt();
-        MetadataKey key = MetadataKey.getByOrdinal(keyId);
-        // Ignore unknown keys.
-        if (key == null) {
-          continue;
-        }
-        if (key.getValueType() == MetadataValueType.INT) {
-          String value = String.valueOf(DataTableUtils.decodeInt(dataInputStream));
-          metadata.put(key.getName(), value);
-        } else if (key.getValueType() == MetadataValueType.LONG) {
-          String value = String.valueOf(DataTableUtils.decodeLong(dataInputStream));
-          metadata.put(key.getName(), value);
-        } else {
-          String value = String.valueOf(DataTableUtils.decodeString(dataInputStream));
-          metadata.put(key.getName(), value);
-        }
+    int numEntries = buffer.getInt();
+    Map<String, String> metadata = new HashMap<>();
+    for (int i = 0; i < numEntries; i++) {
+      int keyId = buffer.getInt();
+      MetadataKey key = MetadataKey.getByOrdinal(keyId);
+      // Ignore unknown keys.
+      if (key == null) {
+        continue;
+      }
+      if (key.getValueType() == MetadataValueType.INT) {
+        String value = "" + buffer.getInt();

Review Comment:
   Curious if this is better / faster compared to `String.valueOf()` ?



##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
    * This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
    * by internally converting it.
    */
-  private Map<String, String> deserializeMetadata(byte[] bytes)
+  private Map<String, String> deserializeMetadata(ByteBuffer buffer)
       throws IOException {
-    try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
-        DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
-      int numEntries = dataInputStream.readInt();
-      Map<String, String> metadata = new HashMap<>();
-      for (int i = 0; i < numEntries; i++) {
-        int keyId = dataInputStream.readInt();
-        MetadataKey key = MetadataKey.getByOrdinal(keyId);
-        // Ignore unknown keys.
-        if (key == null) {
-          continue;
-        }
-        if (key.getValueType() == MetadataValueType.INT) {
-          String value = String.valueOf(DataTableUtils.decodeInt(dataInputStream));
-          metadata.put(key.getName(), value);
-        } else if (key.getValueType() == MetadataValueType.LONG) {
-          String value = String.valueOf(DataTableUtils.decodeLong(dataInputStream));
-          metadata.put(key.getName(), value);
-        } else {
-          String value = String.valueOf(DataTableUtils.decodeString(dataInputStream));
-          metadata.put(key.getName(), value);
-        }
+    int numEntries = buffer.getInt();
+    Map<String, String> metadata = new HashMap<>();
+    for (int i = 0; i < numEntries; i++) {
+      int keyId = buffer.getInt();
+      MetadataKey key = MetadataKey.getByOrdinal(keyId);
+      // Ignore unknown keys.
+      if (key == null) {
+        continue;
+      }
+      if (key.getValueType() == MetadataValueType.INT) {
+        String value = "" + buffer.getInt();

Review Comment:
   Curious if this is better / faster compared to `String.valueOf()` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #8499:
URL: https://github.com/apache/pinot/pull/8499#issuecomment-1096219633

   > I feel we should change / enhance the `DataTableSerDeTest` (testDataTableMetadataBytesLayout) unit test to test the ByteBuffer based deserialization or may be SerDe code may probably have enough coverage already via the inter-segment query execution tests ?
   
   OK I will add tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin merged pull request #8499: DataTable deserialization improvements

Posted by GitBox <gi...@apache.org>.
richardstartin merged PR #8499:
URL: https://github.com/apache/pinot/pull/8499


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org