You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/11/07 21:01:57 UTC

[GitHub] [pinot] priyen opened a new pull request, #9749: Deserialize Hyperloglog objects more optimally (#171)

priyen opened a new pull request, #9749:
URL: https://github.com/apache/pinot/pull/9749

   This PR 
   `performance`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9749: Deserialize Hyperloglog objects more optimally

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #9749:
URL: https://github.com/apache/pinot/pull/9749#discussion_r1016628818


##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)

Review Comment:
   Since we need to wrap the bytes, let's implement `deserialize(ByteBuffer byteBuffer)` to avoid the unnecessary bytes copying



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)
+                        .order(ByteOrder.BIG_ENDIAN)
+                        .asIntBuffer();
+        // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.

Review Comment:
   (minor) The first 2 integers are ... need the first integer...



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)
+                        .order(ByteOrder.BIG_ENDIAN)
+                        .asIntBuffer();
+        // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.
+        // We skip the second byte, array size, as we compare the buffer size to the remaining bytes size.
+        int log2m = intBuf.get(0);
+        intBuf.position(2);
+
+        if (intBuf.remaining() != RegisterSet.getSizeForCount(1 << log2m)) {

Review Comment:
   Per pure performance concern, this check can be skipped



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] priyen commented on pull request #9749: Deserialize Hyperloglog objects more optimally

Posted by GitBox <gi...@apache.org>.
priyen commented on PR #9749:
URL: https://github.com/apache/pinot/pull/9749#issuecomment-1307669531

   @Jackie-Jiang , please take a second look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #9749: Deserialize Hyperloglog objects more optimally

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #9749:
URL: https://github.com/apache/pinot/pull/9749#issuecomment-1317297966

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/9749?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#9749](https://codecov.io/gh/apache/pinot/pull/9749?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c28a695) into [master](https://codecov.io/gh/apache/pinot/commit/c8c6b253ae8f0a8e9e3293b99bd37d946f94a19d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8c6b25) will **decrease** coverage by `0.26%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #9749      +/-   ##
   ============================================
   - Coverage     68.57%   68.31%   -0.27%     
   - Complexity     4979     5378     +399     
   ============================================
     Files          1952     1964      +12     
     Lines        104809   106278    +1469     
     Branches      15866    16274     +408     
   ============================================
   + Hits          71876    72602     +726     
   - Misses        27856    28598     +742     
   - Partials       5077     5078       +1     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration2 | `24.39% <46.15%> (-0.12%)` | :arrow_down: |
   | unittests1 | `67.63% <100.00%> (+0.08%)` | :arrow_up: |
   | unittests2 | `16.03% <0.00%> (+0.38%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/9749?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...org/apache/pinot/core/common/ObjectSerDeUtils.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vT2JqZWN0U2VyRGVVdGlscy5qYXZh) | `91.58% <100.00%> (+1.66%)` | :arrow_up: |
   | [...ot/common/restlet/resources/TableMetadataInfo.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvVGFibGVNZXRhZGF0YUluZm8uamF2YQ==) | `61.53% <0.00%> (-38.47%)` | :arrow_down: |
   | [...pache/pinot/core/transport/AsyncQueryResponse.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS90cmFuc3BvcnQvQXN5bmNRdWVyeVJlc3BvbnNlLmphdmE=) | `52.32% <0.00%> (-37.68%)` | :arrow_down: |
   | [...ocal/segment/index/loader/ForwardIndexHandler.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Gb3J3YXJkSW5kZXhIYW5kbGVyLmphdmE=) | `51.56% <0.00%> (-33.31%)` | :arrow_down: |
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `54.77% <0.00%> (-28.72%)` | :arrow_down: |
   | [.../pinot/core/common/datablock/DataBlockBuilder.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL0RhdGFCbG9ja0J1aWxkZXIuamF2YQ==) | `55.39% <0.00%> (-17.59%)` | :arrow_down: |
   | [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `52.22% <0.00%> (-15.20%)` | :arrow_down: |
   | [...a/org/apache/pinot/core/transport/QueryRouter.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS90cmFuc3BvcnQvUXVlcnlSb3V0ZXIuamF2YQ==) | `66.36% <0.00%> (-14.75%)` | :arrow_down: |
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `62.28% <0.00%> (-13.84%)` | :arrow_down: |
   | [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `44.80% <0.00%> (-13.43%)` | :arrow_down: |
   | ... and [85 more](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #9749: Deserialize Hyperloglog objects more optimally

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #9749:
URL: https://github.com/apache/pinot/pull/9749


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] priyen commented on a diff in pull request #9749: Deserialize Hyperloglog objects more optimally

Posted by GitBox <gi...@apache.org>.
priyen commented on code in PR #9749:
URL: https://github.com/apache/pinot/pull/9749#discussion_r1016993263


##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)
+                        .order(ByteOrder.BIG_ENDIAN)
+                        .asIntBuffer();
+        // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.
+        // We skip the second byte, array size, as we compare the buffer size to the remaining bytes size.
+        int log2m = intBuf.get(0);
+        intBuf.position(2);
+
+        if (intBuf.remaining() != RegisterSet.getSizeForCount(1 << log2m)) {

Review Comment:
   I think we should keep it here, as otherwise we can't guarantee the deserialization is right. For eg, the unit test I added, if you pass a bytes arr that is smaller then expected, it doesnt throw an exception unless this is there. This can also help us catch issue if the underlying HLL library changes that could break this, we'd know



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)
+                        .order(ByteOrder.BIG_ENDIAN)
+                        .asIntBuffer();
+        // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.

Review Comment:
   Fixed



##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
     @Override
     public HyperLogLog deserialize(byte[] bytes) {
       try {
-        return HyperLogLog.Builder.build(bytes);
-      } catch (IOException e) {
-        throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+        IntBuffer intBuf =
+                ByteBuffer.wrap(bytes)

Review Comment:
   Good idea, fixed this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] priyen commented on pull request #9749: Deserialize Hyperloglog objects more optimally

Posted by GitBox <gi...@apache.org>.
priyen commented on PR #9749:
URL: https://github.com/apache/pinot/pull/9749#issuecomment-1322558520

   Thanks for pushing the changes in the latest commit, looks good to me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org