You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/11/07 21:01:57 UTC
[GitHub] [pinot] priyen opened a new pull request, #9749: Deserialize Hyperloglog objects more optimally (#171)
priyen opened a new pull request, #9749:
URL: https://github.com/apache/pinot/pull/9749
This PR
`performance`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #9749: Deserialize Hyperloglog objects more optimally
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #9749:
URL: https://github.com/apache/pinot/pull/9749#discussion_r1016628818
##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
@Override
public HyperLogLog deserialize(byte[] bytes) {
try {
- return HyperLogLog.Builder.build(bytes);
- } catch (IOException e) {
- throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+ IntBuffer intBuf =
+ ByteBuffer.wrap(bytes)
Review Comment:
Since we need to wrap the bytes, let's implement `deserialize(ByteBuffer byteBuffer)` to avoid the unnecessary bytes copying
##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
@Override
public HyperLogLog deserialize(byte[] bytes) {
try {
- return HyperLogLog.Builder.build(bytes);
- } catch (IOException e) {
- throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+ IntBuffer intBuf =
+ ByteBuffer.wrap(bytes)
+ .order(ByteOrder.BIG_ENDIAN)
+ .asIntBuffer();
+ // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.
Review Comment:
(minor) The first 2 integers are ... need the first integer...
##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
@Override
public HyperLogLog deserialize(byte[] bytes) {
try {
- return HyperLogLog.Builder.build(bytes);
- } catch (IOException e) {
- throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+ IntBuffer intBuf =
+ ByteBuffer.wrap(bytes)
+ .order(ByteOrder.BIG_ENDIAN)
+ .asIntBuffer();
+ // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.
+ // We skip the second byte, array size, as we compare the buffer size to the remaining bytes size.
+ int log2m = intBuf.get(0);
+ intBuf.position(2);
+
+ if (intBuf.remaining() != RegisterSet.getSizeForCount(1 << log2m)) {
Review Comment:
Per pure performance concern, this check can be skipped
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] priyen commented on pull request #9749: Deserialize Hyperloglog objects more optimally
Posted by GitBox <gi...@apache.org>.
priyen commented on PR #9749:
URL: https://github.com/apache/pinot/pull/9749#issuecomment-1307669531
@Jackie-Jiang , please take a second look
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] codecov-commenter commented on pull request #9749: Deserialize Hyperloglog objects more optimally
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #9749:
URL: https://github.com/apache/pinot/pull/9749#issuecomment-1317297966
# [Codecov](https://codecov.io/gh/apache/pinot/pull/9749?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#9749](https://codecov.io/gh/apache/pinot/pull/9749?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c28a695) into [master](https://codecov.io/gh/apache/pinot/commit/c8c6b253ae8f0a8e9e3293b99bd37d946f94a19d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8c6b25) will **decrease** coverage by `0.26%`.
> The diff coverage is `100.00%`.
```diff
@@ Coverage Diff @@
## master #9749 +/- ##
============================================
- Coverage 68.57% 68.31% -0.27%
- Complexity 4979 5378 +399
============================================
Files 1952 1964 +12
Lines 104809 106278 +1469
Branches 15866 16274 +408
============================================
+ Hits 71876 72602 +726
- Misses 27856 28598 +742
- Partials 5077 5078 +1
```
| Flag | Coverage Δ | |
|---|---|---|
| integration2 | `24.39% <46.15%> (-0.12%)` | :arrow_down: |
| unittests1 | `67.63% <100.00%> (+0.08%)` | :arrow_up: |
| unittests2 | `16.03% <0.00%> (+0.38%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/9749?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...org/apache/pinot/core/common/ObjectSerDeUtils.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vT2JqZWN0U2VyRGVVdGlscy5qYXZh) | `91.58% <100.00%> (+1.66%)` | :arrow_up: |
| [...ot/common/restlet/resources/TableMetadataInfo.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzdGxldC9yZXNvdXJjZXMvVGFibGVNZXRhZGF0YUluZm8uamF2YQ==) | `61.53% <0.00%> (-38.47%)` | :arrow_down: |
| [...pache/pinot/core/transport/AsyncQueryResponse.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS90cmFuc3BvcnQvQXN5bmNRdWVyeVJlc3BvbnNlLmphdmE=) | `52.32% <0.00%> (-37.68%)` | :arrow_down: |
| [...ocal/segment/index/loader/ForwardIndexHandler.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Gb3J3YXJkSW5kZXhIYW5kbGVyLmphdmE=) | `51.56% <0.00%> (-33.31%)` | :arrow_down: |
| [...ocal/segment/index/loader/SegmentPreProcessor.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `54.77% <0.00%> (-28.72%)` | :arrow_down: |
| [.../pinot/core/common/datablock/DataBlockBuilder.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YWJsb2NrL0RhdGFCbG9ja0J1aWxkZXIuamF2YQ==) | `55.39% <0.00%> (-17.59%)` | :arrow_down: |
| [...ata/manager/realtime/RealtimeTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUmVhbHRpbWVUYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `52.22% <0.00%> (-15.20%)` | :arrow_down: |
| [...a/org/apache/pinot/core/transport/QueryRouter.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS90cmFuc3BvcnQvUXVlcnlSb3V0ZXIuamF2YQ==) | `66.36% <0.00%> (-14.75%)` | :arrow_down: |
| [.../pinot/core/data/manager/BaseTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `62.28% <0.00%> (-13.84%)` | :arrow_down: |
| [...loader/defaultcolumn/BaseDefaultColumnHandler.java](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9kZWZhdWx0Y29sdW1uL0Jhc2VEZWZhdWx0Q29sdW1uSGFuZGxlci5qYXZh) | `44.80% <0.00%> (-13.43%)` | :arrow_down: |
| ... and [85 more](https://codecov.io/gh/apache/pinot/pull/9749/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] Jackie-Jiang merged pull request #9749: Deserialize Hyperloglog objects more optimally
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged PR #9749:
URL: https://github.com/apache/pinot/pull/9749
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] priyen commented on a diff in pull request #9749: Deserialize Hyperloglog objects more optimally
Posted by GitBox <gi...@apache.org>.
priyen commented on code in PR #9749:
URL: https://github.com/apache/pinot/pull/9749#discussion_r1016993263
##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
@Override
public HyperLogLog deserialize(byte[] bytes) {
try {
- return HyperLogLog.Builder.build(bytes);
- } catch (IOException e) {
- throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+ IntBuffer intBuf =
+ ByteBuffer.wrap(bytes)
+ .order(ByteOrder.BIG_ENDIAN)
+ .asIntBuffer();
+ // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.
+ // We skip the second byte, array size, as we compare the buffer size to the remaining bytes size.
+ int log2m = intBuf.get(0);
+ intBuf.position(2);
+
+ if (intBuf.remaining() != RegisterSet.getSizeForCount(1 << log2m)) {
Review Comment:
I think we should keep it here, as otherwise we can't guarantee the deserialization is right. For eg, the unit test I added, if you pass a bytes arr that is smaller then expected, it doesnt throw an exception unless this is there. This can also help us catch issue if the underlying HLL library changes that could break this, we'd know
##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
@Override
public HyperLogLog deserialize(byte[] bytes) {
try {
- return HyperLogLog.Builder.build(bytes);
- } catch (IOException e) {
- throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+ IntBuffer intBuf =
+ ByteBuffer.wrap(bytes)
+ .order(ByteOrder.BIG_ENDIAN)
+ .asIntBuffer();
+ // The first 2 bytes are constant headers for the HLL. We only need the first byte, the log2m.
Review Comment:
Fixed
##########
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java:
##########
@@ -474,21 +477,31 @@ public byte[] serialize(HyperLogLog hyperLogLog) {
@Override
public HyperLogLog deserialize(byte[] bytes) {
try {
- return HyperLogLog.Builder.build(bytes);
- } catch (IOException e) {
- throw new RuntimeException("Caught exception while de-serializing HyperLogLog", e);
+ IntBuffer intBuf =
+ ByteBuffer.wrap(bytes)
Review Comment:
Good idea, fixed this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] priyen commented on pull request #9749: Deserialize Hyperloglog objects more optimally
Posted by GitBox <gi...@apache.org>.
priyen commented on PR #9749:
URL: https://github.com/apache/pinot/pull/9749#issuecomment-1322558520
Thanks for pushing the changes in the latest commit, looks good to me
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org