You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/04/09 17:19:18 UTC
[GitHub] [pinot] richardstartin opened a new pull request, #8499: DataTable deserialization improvements
richardstartin opened a new pull request, #8499:
URL: https://github.com/apache/pinot/pull/8499
Calling `Enum.values()` allocates an array every time it is called, and this method gets called a lot during `DataTable` deserialization, which shows up in allocation profiles.
Also deserializes table directly from `ByteBuffer` to avoid allocating lots of `byte[]` during deserialization. This isn't done when the `DataTable` retains a copy of the `ByteBuffer` because of ambiguity of ownership, though it looks like it could be done safely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin commented on pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #8499:
URL: https://github.com/apache/pinot/pull/8499#issuecomment-1096331971
Actually these changes are covered by existing tests, nothing could be added without redundancy.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] siddharthteotia commented on a diff in pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r847771725
##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
* This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
* by internally converting it.
*/
- private Map<String, String> deserializeMetadata(byte[] bytes)
+ private Map<String, String> deserializeMetadata(ByteBuffer buffer)
Review Comment:
Can we add a comment / note stating that the caller of this method assumes that ByteBuffer position is moved accordingly by reads done in this method and so we should use relative get/read methods ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] codecov-commenter commented on pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8499:
URL: https://github.com/apache/pinot/pull/8499#issuecomment-1094096626
# [Codecov](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#8499](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d108c20) into [master](https://codecov.io/gh/apache/pinot/commit/9314e600808c378663de99a2cdce4d5542ea99df?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9314e60) will **decrease** coverage by `15.13%`.
> The diff coverage is `0.00%`.
```diff
@@ Coverage Diff @@
## master #8499 +/- ##
=============================================
- Coverage 29.25% 14.12% -15.14%
- Complexity 0 84 +84
=============================================
Files 1660 1627 -33
Lines 87126 85566 -1560
Branches 13203 13039 -164
=============================================
- Hits 25492 12086 -13406
- Misses 59316 72574 +13258
+ Partials 2318 906 -1412
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `?` | |
| unittests2 | `14.12% <0.00%> (?)` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...java/org/apache/pinot/common/utils/DataSchema.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVNjaGVtYS5qYXZh) | `0.00% <0.00%> (-48.11%)` | :arrow_down: |
| [.../java/org/apache/pinot/common/utils/DataTable.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVRhYmxlLmphdmE=) | `0.00% <0.00%> (-89.48%)` | :arrow_down: |
| [...che/pinot/core/common/datatable/BaseDataTable.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0Jhc2VEYXRhVGFibGUuamF2YQ==) | `0.00% <0.00%> (-81.68%)` | :arrow_down: |
| [...e/pinot/core/common/datatable/DataTableImplV2.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWMi5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...e/pinot/core/common/datatable/DataTableImplV3.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZUltcGxWMy5qYXZh) | `0.00% <0.00%> (-91.31%)` | :arrow_down: |
| [...he/pinot/core/common/datatable/DataTableUtils.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vZGF0YXRhYmxlL0RhdGFUYWJsZVV0aWxzLmphdmE=) | `0.00% <0.00%> (-70.09%)` | :arrow_down: |
| [.../java/org/apache/pinot/core/data/table/Record.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1JlY29yZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../java/org/apache/pinot/core/util/GroupByUtils.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS91dGlsL0dyb3VwQnlVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...va/org/apache/pinot/core/routing/RoutingTable.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9yb3V0aW5nL1JvdXRpbmdUYWJsZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...a/org/apache/pinot/core/operator/BaseOperator.java](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9CYXNlT3BlcmF0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ... and [809 more](https://codecov.io/gh/apache/pinot/pull/8499/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [9314e60...d108c20](https://codecov.io/gh/apache/pinot/pull/8499?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] siddharthteotia commented on a diff in pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r847771725
##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
* This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
* by internally converting it.
*/
- private Map<String, String> deserializeMetadata(byte[] bytes)
+ private Map<String, String> deserializeMetadata(ByteBuffer buffer)
Review Comment:
Can we add a comment / note stating that the caller of this method assumes that ByteBuffer position is moved accordingly by reads done in this method and so we should use relative get/read calls on ByteBuffer ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin commented on a diff in pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
richardstartin commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r848057124
##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
* This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
* by internally converting it.
*/
- private Map<String, String> deserializeMetadata(byte[] bytes)
+ private Map<String, String> deserializeMetadata(ByteBuffer buffer)
throws IOException {
- try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
- DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
- int numEntries = dataInputStream.readInt();
- Map<String, String> metadata = new HashMap<>();
- for (int i = 0; i < numEntries; i++) {
- int keyId = dataInputStream.readInt();
- MetadataKey key = MetadataKey.getByOrdinal(keyId);
- // Ignore unknown keys.
- if (key == null) {
- continue;
- }
- if (key.getValueType() == MetadataValueType.INT) {
- String value = String.valueOf(DataTableUtils.decodeInt(dataInputStream));
- metadata.put(key.getName(), value);
- } else if (key.getValueType() == MetadataValueType.LONG) {
- String value = String.valueOf(DataTableUtils.decodeLong(dataInputStream));
- metadata.put(key.getName(), value);
- } else {
- String value = String.valueOf(DataTableUtils.decodeString(dataInputStream));
- metadata.put(key.getName(), value);
- }
+ int numEntries = buffer.getInt();
+ Map<String, String> metadata = new HashMap<>();
+ for (int i = 0; i < numEntries; i++) {
+ int keyId = buffer.getInt();
+ MetadataKey key = MetadataKey.getByOrdinal(keyId);
+ // Ignore unknown keys.
+ if (key == null) {
+ continue;
+ }
+ if (key.getValueType() == MetadataValueType.INT) {
+ String value = "" + buffer.getInt();
Review Comment:
it's both faster and shorter
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] siddharthteotia commented on a diff in pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on code in PR #8499:
URL: https://github.com/apache/pinot/pull/8499#discussion_r847772108
##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
* This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
* by internally converting it.
*/
- private Map<String, String> deserializeMetadata(byte[] bytes)
+ private Map<String, String> deserializeMetadata(ByteBuffer buffer)
throws IOException {
- try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
- DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
- int numEntries = dataInputStream.readInt();
- Map<String, String> metadata = new HashMap<>();
- for (int i = 0; i < numEntries; i++) {
- int keyId = dataInputStream.readInt();
- MetadataKey key = MetadataKey.getByOrdinal(keyId);
- // Ignore unknown keys.
- if (key == null) {
- continue;
- }
- if (key.getValueType() == MetadataValueType.INT) {
- String value = String.valueOf(DataTableUtils.decodeInt(dataInputStream));
- metadata.put(key.getName(), value);
- } else if (key.getValueType() == MetadataValueType.LONG) {
- String value = String.valueOf(DataTableUtils.decodeLong(dataInputStream));
- metadata.put(key.getName(), value);
- } else {
- String value = String.valueOf(DataTableUtils.decodeString(dataInputStream));
- metadata.put(key.getName(), value);
- }
+ int numEntries = buffer.getInt();
+ Map<String, String> metadata = new HashMap<>();
+ for (int i = 0; i < numEntries; i++) {
+ int keyId = buffer.getInt();
+ MetadataKey key = MetadataKey.getByOrdinal(keyId);
+ // Ignore unknown keys.
+ if (key == null) {
+ continue;
+ }
+ if (key.getValueType() == MetadataValueType.INT) {
+ String value = "" + buffer.getInt();
Review Comment:
Curious if this is better / faster compared to `String.valueOf()` ?
##########
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplV3.java:
##########
@@ -350,32 +340,29 @@ private byte[] serializeMetadata()
* This is to make V3 implementation keep the consumers of Map<String, String> getMetadata() API in the code happy
* by internally converting it.
*/
- private Map<String, String> deserializeMetadata(byte[] bytes)
+ private Map<String, String> deserializeMetadata(ByteBuffer buffer)
throws IOException {
- try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
- DataInputStream dataInputStream = new DataInputStream(byteArrayInputStream)) {
- int numEntries = dataInputStream.readInt();
- Map<String, String> metadata = new HashMap<>();
- for (int i = 0; i < numEntries; i++) {
- int keyId = dataInputStream.readInt();
- MetadataKey key = MetadataKey.getByOrdinal(keyId);
- // Ignore unknown keys.
- if (key == null) {
- continue;
- }
- if (key.getValueType() == MetadataValueType.INT) {
- String value = String.valueOf(DataTableUtils.decodeInt(dataInputStream));
- metadata.put(key.getName(), value);
- } else if (key.getValueType() == MetadataValueType.LONG) {
- String value = String.valueOf(DataTableUtils.decodeLong(dataInputStream));
- metadata.put(key.getName(), value);
- } else {
- String value = String.valueOf(DataTableUtils.decodeString(dataInputStream));
- metadata.put(key.getName(), value);
- }
+ int numEntries = buffer.getInt();
+ Map<String, String> metadata = new HashMap<>();
+ for (int i = 0; i < numEntries; i++) {
+ int keyId = buffer.getInt();
+ MetadataKey key = MetadataKey.getByOrdinal(keyId);
+ // Ignore unknown keys.
+ if (key == null) {
+ continue;
+ }
+ if (key.getValueType() == MetadataValueType.INT) {
+ String value = "" + buffer.getInt();
Review Comment:
Curious if this is better / faster compared to `String.valueOf()` ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin commented on pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
richardstartin commented on PR #8499:
URL: https://github.com/apache/pinot/pull/8499#issuecomment-1096219633
> I feel we should change / enhance the `DataTableSerDeTest` (testDataTableMetadataBytesLayout) unit test to test the ByteBuffer based deserialization or may be SerDe code may probably have enough coverage already via the inter-segment query execution tests ?
OK I will add tests
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin merged pull request #8499: DataTable deserialization improvements
Posted by GitBox <gi...@apache.org>.
richardstartin merged PR #8499:
URL: https://github.com/apache/pinot/pull/8499
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org