Posted to commits@pinot.apache.org by "mgranderath (via GitHub)" <gi...@apache.org> on 2023/11/10 13:15:08 UTC
[PR] Adding byte functions for UUIDs [pinot]
mgranderath opened a new pull request, #11988:
URL: https://github.com/apache/pinot/pull/11988
We use UUIDs as identifiers in our data that we ingest into Pinot and we noticed that these take up quite a lot of space because they can't easily be compressed in their String representation. Converting them to bytes, however, results in about 30% storage savings.
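The savings come from the difference in raw size. A UUID's canonical string form is 36 ASCII characters (32 hex digits plus 4 hyphens). Its binary form is two 64-bit longs, or 16 bytes. The minimal Java sketch below measures only those raw sizes; the class name `UuidSizeComparison` is hypothetical. The roughly 30% figure above is a measured end-to-end number that presumably also reflects Pinot's encoding and compression, so it is not expected to match the raw 36-to-16-byte ratio.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Compares the raw size of a UUID in its canonical string form with its
// binary form. This does not model Pinot's dictionary encoding or compression.
class UuidSizeComparison {
  // Canonical form: 32 hex digits plus 4 hyphens, all ASCII, so 36 bytes in UTF-8.
  static int stringSize(UUID uuid) {
    return uuid.toString().getBytes(StandardCharsets.UTF_8).length;
  }

  // Binary form: most- and least-significant 64-bit halves.
  static int binarySize() {
    return 2 * Long.BYTES;
  }

  public static void main(String[] args) {
    UUID uuid = UUID.randomUUID();
    System.out.println("string bytes: " + stringSize(uuid)); // string bytes: 36
    System.out.println("binary bytes: " + binarySize());     // binary bytes: 16
  }
}
```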
This adds two new scalar functions for dealing with UUIDs:
- `toUuidBytes`: converts the String representation of a UUID into its 16-byte binary form
- `fromUuidBytes`: converts the byte representation of a UUID back into its String form
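A minimal round-trip sketch of the two functions follows. `toUuidBytes` follows the implementation quoted in the review below: 16 bytes, most-significant long first, and `null` for a malformed string. The body of `fromUuidBytes` is not shown in this thread, so the inverse here is an assumption rather than the merged code. That includes its `null` return for input that is not exactly 16 bytes. The class name `UuidBytesSketch` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Sketch of a String <-> byte[] UUID round trip. toUuidBytes mirrors the code
// quoted in the PR review; fromUuidBytes is an assumed inverse, not the PR's code.
class UuidBytesSketch {
  // Encodes the UUID as 16 big-endian bytes (ByteBuffer's default order):
  // most-significant 64 bits first, then least-significant 64 bits.
  static byte[] toUuidBytes(String input) {
    try {
      UUID uuid = UUID.fromString(input);
      ByteBuffer bb = ByteBuffer.wrap(new byte[16]);
      bb.putLong(uuid.getMostSignificantBits());
      bb.putLong(uuid.getLeastSignificantBits());
      return bb.array();
    } catch (IllegalArgumentException e) {
      return null; // malformed UUID string
    }
  }

  // Hypothetical inverse: reads the two longs back in the same order.
  static String fromUuidBytes(byte[] input) {
    if (input == null || input.length != 16) {
      return null; // not a 16-byte UUID
    }
    ByteBuffer bb = ByteBuffer.wrap(input);
    long msb = bb.getLong();
    long lsb = bb.getLong();
    return new UUID(msb, lsb).toString();
  }

  public static void main(String[] args) {
    String id = "123e4567-e89b-12d3-a456-426614174000";
    byte[] bytes = toUuidBytes(id);
    System.out.println(bytes.length);                      // 16
    System.out.println(fromUuidBytes(bytes));              // 123e4567-e89b-12d3-a456-426614174000
    System.out.println(toUuidBytes("not-a-uuid") == null); // true
  }
}
```

`ByteBuffer` defaults to big-endian order, so the byte layout follows the hex digits of the string form. For example, the first byte for `123e4567-...` is `0x12`.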
Thanks to @kishoreg for helping investigate this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Adding byte functions for UUIDs [pinot]
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11988:
URL: https://github.com/apache/pinot/pull/11988#issuecomment-1805765917
## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
> Merging [#11988](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (baf160f) into [master](https://app.codecov.io/gh/apache/pinot/commit/2beb9a4938d7d7c9481fd2546075e2a2475fe0ec?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (2beb9a4) will **decrease** coverage by `14.87%`.
> The diff coverage is `0.00%`.
```diff
@@              Coverage Diff              @@
##              master   #11988       +/-   ##
=============================================
- Coverage      61.61%   46.74%   -14.87%
- Complexity       207      927      +720
=============================================
  Files           2385     1787      -598
  Lines         129214    93645    -35569
  Branches       20003    15145     -4858
=============================================
- Hits           79613    43779    -35834
- Misses         43801    46752     +2951
+ Partials        5800     3114     -2686
```
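The headline percentages can be reproduced from the line counts in the diff. This assumes Codecov computes coverage as hits / (hits + misses + partials) and truncates to two decimals, its default `round: down` setting. The denominator itself shrank from 129214 to 93645 lines (2385 to 1787 files), so the two percentages are computed over different sets of files. The `0.00%` diff coverage is a separate figure for the lines this PR changed. The class name `CoverageCheck` is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Recomputes the headline coverage numbers from the Codecov diff. Assumes
// coverage = hits / (hits + misses + partials), truncated to two decimals.
class CoverageCheck {
  static BigDecimal coverage(long hits, long misses, long partials) {
    long lines = hits + misses + partials;
    return BigDecimal.valueOf(hits)
        .multiply(BigDecimal.valueOf(100))
        .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
  }

  public static void main(String[] args) {
    BigDecimal master = coverage(79613, 43801, 5800); // 129214 lines
    BigDecimal pr = coverage(43779, 46752, 3114);     // 93645 lines
    System.out.println(master);              // 61.61
    System.out.println(pr);                  // 46.74
    System.out.println(pr.subtract(master)); // -14.87
  }
}
```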
| [Flag](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [integration](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [integration1](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [integration2](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [java-11](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [java-21](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.73%)` | :arrow_down: |
| [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.87%)` | :arrow_down: |
| [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [temurin](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.87%)` | :arrow_down: |
| [unittests](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.86%)` | :arrow_down: |
| [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-0.20%)` | :arrow_down: |
| [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Files](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [.../pinot/common/function/scalar/StringFunctions.java](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vc2NhbGFyL1N0cmluZ0Z1bmN0aW9ucy5qYXZh) | `60.46% <0.00%> (-5.64%)` | :arrow_down: |
... and [975 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11988/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
Re: [PR] Adding byte functions for UUIDs [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #11988:
URL: https://github.com/apache/pinot/pull/11988
Re: [PR] Adding byte functions for UUIDs [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11988:
URL: https://github.com/apache/pinot/pull/11988#discussion_r1392947331
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -493,6 +495,37 @@ public static byte[] toAscii(String input) {
return input.getBytes(StandardCharsets.US_ASCII);
}
+ /**
+ * @param input UUID as string
+ * @return bytearray
+ * returns bytes and null on exception
+ */
+ @ScalarFunction
+ public static byte[] toUuidBytes(String input) {
+ try {
+ UUID uuid = UUID.fromString(input);
+ ByteBuffer bb = ByteBuffer.wrap(new byte[16]);
+ bb.putLong(uuid.getMostSignificantBits());
+ bb.putLong(uuid.getLeastSignificantBits());
+ return bb.array();
+ } catch (IllegalArgumentException e) {
+ return null;
+ }
+ }
+
+ /**
+ * @param input UUID serialized to bytes
+ * @return String representation of UUID
+ * returns bytes and null on exception
+ */
+ @ScalarFunction
+ public static String fromUuidBytes(byte[] input) {
Review Comment:
```suggestion
public static String fromUUIDBytes(byte[] input) {
```
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -493,6 +495,37 @@ public static byte[] toAscii(String input) {
return input.getBytes(StandardCharsets.US_ASCII);
}
+ /**
+ * @param input UUID as string
+ * @return bytearray
+ * returns bytes and null on exception
+ */
+ @ScalarFunction
+ public static byte[] toUuidBytes(String input) {
Review Comment:
```suggestion
public static byte[] toUUIDBytes(String input) {
```
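Both suggestions change only the Java method name. Whether the rename affects SQL calls depends on how Pinot matches scalar function names. Assume its function registry compares names after lowercasing and stripping underscores; this is our understanding and is not confirmed in this thread. Under that assumption, `toUuidBytes`, `toUUIDBytes`, and `to_uuid_bytes` would all resolve to the same function. The hypothetical `canonicalize` helper below illustrates only that assumed rule. It is not Pinot's code.

```java
import java.util.Locale;

// Hypothetical name canonicalization: strip underscores, then lowercase.
// Models an assumed lookup rule; this is not Pinot's actual implementation.
class FunctionNameSketch {
  static String canonicalize(String name) {
    return name.replace("_", "").toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    for (String name : new String[] {"toUuidBytes", "toUUIDBytes", "to_uuid_bytes"}) {
      // Each spelling maps to "touuidbytes".
      System.out.println(name + " -> " + canonicalize(name));
    }
  }
}
```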