You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "mgranderath (via GitHub)" <gi...@apache.org> on 2023/11/10 13:15:08 UTC

[PR] Adding byte functions for UUIDs [pinot]

mgranderath opened a new pull request, #11988:
URL: https://github.com/apache/pinot/pull/11988

   We use UUIDs as identifiers in our data that we ingest into Pinot and we noticed that these take up quite a lot of space because they can't easily be compressed in their String representation. Converting them to bytes, however, results in about 30% storage savings.
   
   This adds two new scalar functions for dealing with UUIDs:
   - `toUuidBytes`: turns a String representation of a UUID to bytes
   - `fromUuidBytes`: turns a byte representation of a UUID back to a String
   
   Thanks for the help of @kishoreg for investigating this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding byte functions for UUIDs [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11988:
URL: https://github.com/apache/pinot/pull/11988#issuecomment-1805765917

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11988](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (baf160f) into [master](https://app.codecov.io/gh/apache/pinot/commit/2beb9a4938d7d7c9481fd2546075e2a2475fe0ec?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (2beb9a4) will **decrease** coverage by `14.87%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #11988       +/-   ##
   =============================================
   - Coverage     61.61%   46.74%   -14.87%     
   - Complexity      207      927      +720     
   =============================================
     Files          2385     1787      -598     
     Lines        129214    93645    -35569     
     Branches      20003    15145     -4858     
   =============================================
   - Hits          79613    43779    -35834     
   - Misses        43801    46752     +2951     
   + Partials       5800     3114     -2686     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.73%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.87%)` | :arrow_down: |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.87%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-14.86%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.74% <0.00%> (-0.20%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/11988/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/common/function/scalar/StringFunctions.java](https://app.codecov.io/gh/apache/pinot/pull/11988?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vc2NhbGFyL1N0cmluZ0Z1bmN0aW9ucy5qYXZh) | `60.46% <0.00%> (-5.64%)` | :arrow_down: |
   
   ... and [975 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11988/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in [Chrome](https://chrome.google.com/webstore/detail/codecov/gedikamndpbemklijjkncpnolildpbgo) or [Firefox](https://addons.mozilla.org/en-US/firefox/addon/codecov/) today!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding byte functions for UUIDs [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #11988:
URL: https://github.com/apache/pinot/pull/11988


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Adding byte functions for UUIDs [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11988:
URL: https://github.com/apache/pinot/pull/11988#discussion_r1392947331


##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -493,6 +495,37 @@ public static byte[] toAscii(String input) {
     return input.getBytes(StandardCharsets.US_ASCII);
   }
 
+  /**
+   * @param input UUID as string
+   * @return bytearray
+   * returns bytes and null on exception
+   */
+  @ScalarFunction
+  public static byte[] toUuidBytes(String input) {
+    try {
+      UUID uuid = UUID.fromString(input);
+      ByteBuffer bb = ByteBuffer.wrap(new byte[16]);
+      bb.putLong(uuid.getMostSignificantBits());
+      bb.putLong(uuid.getLeastSignificantBits());
+      return bb.array();
+    } catch (IllegalArgumentException e) {
+      return null;
+    }
+  }
+
+  /**
+   * @param input UUID serialized to bytes
+   * @return String representation of UUID
+   * returns bytes and null on exception
+   */
+  @ScalarFunction
+  public static String fromUuidBytes(byte[] input) {

Review Comment:
   ```suggestion
     public static String fromUUIDBytes(byte[] input) {
   ```



##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -493,6 +495,37 @@ public static byte[] toAscii(String input) {
     return input.getBytes(StandardCharsets.US_ASCII);
   }
 
+  /**
+   * @param input UUID as string
+   * @return bytearray
+   * returns bytes and null on exception
+   */
+  @ScalarFunction
+  public static byte[] toUuidBytes(String input) {

Review Comment:
   ```suggestion
     public static byte[] toUUIDBytes(String input) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org