You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "Jackie-Jiang (via GitHub)" <gi...@apache.org> on 2023/07/12 18:18:44 UTC

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1261564543


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   We cannot simply trim the extra characters because it can result in a wrong min/max value. For min value, the modified value must be smaller than the actual value; for max value, the modified value must be larger than the actual value.
   One approach to solve the max value problem is to add 1 to the last character or append one character after the last character. For BYTES type, we might also need some special handling because the value needs to be a valid hex encoded string
   
   Also, we should first modify the value then handle special characters, or the special characters might be cut in half and cause it unreadable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org