You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "abhioncbr (via GitHub)" <gi...@apache.org> on 2023/06/28 02:19:53 UTC

[GitHub] [pinot] abhioncbr opened a new pull request, #10990: Updated code to getting valid value for segment min/max property.

abhioncbr opened a new pull request, #10990:
URL: https://github.com/apache/pinot/pull/10990

   Previously, we skipped to set the value for min/max if the value was invalid. We were determining invalid values based on the following checks 
   - If value contains leading or trailing whitespace character.
   - if the value contains `,`
   - if the value length is greater than 512.
   
   
   Changes in this PR are intended to ensure we will set the value of the segment's min/max property. It's part of the fix for the [issue](10793) and as per the discussion in the other [PR](https://github.com/apache/pinot/pull/10891)
   
   cc: @Jackie-Jiang 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10990:
URL: https://github.com/apache/pinot/pull/10990#issuecomment-1624799515

   Thanks for taking time looking into this library. As long as we can read the original value back, we are good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1271443007


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,44 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value), (METADATA_PROPERTY_LENGTH_LIMIT / 2));
+        if (isMax) {
+          shortByteValue[METADATA_PROPERTY_LENGTH_LIMIT / 2 - 1] += 1;

Review Comment:
   Yes, this can still overflow. However, becoming the last byte with the value `-127` isn't resulting in the error. Added the UT with such a scenario. Please let me know if there is an alternative approach. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr closed pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr closed pull request #10990: Updated code for setting value of segment min/max property.
URL: https://github.com/apache/pinot/pull/10990


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1269931272


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   (minor)
   ```suggestion
       if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +599,38 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
+    String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
+    int length = valueWithinLengthLimit.length();
+
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
+        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(valueWithinLengthLimit.charAt(valueWithinLengthLimit.length() - 1))) {
+        String unicodeValue = "\\u"
+            + String.format("%04x", (int) valueWithinLengthLimit.charAt(valueWithinLengthLimit.length() - 1));
+        valueWithinLengthLimit =
+            valueWithinLengthLimit.substring(0, valueWithinLengthLimit.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && valueWithinLengthLimit.contains(",")) {
+      valueWithinLengthLimit = valueWithinLengthLimit.replace(",", "\\,");
+    }

Review Comment:
   (minor) Move this into the previous if branch (`length > 0` is duplicated)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          // property type is min value take first METADATA_PROPERTY_LENGTH_LIMIT - 1 characters
+          // and decrement the last character value by 1.
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) - 1);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value),
+            (METADATA_PROPERTY_LENGTH_LIMIT / 2) + 1);
+        char ch;

Review Comment:
   Why converting it to `char`?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          // property type is min value take first METADATA_PROPERTY_LENGTH_LIMIT - 1 characters
+          // and decrement the last character value by 1.
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) - 1);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value),
+            (METADATA_PROPERTY_LENGTH_LIMIT / 2) + 1);

Review Comment:
   Why `+1`?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          // property type is min value take first METADATA_PROPERTY_LENGTH_LIMIT - 1 characters
+          // and decrement the last character value by 1.
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) - 1);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value),
+            (METADATA_PROPERTY_LENGTH_LIMIT / 2) + 1);
+        char ch;
+        if (isMax) {
+          ch = (char) ((char) shortByteValue[METADATA_PROPERTY_LENGTH_LIMIT / 2] + 1);
+        } else {
+          ch = (char) ((char) shortByteValue[METADATA_PROPERTY_LENGTH_LIMIT / 2] - 1);

Review Comment:
   For min, we can just remove the trailing bytes



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);

Review Comment:
   Is it possible that this `+1` can result in an invalid char?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          // property type is min value take first METADATA_PROPERTY_LENGTH_LIMIT - 1 characters
+          // and decrement the last character value by 1.
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) - 1);

Review Comment:
   Here we should be able to just do `value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT)`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          // property type is min value take first METADATA_PROPERTY_LENGTH_LIMIT - 1 characters
+          // and decrement the last character value by 1.
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) - 1);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value),
+            (METADATA_PROPERTY_LENGTH_LIMIT / 2) + 1);
+        char ch;
+        if (isMax) {
+          ch = (char) ((char) shortByteValue[METADATA_PROPERTY_LENGTH_LIMIT / 2] + 1);

Review Comment:
   This can overflow



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1245955420


##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreatorTest.java:
##########
@@ -174,21 +202,52 @@ private static long getStartTimeInSegmentMetadata(String testDateTimeFormat, Str
   }
 
   @Test
-  public void testAddMinMaxValueInvalid() {
+  public void testAddMinMaxValue() {
     PropertiesConfiguration props = new PropertiesConfiguration();
     SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(props, "colA", "bar", "foo");
     Assert.assertFalse(Boolean.parseBoolean(
-        String.valueOf(props.getProperty(Column.getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
+        String.valueOf(props.getProperty(getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
 
     props = new PropertiesConfiguration();
     SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(props, "colA", ",bar", "foo");
-    Assert.assertTrue(Boolean.parseBoolean(
-        String.valueOf(props.getProperty(Column.getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
+    Assert.assertFalse(Boolean.parseBoolean(
+        String.valueOf(props.getProperty(getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
+    Assert.assertEquals(String.valueOf(props.getProperty(getKeyFor("colA", MIN_VALUE))), "bar");
 
     props = new PropertiesConfiguration();
     SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(props, "colA", "bar", "  ");

Review Comment:
   Based on the existing logic, I think it is a valid value



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on PR #10990:
URL: https://github.com/apache/pinot/pull/10990#issuecomment-1622191369

   > This is incorrect because after modifying the value it is no longer the min/max value and can cause wrong result. E.g. ` zzz` can be the min value because of the leading whitespace, but after removing the whitespace it will be changed to `zzz` which is a very large value. We need to handle min/max value separately, and ensure the modified value is smaller than min/larger than max.
   
   Question: why do the min/max values with leading or trailing ` ` or with `,` are invalid? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1261564543


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   We cannot simply trim the extra characters because it can result in a wrong min/max value. For min value, the modified value must be smaller than the actual value; for max value, the modified value must be larger than the actual value.
   One approach to solve the max value problem is to add 1 to the last character or append one character after the last character. For BYTES type, we might also need some special handling because the value needs to be a valid hex encoded string
   
   Also, we should first modify the value then handle special characters, or the special characters might be cut in half and cause it unreadable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1271267712


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);

Review Comment:
   This is java `char`, which is 16 bytes and contains a lot of special characters. I don't know if all of the chars are printable and can be stored in the `Configuration`. We need to avoid creating unprintable char here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1273020617


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,57 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          int trimIndexValue = METADATA_PROPERTY_LENGTH_LIMIT - 1;
+          // determining the index for the character having value less than '\uFFFF'
+          while (trimIndexValue < value.length() && value.charAt(trimIndexValue) == '\uFFFF') {
+            trimIndexValue++;
+          }
+          alteredValue = value.substring(0, trimIndexValue) + '\uFFFF'; // assigning the '\uFFFF' to make the value max.
+        } else {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT);
+        }
+        break;
+      case BYTES:
+        if (isMax) {
+          byte[] valueInByteArray = BytesUtils.toBytes(value);
+          int trimIndexValue = METADATA_PROPERTY_LENGTH_LIMIT / 2 - 1;
+          // determining the index for the byte having value less than 0xFF
+          while (trimIndexValue < value.length() && (valueInByteArray[trimIndexValue] & 0xff) == 0xFF) {

Review Comment:
   Cannot use `value.length()`
   ```suggestion
             while (trimIndexValue < valueInByteArray.length && valueInByteArray[trimIndexValue]  == (byte) 0xFF) {
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,57 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          int trimIndexValue = METADATA_PROPERTY_LENGTH_LIMIT - 1;
+          // determining the index for the character having value less than '\uFFFF'
+          while (trimIndexValue < value.length() && value.charAt(trimIndexValue) == '\uFFFF') {
+            trimIndexValue++;
+          }
+          alteredValue = value.substring(0, trimIndexValue) + '\uFFFF'; // assigning the '\uFFFF' to make the value max.
+        } else {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT);
+        }
+        break;
+      case BYTES:
+        if (isMax) {
+          byte[] valueInByteArray = BytesUtils.toBytes(value);
+          int trimIndexValue = METADATA_PROPERTY_LENGTH_LIMIT / 2 - 1;
+          // determining the index for the byte having value less than 0xFF
+          while (trimIndexValue < value.length() && (valueInByteArray[trimIndexValue] & 0xff) == 0xFF) {
+            trimIndexValue++;
+          }
+          byte[] shortByteValue = Arrays.copyOf(valueInByteArray, trimIndexValue + 1);
+          shortByteValue[trimIndexValue] = (byte) 0xFF; // assigning the 0xFF to make the value max.
+          alteredValue = BytesUtils.toHexString(shortByteValue);

Review Comment:
   ```suggestion
             if (trimIndexValue == valueInByteArray.length) {
               alteredValue = value;
             } else {
               byte[] shortByteValue = Arrays.copyOf(valueInByteArray, trimIndexValue + 1);
               shortByteValue[trimIndexValue] = (byte) 0xFF; // assigning the 0xFF to make the value max.
               alteredValue = BytesUtils.toHexString(shortByteValue);
             }
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,57 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          int trimIndexValue = METADATA_PROPERTY_LENGTH_LIMIT - 1;
+          // determining the index for the character having value less than '\uFFFF'
+          while (trimIndexValue < value.length() && value.charAt(trimIndexValue) == '\uFFFF') {
+            trimIndexValue++;
+          }
+          alteredValue = value.substring(0, trimIndexValue) + '\uFFFF'; // assigning the '\uFFFF' to make the value max.

Review Comment:
   (minor)
   ```suggestion
             if (trimIndexValue == length) {
               alteredValue = value;
             } else {
               alteredValue = value.substring(0, trimIndexValue) + '\uFFFF'; // assigning the '\uFFFF' to make the value max.
             }
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,57 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          int trimIndexValue = METADATA_PROPERTY_LENGTH_LIMIT - 1;
+          // determining the index for the character having value less than '\uFFFF'
+          while (trimIndexValue < value.length() && value.charAt(trimIndexValue) == '\uFFFF') {

Review Comment:
   (minor)
   ```suggestion
             while (trimIndexValue < length && value.charAt(trimIndexValue) == '\uFFFF') {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1263236562


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   Updated as per the comment. I will add the length-based UTs if this looks good to you. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1263248948


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   nvm, I figured out one issue with the approach. Let me add all the tests first to cover the scenarios. Thanks



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   Updated as per the comment. I will add the length-based UTs if this looks good to you. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1270130558


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);

Review Comment:
   I think it's not possible; the last char, as per Ascii, is `DEL`(127), and I think that wouldn't be part of any string value. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on PR #10990:
URL: https://github.com/apache/pinot/pull/10990#issuecomment-1624572629

   > > Question: why do the min/max values with leading or trailing ` ` or with `,` are invalid?
   > 
   > Good question! IIRC, the limitation is introduced by the `PropertiesConfiguration` (from Apache Common library). Maybe we can work-around this limitation
   
   Thanks for the pointer. I was digging around the `PropertiesConfiguration`, and here are my understandings.
   - By default, PropertiesConfiguration has the ` = `(leading and trailing whitespace) as a key,value separator. As per the [doc](https://commons.apache.org/proper/commons-configuration/userguide_v1.10/howto_properties.html), we can either use the `setSeparator()` or `setGlobalSeparator()`
   - values having comma`,` is considered an array while reading, but we can escape the comma by adding `\\` in front of the comma.
   
   Lastly, here is the sample function for understanding the issue
   ```
   public static void main(String[] args)
         throws ConfigurationException {
       PropertiesConfiguration properties =
           new PropertiesConfiguration(new File("test"));
       String testValueWithWhitespace = "  aaa   ";
       properties.setProperty("a", testValueWithWhitespace);
   
       String testValueWithComma = "a\\,a\\,a";
       properties.setProperty("a1", testValueWithComma);
   
       properties.save();
   
       String readValue = (String)properties.getProperty("a");
       String readValue1 = (String)properties.getProperty("a1");
       System.out.printf("#%s#%n", readValue);
       System.out.printf("#%s#%n", readValue1);
       Assert.equals(testValueWithWhitespace, readValue);
     }
   ```
   The property file content for the above is as follows.
   ```
   a =           aaa   
   a1 = a\,a\,a
   
   ```
   
   I don't see the issue using the value with whitespace based on the above example, and for comma we can add the escape character. Please share your opinion. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1265649633


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   Please review. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1263248948


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,6 +597,40 @@ static boolean isValidPropertyValue(String value) {
     return value.indexOf(',') == -1;
   }
 
+  /**
+   * Helper method to get the valid value for setting min/max.
+   */
+  @VisibleForTesting
+  static String getValidPropertyValue(String value) {
+    int length = value.length();
+    if (length > 0) {
+      // check and replace first character if it's a white space
+      if (Character.isWhitespace(value.charAt(0))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(0));
+        value = unicodeValue + value.substring(1);
+      }
+
+      // check and replace last character if it's a white space
+      if (Character.isWhitespace(value.charAt(value.length() - 1))) {
+        String unicodeValue = "\\u" + String.format("%04x", (int) value.charAt(value.length() - 1));
+        value = value.substring(0, value.length() - 1) + unicodeValue;
+      }
+    }
+
+    // removing the ',' from the value if it's present.
+    if (length > 0 && value.contains(",")) {
+      value = value.replace(",", "\\,");
+      length = value.length(); // updating the length value after escaping ','
+    }
+
+    // taking first m characters of the value if length is greater than METADATA_PROPERTY_LENGTH_LIMIT
+    if (length > METADATA_PROPERTY_LENGTH_LIMIT) {

Review Comment:
   nvm, I figured out one issue with the approach. Let me add all the tests first to cover the scenarios. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1271442751


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);

Review Comment:
   Instead of incrementing char, I have updated the code to add `z` as a last character. Please let me know if it works or suggest an alternative please.  Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10990: Updated code for setting value of segment min/max property.

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10990:
URL: https://github.com/apache/pinot/pull/10990#issuecomment-1610632503

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10990?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#10990](https://app.codecov.io/gh/apache/pinot/pull/10990?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (69dc22d) into [master](https://app.codecov.io/gh/apache/pinot/commit/238629c41c551c6f42e38f817749aa445beea343?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (238629c) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff            @@
   ##           master   #10990     +/-   ##
   =========================================
     Coverage    0.11%    0.11%             
   =========================================
     Files        2192     2138     -54     
     Lines      118087   115581   -2506     
     Branches    17885    17583    -302     
   =========================================
     Hits          137      137             
   + Misses     117930   115424   -2506     
     Partials       20       20             
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `0.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10990?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://app.codecov.io/gh/apache/pinot/pull/10990?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   
   ... and [61 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/10990/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1271573773


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,44 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value), (METADATA_PROPERTY_LENGTH_LIMIT / 2));
+        if (isMax) {
+          shortByteValue[METADATA_PROPERTY_LENGTH_LIMIT / 2 - 1] += 1;

Review Comment:
   It won't cause an error, but will cause wrong behavior because it is no longer larger than the original value.
   The byte comparison is performed on the unsigned value, so we need to prevent doing `+1` to `0xFF` which will result in `0`. We can skip all `0xFF` until we find a value that is smaller, then replace it with `0xFF`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1271577239


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,52 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length < METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);

Review Comment:
   `z` is not good enough because it is actually a very small value (122) comparing to the 16-bit unicode space. I tried appending '\uFFFF' and seems `PropertiesConfiguration` can store and retrieve it properly. We can just loop over the characters, and replace the first one that is not '\uFFFF' to '\uFFFF'



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10990:
URL: https://github.com/apache/pinot/pull/10990#issuecomment-1623006680

   > Question: why do the min/max values with leading or trailing ` ` or with `,` are invalid?
   
   Good question! IIRC, the limitation is introduced by the `PropertiesConfiguration` (from Apache Common library). Maybe we can work-around this limitation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr closed pull request #10990: Updated code for setting value of segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr closed pull request #10990: Updated code for setting value of segment min/max property.
URL: https://github.com/apache/pinot/pull/10990


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #10990: Updated code to getting valid value for segment min/max property.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1244585380


##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreatorTest.java:
##########
@@ -174,21 +202,52 @@ private static long getStartTimeInSegmentMetadata(String testDateTimeFormat, Str
   }
 
   @Test
-  public void testAddMinMaxValueInvalid() {
+  public void testAddMinMaxValue() {
     PropertiesConfiguration props = new PropertiesConfiguration();
     SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(props, "colA", "bar", "foo");
     Assert.assertFalse(Boolean.parseBoolean(
-        String.valueOf(props.getProperty(Column.getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
+        String.valueOf(props.getProperty(getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
 
     props = new PropertiesConfiguration();
     SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(props, "colA", ",bar", "foo");
-    Assert.assertTrue(Boolean.parseBoolean(
-        String.valueOf(props.getProperty(Column.getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
+    Assert.assertFalse(Boolean.parseBoolean(
+        String.valueOf(props.getProperty(getKeyFor("colA", Column.MIN_MAX_VALUE_INVALID)))));
+    Assert.assertEquals(String.valueOf(props.getProperty(getKeyFor("colA", MIN_VALUE))), "bar");
 
     props = new PropertiesConfiguration();
     SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(props, "colA", "bar", "  ");

Review Comment:
   @Jackie-Jiang Do we consider an empty string as a valid value? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #10990:
URL: https://github.com/apache/pinot/pull/10990


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1277977517


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/metadata/ColumnMetadataImpl.java:
##########
@@ -251,8 +251,10 @@ public static ColumnMetadataImpl fromPropertiesConfiguration(String column, Prop
     // NOTE: Use getProperty() instead of getString() to avoid variable substitution ('${anotherKey}'), which can cause
     //       problem for special values such as '$${' where the first '$' is identified as escape character.
     // TODO: Use getProperty() for other properties as well to avoid the overhead of variable substitution
-    String minString = (String) config.getProperty(Column.getKeyFor(column, Column.MIN_VALUE));
-    String maxString = (String) config.getProperty(Column.getKeyFor(column, Column.MAX_VALUE));
+    String minString = StringEscapeUtils.unescapeJava((String)

Review Comment:
   This can cause problem when the original value has `\` because we didn't call `escapeJava()` when writing the value. Find this issue in this test [run](https://github.com/apache/pinot/actions/runs/5691584661/job/15427035722?pr=11206). I think we should just use `escapeJava()` to solve the whitespace and unicode issue



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1271267843


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,44 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value), (METADATA_PROPERTY_LENGTH_LIMIT / 2));
+        if (isMax) {
+          shortByteValue[METADATA_PROPERTY_LENGTH_LIMIT / 2 - 1] += 1;

Review Comment:
   This still can overflow



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10990: Updated code for setting value of segment min/max property.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10990:
URL: https://github.com/apache/pinot/pull/10990#discussion_r1271573773


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -619,4 +645,44 @@ public void close()
     creators.addAll(_dictionaryCreatorMap.values());
     FileUtils.close(creators);
   }
+
+  /**
+   * Returns the original string if its length is within the allowed limit.
+   * If the string's length exceeds the limit,
+   * it returns a truncated version of the string with maintaining min or max value.
+   *
+   */
+  @VisibleForTesting
+  static String getValueWithinLengthLimit(String value, boolean isMax, DataType dataType) {
+    int length = value.length();
+
+    // if length is less, no need of trimming the value.
+    if (length <= METADATA_PROPERTY_LENGTH_LIMIT) {
+      return value;
+    }
+
+    String alteredValue;
+    // For Numeric Data Type(INT, LONG, DOUBLE, FLOAT) value longer than METADATA_PROPERTY_LENGTH_LIMIT is not possible.
+    switch (dataType) {
+      case STRING:
+      case JSON:
+        if (isMax) {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT - 1)
+              + (char) (value.charAt(METADATA_PROPERTY_LENGTH_LIMIT - 1) + 1);
+        } else {
+          alteredValue = value.substring(0, METADATA_PROPERTY_LENGTH_LIMIT);
+        }
+        break;
+      case BYTES:
+        byte[] shortByteValue = Arrays.copyOf(BytesUtils.toBytes(value), (METADATA_PROPERTY_LENGTH_LIMIT / 2));
+        if (isMax) {
+          shortByteValue[METADATA_PROPERTY_LENGTH_LIMIT / 2 - 1] += 1;

Review Comment:
   It won't cause an error, but will cause wrong behavior because it is no longer larger than the original value.
   The byte comparison is performed on the unsigned value, so we need to prevent doing `+1` to `0xFF` which will result in `0`. We can skip all `0xFF` until we find a value that is smaller



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org