You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "abhioncbr (via GitHub)" <gi...@apache.org> on 2023/07/29 02:30:44 UTC

[GitHub] [pinot] abhioncbr opened a new pull request, #11218: Fix for escape/unescape issue for segment min/max values.

abhioncbr opened a new pull request, #11218:
URL: https://github.com/apache/pinot/pull/11218

   - This is a follow-up PR for fixing the [issue](https://github.com/apache/pinot/pull/10990/files#r1277977517) related to the introduction of `unescapeJava` while reading the data from the property file. 
   - The fix is based on escaping the value while writing to handle the cases if some characters are present in the original value, adding the flag in the property file to signal that the values are escaped. 
   - Default value of the flag is false.
   - Added unit test case based on random ASCII value to test such possible issues.
   
   cc @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278588943


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   I did further analysis, are here are my findings
   - we only need the unescapeJava in `ColumnMetadataImpl` for converting back the Unicode whitespace character into the original value we are adding [here](https://github.com/apache/pinot/pull/11218/files#diff-f77f250efb4d56b596a7991f65ff00139fb73463111619d64741bec4306efa1aR626) and [here](https://github.com/apache/pinot/pull/11218/files#diff-f77f250efb4d56b596a7991f65ff00139fb73463111619d64741bec4306efa1aR633). For comma it's already handled in properties-configuration enhanced [unescapeJava](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L1376) method.
   
   IMO, let's avoid using escapeJava and unescapeJava. We don't need to introduce it just for handling the leading or trailing whitespace. For handling leading or trailing whitespace, here are a couple of options. 
   
   1. Instead of replacing the whitespace character value with its Unicode value, we can add an extra `\` at the start or end. We will remove this additional `/` based on the flag property( say `isWhitespacePreserved`) value(since we are already good with escape flag property)
   2. The correct way of handling this is using [custom properties readers and writers](https://commons.apache.org/proper/commons-configuration/userguide/howto_properties.html). However, I think this requires properties-configuration version `2.x`. We have already identified the [work](https://github.com/apache/pinot/issues/11085), so we can wait for the upgrade to be done and drop the handling of whitespace for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278227361


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Will escape handle whitespace? If so, we don't need to explicitly handle whitespace



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -571,8 +572,17 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue, DataType dataType) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), getValidPropertyValue(minValue, false, dataType));
-    properties.setProperty(getKeyFor(column, MAX_VALUE), getValidPropertyValue(maxValue, true, dataType));
+    String validMinValue = getValidPropertyValue(minValue, false, dataType);
+    properties.setProperty(getKeyFor(column, MIN_VALUE), validMinValue);
+
+    String validMaxValue = getValidPropertyValue(maxValue, true, dataType);
+    properties.setProperty(getKeyFor(column, MAX_VALUE), validMaxValue);
+
+    // setting the property for signifying that the mon or max value is escaped.
+    if (validMinValue.compareTo(minValue) != 0

Review Comment:
   We want to compare the result before and after the escape. With current code, we will count it as escaped even when it is just modified but not escaped.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -571,8 +572,17 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue, DataType dataType) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), getValidPropertyValue(minValue, false, dataType));
-    properties.setProperty(getKeyFor(column, MAX_VALUE), getValidPropertyValue(maxValue, true, dataType));
+    String validMinValue = getValidPropertyValue(minValue, false, dataType);
+    properties.setProperty(getKeyFor(column, MIN_VALUE), validMinValue);
+
+    String validMaxValue = getValidPropertyValue(maxValue, true, dataType);
+    properties.setProperty(getKeyFor(column, MAX_VALUE), validMaxValue);
+
+    // setting the property for signifying that the mon or max value is escaped.
+    if (validMinValue.compareTo(minValue) != 0
+        || validMaxValue.compareTo(maxValue) != 0) {
+      properties.setProperty(getKeyFor(column, IS_MIN_MAX_VALUE_ESCAPED), true);

Review Comment:
   After a second thought, keep `iS_MIN_VALUE_ESCAPED` and `IS_MAX_VALUE_ESCAPED` separately might be better



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1279136440


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Thanks, closing the pr.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278801877


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Thanks for taking time to do the research. Great finding that the value is already escaped in the property file.
   I tried explicitly escape the `","` by replacing it with `"\\,"`, but it doesn't work if it is preceded by a backslash (I tried replacing `"\\,"` with `"\\\\\\,"` which works in memory, but once the file is saved and read again, it breaks).
   To solve both special cases (leading/trailing space and comma), one workaround we can take is to use a preserved character to replace them. Here we can use `"\0"` because it is preserved as the padding character for string values, and string values cannot include it (See `StringUtil.sanitizeStringValue()` where we explicitly remove it).
   Implemented the idea in #11223 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278375607


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Can you give an example of the whitespace that is not skipped? Seems everything < 32 or > 0x7F are skipped



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278214257


##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreatorTest.java:
##########
@@ -250,6 +253,16 @@ public void testAddMinMaxValue() {
         stringMaxValue, FieldSpec.DataType.STRING);
     compareLongValuesWithColumnMinMax(stringMinValue, stringMaxValue, props, FieldSpec.DataType.STRING);
 
+    // test for value length grater than METADATA_PROPERTY_LENGTH_LIMIT with random string having ascii characters.

Review Comment:
   Added ASCII-based random string to test the scenario if the original values have characters which need escape.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278367704


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Escape only handles a few of whitespace characters not all. We also need to escape leading and trailing ` `, which is not possible through escape function. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11218:
URL: https://github.com/apache/pinot/pull/11218#issuecomment-1656532904

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11218?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11218](https://app.codecov.io/gh/apache/pinot/pull/11218?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (b7b1bb3) into [master](https://app.codecov.io/gh/apache/pinot/commit/b69f438a79716179c3a7c8161a67d675bdd9909e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (b69f438) will **increase** coverage by `0.00%`.
   > Report is 4 commits behind head on master.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff            @@
   ##           master   #11218     +/-   ##
   =========================================
     Coverage    0.11%    0.11%             
   =========================================
     Files        2226     2172     -54     
     Lines      119357   117077   -2280     
     Branches    18059    17792    -267     
   =========================================
     Hits          137      137             
   + Misses     119200   116920   -2280     
     Partials       20       20             
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin20 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files Changed](https://app.codecov.io/gh/apache/pinot/pull/11218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...ment/creator/impl/SegmentColumnarIndexCreator.java](https://app.codecov.io/gh/apache/pinot/pull/11218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2NyZWF0b3IvaW1wbC9TZWdtZW50Q29sdW1uYXJJbmRleENyZWF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...java/org/apache/pinot/segment/spi/V1Constants.java](https://app.codecov.io/gh/apache/pinot/pull/11218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL1YxQ29uc3RhbnRzLmphdmE=) | `0.00% <ø> (ø)` | |
   | [...segment/spi/index/metadata/ColumnMetadataImpl.java](https://app.codecov.io/gh/apache/pinot/pull/11218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L21ldGFkYXRhL0NvbHVtbk1ldGFkYXRhSW1wbC5qYXZh) | `0.00% <0.00%> (ø)` | |
   
   ... and [73 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11218/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278367879


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -571,8 +572,17 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue, DataType dataType) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), getValidPropertyValue(minValue, false, dataType));
-    properties.setProperty(getKeyFor(column, MAX_VALUE), getValidPropertyValue(maxValue, true, dataType));
+    String validMinValue = getValidPropertyValue(minValue, false, dataType);
+    properties.setProperty(getKeyFor(column, MIN_VALUE), validMinValue);
+
+    String validMaxValue = getValidPropertyValue(maxValue, true, dataType);
+    properties.setProperty(getKeyFor(column, MAX_VALUE), validMaxValue);
+
+    // setting the property for signifying that the mon or max value is escaped.
+    if (validMinValue.compareTo(minValue) != 0

Review Comment:
   We need to use the unescape if we add `\\u` based values to handle whitespace and hence need to account for modified value as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278470408


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Here is my understanding of the problem.
   
   - If the original value has leading and trailing whitespace characters, we must escape them to get the same value from the saved property file. This issue happens because of the String method `trim` usage. Here are the properties-configuration codepoints, reading the entire [line](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L748) and value [parse](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L927)
   
   - Here is the valid example code for which it fails with the use of escaping as well, 
   ```java
   public static void main(String[] args)
         throws ConfigurationException {
       writeProperty();
       readProperty();
     }
   
     public static void writeProperty()
         throws ConfigurationException {
       // case 1: fails for leading space, we need to preserve the leading space.
       PropertiesConfiguration configuration = new PropertiesConfiguration(new File("test"));
   
       String valueWithLeadingSpace = "  valueWithLeadingSpace";
       String escapedValueWithLeadingSpace = StringEscapeUtils.escapeJava(valueWithLeadingSpace);
       System.out.println(escapedValueWithLeadingSpace); // output: '  valueWithLeadingSpace'
       configuration.setProperty("valueWithLeadingSpace", escapedValueWithLeadingSpace);
   
   
       // case 2: fails for comma, we need to escape commas
       String valueWithComma = "value,with,comma";
       String escapedValueWithComma =  StringEscapeUtils.escapeJava(valueWithComma);
       System.out.println(escapedValueWithComma); // output: 'value,with,comma'
       configuration.setProperty("valueWithComma", escapedValueWithComma);
   
   
       configuration.save();
     }
   
     public static void readProperty()
         throws ConfigurationException {
       PropertiesConfiguration configuration = new PropertiesConfiguration(new File("test"));
   
       String valueFromPropertyFile = (String) configuration.getProperty("valueWithLeadingSpace");
       System.out.println(valueFromPropertyFile);  // output: 'valueWithLeadingSpace'
       String unescapedValueWithLeadingSpace = StringEscapeUtils.unescapeJava(valueFromPropertyFile);
       System.out.println(unescapedValueWithLeadingSpace); // output: 'valueWithLeadingSpace'
   
       String valueWithCommaFromPropertyFile = (String) configuration.getProperty("valueWithComma"); // throws exception here considering it as array
       System.out.println(valueWithCommaFromPropertyFile);
       String unescapedValueWithCommaFromPropertyFile = StringEscapeUtils.unescapeJava(valueWithCommaFromPropertyFile);
       System.out.println(unescapedValueWithCommaFromPropertyFile);
     }
   ```
   For handling the above cases, we are adding the Unicode-based string value of the whitespace character and unescaping it while reading it back.
   
   please let me know, if I am missing something or my interpretation is incorrect. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278504476


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Just found out that properties-configuration is internally also using [escape](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L1210) and [unescape](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L1376)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr closed pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr closed pull request #11218: Fix for escape/unescape issue for segment min/max values.
URL: https://github.com/apache/pinot/pull/11218


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278375741


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -571,8 +572,19 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue, DataType dataType) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), getValidPropertyValue(minValue, false, dataType));
-    properties.setProperty(getKeyFor(column, MAX_VALUE), getValidPropertyValue(maxValue, true, dataType));
+    String validMinValue = getValidPropertyValue(minValue, false, dataType);
+    properties.setProperty(getKeyFor(column, MIN_VALUE), validMinValue);
+    // setting the property for signifying that the min value is escaped.
+    if (validMinValue.compareTo(minValue) != 0) {

Review Comment:
   We want to compare value within the `getValidPropertyValue` because even the value is changed, it might not be escaped (e.g. only truncated). Also we can use `equals()` instead of `compareTo()` which is cheaper



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] abhioncbr commented on a diff in pull request #11218: Fix for escape/unescape issue for segment min/max values.

Posted by "abhioncbr (via GitHub)" <gi...@apache.org>.
abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278214186


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);

Review Comment:
   Escaping string here because we are adding the Unicode value of whitespace on line 624 and 630 and escape later leading it to failure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org