You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "Aravind-Suresh (via GitHub)" <gi...@apache.org> on 2023/06/12 16:58:50 UTC

[GitHub] [pinot] Aravind-Suresh opened a new pull request, #10897: Fixes SQL wildcard escaping in LIKE queries

Aravind-Suresh opened a new pull request, #10897:
URL: https://github.com/apache/pinot/pull/10897

   Potential fix for #10849.
   
   **Approach**: The escaping of SQL wildcards (`\_`, `\%`) is handled as a special case. The escaping of `\` is handled separately from the escaping of other meta characters.
   
   **Test plan**: Existing unit tests pass. Added unit tests to validate the specific case that this fix solves.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1235876975


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,27 +70,54 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents

Review Comment:
   Can you be more specific on what characters are wildchards in SQL? Please give a complete list.



##########
pinot-common/src/test/java/org/apache/pinot/common/utils/RegexpPatternConverterUtilsTest.java:
##########
@@ -125,4 +125,27 @@ public void testTrailingSize2() {
     String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("z%");
     assertEquals(regexpLikePattern, "^z");
   }
+
+  @Test
+  public void testEscapedWildcard1() {
+    String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("a\\_b_\\");
+    assertEquals(regexpLikePattern, "^a\\_b.\\\\$");
+    String luceneRegExpPattern = RegexpPatternConverterUtils.regexpLikeToLuceneRegExp(regexpLikePattern);
+    assertEquals(luceneRegExpPattern, "a\\_b.\\\\");

Review Comment:
   Ditto as above.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,27 +70,54 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;

Review Comment:
   Can we extract all codes below into a function about escape char handling? It is pretty long at the current form.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,27 +70,54 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (indexOf(REGEXP_METACHARACTERS, c) >= 0) {
+        sb.append(BACK_SLASH).append(c);
+      } else {
+        if (isPrevCharBackSlash) {
+          // this means the previous character is a \
+          // but it was not used for escaping SQL wildcards
+          // so let's escape this \ in the output
+          // this case is separately handled outside of the meta characters list
+          sb.append(BACK_SLASH);
+        }
+        sb.append(c);
       }
-      i++;
+      isPrevCharBackSlash = (c == BACK_SLASH);
+      ++i;
+    }
+
+    // handle trailing \
+    if (isPrevCharBackSlash) {
+      sb.append(BACK_SLASH);
     }
 
+    sb.append(suffix);
     return sb.toString();
   }
 
+  private static int indexOf(char[] arr, char c) {

Review Comment:
   why re-imple indexOf rather than using existing lib such as
   
   https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/IterableUtils.html#find-java.lang.Iterable-org.apache.commons.collections4.Predicate-



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236882803


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +72,56 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
-    sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
+    likePattern = likePattern.substring(start, end);
+    return escapeMetaCharsAndWildcards(likePattern, prefix, suffix);
+  }
 
+  /**
+   * Escapes the provided pattern by considering the following constraints:
+   * <ul>
+   *     <li> SQL wildcards escaping is handled (_, %) </li>
+   *     <li> Regex meta characters escaping is handled </li>
+   * </ul>
+   * @param input the provided input string
+   * @param prefix the prefix to be added to the output string
+   * @param suffix the suffix to be added to the output string
+   * @return the final output string
+   */
+  private static String escapeMetaCharsAndWildcards(String input, String prefix, String suffix) {
+    StringBuilder sb = new StringBuilder();
+    sb.append(prefix);
+    // handling SQL wildcards (_, %) by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < input.length()) {
+      char c = input.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (Chars.indexOf(REGEXP_METACHARACTERS, c) >= 0) {

Review Comment:
   Did a minor refactor to fix this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228971813


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");

Review Comment:
   Ack, will fix this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228964194


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (REGEXP_METACHARACTERS.contains(String.valueOf(c))) {
+        sb.append('\\').append(c);
+      } else {
+        if (isPrevCharBackSlash) {
+          // this means the previous character is a \
+          // but it was not used for escaping SQL wildcards
+          // so let's escape this \ in the output
+          // this case is separately handled outside of the meta characters list
+          sb.append('\\');

Review Comment:
   We can't merge this in line 89 as this requires look-ahead at the next character - we could do that as well, but instead of look-ahead, we look-back here. So if c = '\', we don't know if it should be escaped or not. We should escape it if the next character is not a SQL wildcard. So, this cannot be merged with line 89 where we escape every meta character (this is why '\' was removed from the meta characters list).
   
   Let me know if I'm missing something here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236289402


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,27 +70,54 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents

Review Comment:
   Ack, fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228964194


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (REGEXP_METACHARACTERS.contains(String.valueOf(c))) {
+        sb.append('\\').append(c);
+      } else {
+        if (isPrevCharBackSlash) {
+          // this means the previous character is a \
+          // but it was not used for escaping SQL wildcards
+          // so let's escape this \ in the output
+          // this case is separately handled outside of the meta characters list
+          sb.append('\\');

Review Comment:
   We can't merge this in line 89 as this requires look-ahead at the next character - we could do that as well, but instead of look-ahead, we look-back here. So if c = '\\', we don't know if it should be escaped or not. We should escape it if the next character is not a SQL wildcard. So, this cannot be merged with line 89 where we escape every meta character (this is why '\\' was removed from the meta characters list).
   
   Let me know if I'm missing something here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236800873


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +72,56 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
-    sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
+    likePattern = likePattern.substring(start, end);
+    return escapeMetaCharsAndWildcards(likePattern, prefix, suffix);
+  }
 
+  /**
+   * Escapes the provided pattern by considering the following constraints:
+   * <ul>
+   *     <li> SQL wildcards escaping is handled (_, %) </li>
+   *     <li> Regex meta characters escaping is handled </li>
+   * </ul>
+   * @param input the provided input string
+   * @param prefix the prefix to be added to the output string
+   * @param suffix the suffix to be added to the output string
+   * @return the final output string
+   */
+  private static String escapeMetaCharsAndWildcards(String input, String prefix, String suffix) {
+    StringBuilder sb = new StringBuilder();
+    sb.append(prefix);
+    // handling SQL wildcards (_, %) by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < input.length()) {

Review Comment:
   Fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10897:
URL: https://github.com/apache/pinot/pull/10897#issuecomment-1587890129

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10897?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#10897](https://app.codecov.io/gh/apache/pinot/pull/10897?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (e47e637) into [master](https://app.codecov.io/gh/apache/pinot/commit/86b3b3e1c5bad5092f357358888ff28903ab05df?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (86b3b3e) will **not change** coverage.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #10897   +/-   ##
   =======================================
     Coverage    0.11%    0.11%           
   =======================================
     Files        2186     2186           
     Lines      117493   117500    +7     
     Branches    17762    17767    +5     
   =======================================
     Hits          137      137           
   - Misses     117336   117343    +7     
     Partials       20       20           
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `0.00% <0.00%> (ø)` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `0.00% <0.00%> (ø)` | |
   | unittests2temurin11 | `0.11% <0.00%> (ø)` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `0.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10897?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...inot/common/utils/RegexpPatternConverterUtils.java](https://app.codecov.io/gh/apache/pinot/pull/10897?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvUmVnZXhwUGF0dGVybkNvbnZlcnRlclV0aWxzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228963312


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");

Review Comment:
   So, escapeMetaCharacters was escaping every instance of a meta character. Now "\\" was also a meta character, but we shouldn't escape that if that was used to escape the following character - like: \\_, \\%. Basically there are two flavours of "\\" - one as a meta character, one as an escape character for the following character. We should escape the first and not the latter in the output. Since we can't replace every occurrence of "\\" with "\\\\" - so I incorporated that logic here where if it's not before any SQL wildcard, we end up escaping it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228971334


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -18,16 +18,23 @@
  */
 package org.apache.pinot.common.utils;
 
+import com.google.common.collect.ImmutableSet;
+import java.util.Set;
+
 /**
  * Utility for converting regex patterns.
  */
 public class RegexpPatternConverterUtils {
   private RegexpPatternConverterUtils() {
   }
 
-  /* Represents all metacharacters to be processed */
-  public static final String[] REGEXP_METACHARACTERS =
-      {"\\", "^", "$", ".", "{", "}", "[", "]", "(", ")", "*", "+", "?", "|", "<", ">", "-", "&", "/"};
+  /*
+   * Represents all metacharacters to be processed.
+   * This excludes the \ (back slash) character as that doubles up as an escape character as well.
+   * So it is handled separately in the conversion logic.
+   */
+  public static final Set<String> REGEXP_METACHARACTERS = ImmutableSet.of(

Review Comment:
   Ack, will fix this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1235861637


##########
pinot-common/src/test/java/org/apache/pinot/common/utils/RegexpPatternConverterUtilsTest.java:
##########
@@ -125,4 +125,27 @@ public void testTrailingSize2() {
     String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("z%");
     assertEquals(regexpLikePattern, "^z");
   }
+
+  @Test
+  public void testEscapedWildcard1() {
+    String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("a\\_b_\\");

Review Comment:
   Is it correct that "b_" is translated "b."? Even it is correct, it is not obvious. Can you add some comments why it is converted as such?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "atris (via GitHub)" <gi...@apache.org>.
atris commented on PR #10897:
URL: https://github.com/apache/pinot/pull/10897#issuecomment-1587735861

   @Aravind-Suresh Thanks for raising the PR. I will review this once the tests pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228963312


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");

Review Comment:
   So, escapeMetaCharacters was escaping every instance of a meta character. Now "\" was also a meta character, but we shouldn't escape that if that was used to escape the following character - like: \_, \%. Basically there are two flavours of "\" - one as a meta character, one as an escape character for the following character. We should escape the first and not the latter in the output. Since we can't replace every occurrence of "\" with "\\" - so I incorporated that logic here where if it's not before any SQL wildcard, we end up escaping it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236803233


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +72,56 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
-    sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
+    likePattern = likePattern.substring(start, end);
+    return escapeMetaCharsAndWildcards(likePattern, prefix, suffix);
+  }
 
+  /**
+   * Escapes the provided pattern by considering the following constraints:
+   * <ul>
+   *     <li> SQL wildcards escaping is handled (_, %) </li>
+   *     <li> Regex meta characters escaping is handled </li>
+   * </ul>
+   * @param input the provided input string
+   * @param prefix the prefix to be added to the output string
+   * @param suffix the suffix to be added to the output string
+   * @return the final output string
+   */
+  private static String escapeMetaCharsAndWildcards(String input, String prefix, String suffix) {
+    StringBuilder sb = new StringBuilder();
+    sb.append(prefix);
+    // handling SQL wildcards (_, %) by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < input.length()) {
+      char c = input.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (Chars.indexOf(REGEXP_METACHARACTERS, c) >= 0) {
+        sb.append(BACK_SLASH).append(c);
+      } else {
+        if (isPrevCharBackSlash) {
+          // this means the previous character is a \
+          // but it was not used for escaping SQL wildcards
+          // so let's escape this \ in the output
+          // this case is separately handled outside of the meta characters list
+          sb.append(BACK_SLASH);
+        }
+        sb.append(c);
       }
-      i++;
+      isPrevCharBackSlash = (c == BACK_SLASH);
+      ++i;

Review Comment:
   Fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris merged pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "atris (via GitHub)" <gi...@apache.org>.
atris merged PR #10897:
URL: https://github.com/apache/pinot/pull/10897


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228963312


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");

Review Comment:
   So, escapeMetaCharacters was escaping every instance of a meta character. Now "\\" was also a meta character, but we shouldn't escape that if that was used to escape the following character - like: \\_, \\%. Basically there are two flavours of "\" - one as a meta character, one as an escape character for the following character. We should escape the first and not the latter in the output. Since we can't replace every occurrence of "\\" with "\\\\" - so I incorporated that logic here where if it's not before any SQL wildcard, we end up escaping it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236801239


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");

Review Comment:
   Makes sense, I've addressed this in the latest diff. Please take a look.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236332602


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,27 +70,54 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;

Review Comment:
   Done, fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] chenboat commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1235861637


##########
pinot-common/src/test/java/org/apache/pinot/common/utils/RegexpPatternConverterUtilsTest.java:
##########
@@ -125,4 +125,27 @@ public void testTrailingSize2() {
     String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("z%");
     assertEquals(regexpLikePattern, "^z");
   }
+
+  @Test
+  public void testEscapedWildcard1() {
+    String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("a\\_b_\\");

Review Comment:
   Is it correct that "b_" is translated "b."? Even it is correctly, it is not obvious. Can you add some comments why it is converted as such?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236298168


##########
pinot-common/src/test/java/org/apache/pinot/common/utils/RegexpPatternConverterUtilsTest.java:
##########
@@ -125,4 +125,27 @@ public void testTrailingSize2() {
     String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("z%");
     assertEquals(regexpLikePattern, "^z");
   }
+
+  @Test
+  public void testEscapedWildcard1() {
+    String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("a\\_b_\\");
+    assertEquals(regexpLikePattern, "^a\\_b.\\\\$");
+    String luceneRegExpPattern = RegexpPatternConverterUtils.regexpLikeToLuceneRegExp(regexpLikePattern);
+    assertEquals(luceneRegExpPattern, "a\\_b.\\\\");

Review Comment:
   Fixed it.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,27 +70,54 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (indexOf(REGEXP_METACHARACTERS, c) >= 0) {
+        sb.append(BACK_SLASH).append(c);
+      } else {
+        if (isPrevCharBackSlash) {
+          // this means the previous character is a \
+          // but it was not used for escaping SQL wildcards
+          // so let's escape this \ in the output
+          // this case is separately handled outside of the meta characters list
+          sb.append(BACK_SLASH);
+        }
+        sb.append(c);
       }
-      i++;
+      isPrevCharBackSlash = (c == BACK_SLASH);
+      ++i;
+    }
+
+    // handle trailing \
+    if (isPrevCharBackSlash) {
+      sb.append(BACK_SLASH);
     }
 
+    sb.append(suffix);
     return sb.toString();
   }
 
+  private static int indexOf(char[] arr, char c) {

Review Comment:
   Tried searching for it before -- didn't find one in an already included package.
   
   But found Guava's Chars.indexOf which does this -- replaced it with that.



##########
pinot-common/src/test/java/org/apache/pinot/common/utils/RegexpPatternConverterUtilsTest.java:
##########
@@ -125,4 +125,27 @@ public void testTrailingSize2() {
     String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("z%");
     assertEquals(regexpLikePattern, "^z");
   }
+
+  @Test
+  public void testEscapedWildcard1() {
+    String regexpLikePattern = RegexpPatternConverterUtils.likeToRegexpLike("a\\_b_\\");

Review Comment:
   Sure, added in-line comments to explain this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "atris (via GitHub)" <gi...@apache.org>.
atris commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1236485819


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +72,56 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
-    sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
+    likePattern = likePattern.substring(start, end);
+    return escapeMetaCharsAndWildcards(likePattern, prefix, suffix);
+  }
 
+  /**
+   * Escapes the provided pattern by considering the following constraints:
+   * <ul>
+   *     <li> SQL wildcards escaping is handled (_, %) </li>
+   *     <li> Regex meta characters escaping is handled </li>
+   * </ul>
+   * @param input the provided input string
+   * @param prefix the prefix to be added to the output string
+   * @param suffix the suffix to be added to the output string
+   * @return the final output string
+   */
+  private static String escapeMetaCharsAndWildcards(String input, String prefix, String suffix) {
+    StringBuilder sb = new StringBuilder();
+    sb.append(prefix);
+    // handling SQL wildcards (_, %) by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < input.length()) {

Review Comment:
   Nit: Cache this length locally



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +72,56 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
-    sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
+    likePattern = likePattern.substring(start, end);
+    return escapeMetaCharsAndWildcards(likePattern, prefix, suffix);
+  }
 
+  /**
+   * Escapes the provided pattern by considering the following constraints:
+   * <ul>
+   *     <li> SQL wildcards escaping is handled (_, %) </li>
+   *     <li> Regex meta characters escaping is handled </li>
+   * </ul>
+   * @param input the provided input string
+   * @param prefix the prefix to be added to the output string
+   * @param suffix the suffix to be added to the output string
+   * @return the final output string
+   */
+  private static String escapeMetaCharsAndWildcards(String input, String prefix, String suffix) {
+    StringBuilder sb = new StringBuilder();
+    sb.append(prefix);
+    // handling SQL wildcards (_, %) by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < input.length()) {
+      char c = input.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (Chars.indexOf(REGEXP_METACHARACTERS, c) >= 0) {
+        sb.append(BACK_SLASH).append(c);
+      } else {
+        if (isPrevCharBackSlash) {
+          // this means the previous character is a \
+          // but it was not used for escaping SQL wildcards
+          // so let's escape this \ in the output
+          // this case is separately handled outside of the meta characters list
+          sb.append(BACK_SLASH);
+        }
+        sb.append(c);
       }
-      i++;
+      isPrevCharBackSlash = (c == BACK_SLASH);
+      ++i;

Review Comment:
   Nit: Why the pre increment?



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +72,56 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
-    sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
+    likePattern = likePattern.substring(start, end);
+    return escapeMetaCharsAndWildcards(likePattern, prefix, suffix);
+  }
 
+  /**
+   * Escapes the provided pattern by considering the following constraints:
+   * <ul>
+   *     <li> SQL wildcards escaping is handled (_, %) </li>
+   *     <li> Regex meta characters escaping is handled </li>
+   * </ul>
+   * @param input the provided input string
+   * @param prefix the prefix to be added to the output string
+   * @param suffix the suffix to be added to the output string
+   * @return the final output string
+   */
+  private static String escapeMetaCharsAndWildcards(String input, String prefix, String suffix) {
+    StringBuilder sb = new StringBuilder();
+    sb.append(prefix);
+    // handling SQL wildcards (_, %) by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < input.length()) {
+      char c = input.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (Chars.indexOf(REGEXP_METACHARACTERS, c) >= 0) {

Review Comment:
   Nit: I wonder if we can model this as a switch statement. The if branching hurts my eyes



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");

Review Comment:
   That should not impact doing the processing in a separate method. I would still advise moving to a separate method to ensure a higher degree of readability.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "atris (via GitHub)" <gi...@apache.org>.
atris commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228567126


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -18,16 +18,23 @@
  */
 package org.apache.pinot.common.utils;
 
+import com.google.common.collect.ImmutableSet;
+import java.util.Set;
+
 /**
  * Utility for converting regex patterns.
  */
 public class RegexpPatternConverterUtils {
   private RegexpPatternConverterUtils() {
   }
 
-  /* Represents all metacharacters to be processed */
-  public static final String[] REGEXP_METACHARACTERS =
-      {"\\", "^", "$", ".", "{", "}", "[", "]", "(", ")", "*", "+", "?", "|", "<", ">", "-", "&", "/"};
+  /*
+   * Represents all metacharacters to be processed.
+   * This excludes the \ (back slash) character as that doubles up as an escape character as well.
+   * So it is handled separately in the conversion logic.
+   */
+  public static final Set<String> REGEXP_METACHARACTERS = ImmutableSet.of(

Review Comment:
   Why the move to a set? Since the number of characters is small, isn't an array more space efficient?



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");

Review Comment:
   Why was this logic moved out of escapeMetaCharacters?



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");
+        sb.append(isPrevCharBackSlash ? c : ".");
       } else if (c == '%') {
-        sb.replace(i, i + 1, ".*");
-        i++;
+        sb.append(isPrevCharBackSlash ? c : ".*");
+      } else if (REGEXP_METACHARACTERS.contains(String.valueOf(c))) {
+        sb.append('\\').append(c);
+      } else {
+        if (isPrevCharBackSlash) {
+          // this means the previous character is a \
+          // but it was not used for escaping SQL wildcards
+          // so let's escape this \ in the output
+          // this case is separately handled outside of the meta characters list
+          sb.append('\\');

Review Comment:
   Why can this be not merged with line 89?



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");

Review Comment:
   NIT: Lets use constant named variables instead of symbols here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Aravind-Suresh commented on a diff in pull request #10897: Fixes SQL wildcard escaping in LIKE queries

Posted by "Aravind-Suresh (via GitHub)" <gi...@apache.org>.
Aravind-Suresh commented on code in PR #10897:
URL: https://github.com/apache/pinot/pull/10897#discussion_r1228992428


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/RegexpPatternConverterUtils.java:
##########
@@ -64,24 +71,42 @@ public static String likeToRegexpLike(String likePattern) {
         break;
     }
 
-    String escaped = escapeMetaCharacters(likePattern.substring(start, end));
-    StringBuilder sb = new StringBuilder(escaped.length() + 2);
+    likePattern = likePattern.substring(start, end);
+    StringBuilder sb = new StringBuilder();
     sb.append(prefix);
-    sb.append(escaped);
-    sb.append(suffix);
 
+    // handling SQL wildcards by replacing them with corresponding regex equivalents
+    // we ignore them if the SQL wildcards are escaped
     int i = 0;
-    while (i < sb.length()) {
-      char c = sb.charAt(i);
+    boolean isPrevCharBackSlash = false;
+    while (i < likePattern.length()) {
+      char c = likePattern.charAt(i);
       if (c == '_') {
-        sb.replace(i, i + 1, ".");

Review Comment:
   I've replaced the ones (BACK_SLASH) which occur more than once here. Rest I've left for readability.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org