You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org> on 2018/07/11 00:21:30 UTC

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10910


Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................

IMPALA-2422: Fix escaping in the LIKE clause

There are two stages to processing a like clause. First, we determine if
it is possible to convert the expression to a simpler function, such as
StartsWith() or EndsWith(). If not, then we use a Regex libarary to
compute the result.

There was a problem in the logic that determines if it is possible to
use a simpler function. It did not take into account escape characters.
For example, "abc\%" was incorrectly converted to StartsWith("abc\").

There was another problem. We always unescaped strings in the frontend.
The RE2 regex function also unescapes the regex before proceeding. So
regexes were unescaped twice, which caused some ambiguity. For example,
"abc\%" and "abc\\%" are unescaped in the frontend and the same pattern,
"abc\%" is sent to the backend. The backend could not decide if this
pattern is an exact or prefix match. To fix this problem, we avoid
unescaping regex pattens in the frontend.

Testing:
 -Added expr tests.

Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
---
M be/src/exprs/expr-test.cc
M be/src/exprs/like-predicate.cc
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
5 files changed, 77 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10910/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10910/1/be/src/exprs/like-predicate.cc
File be/src/exprs/like-predicate.cc:

http://gerrit.cloudera.org:8080/#/c/10910/1/be/src/exprs/like-predicate.cc@265
PS1, Line 265: 	  StringValue tmp_val = StringValue::FromStringVal(val);
ws


http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
File fe/src/main/java/org/apache/impala/analysis/LikePredicate.java:

http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java@132
PS1, Line 132: StringLiteral
basic question: what happens with the interaction between constant folding and analyzing this predicate? my understanding is that child[1] will get replaced; if its replaced with some string that should be unescaped, is it analyzed at the right time so that this flag is flipped? if there isn't a test for this, lets add one.


http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@a304
PS1, Line 304: 
basic question: this changes the test. are inputs expected to change? just checking if there is any compatibility issue here.



-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Jul 2018 02:19:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................


Patch Set 1:

How does this related to the previous version of this patch?

Would be great to get this fixed.


-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Jul 2018 17:02:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................

IMPALA-2422: Fix escaping in the LIKE clause

There are two stages to processing a like clause. First, we determine if
it is possible to convert the expression to a simpler function, such as
StartsWith() or EndsWith(). If not, then we use a Regex libarary to
compute the result.

There was a problem in the logic that determines if it is possible to
use a simpler function. It did not take into account escape characters.
For example, "abc\%" was incorrectly converted to StartsWith("abc\").

There was another problem. We always unescaped strings in the frontend.
The RE2 regex function also unescapes the regex before proceeding. So
regexes were unescaped twice, which caused some ambiguity. For example,
"abc\%" and "abc\\%" are unescaped in the frontend and the same pattern,
"abc\%" is sent to the backend. The backend could not decide if this
pattern is an exact or prefix match. To fix this problem, we avoid
unescaping regex pattens in the frontend.

Testing:
 -Added expr tests.

Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
---
M be/src/exprs/expr-test.cc
M be/src/exprs/like-predicate.cc
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
5 files changed, 68 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/10910/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10910/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/10910/2/be/src/exprs/expr-test.cc@3481
PS2, Line 3481:   // We return a different result if the second argument is not a string literal.
Am I understanding this right that LIKE now behaves differently depending on whether its argument is constant or a variable that evaluates to the same value? That's going to cause a lot of issues...



-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Jul 2018 21:21:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@a304
PS1, Line 304: 
> Not quite sure what you mean. The test is the same as before, except we rem
my question was about backwards compatibility. lets update the jira description to emphasize that the bug fix, while good that it tries to fix the issue, will change existing queries.



-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Jul 2018 22:21:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10910/1/be/src/exprs/like-predicate.cc
File be/src/exprs/like-predicate.cc:

http://gerrit.cloudera.org:8080/#/c/10910/1/be/src/exprs/like-predicate.cc@265
PS1, Line 265: 	  StringValue tmp_val = StringValue::FromStringVal(val);
> ws
Oops , this was temporary garbage that I forgot to remove (same as below).


http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
File fe/src/main/java/org/apache/impala/analysis/LikePredicate.java:

http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java@132
PS1, Line 132: StringLiteral
> basic question: what happens with the interaction between constant folding 
If the second child becomes a string literal due to constant folding, then the NeedsUnescaping will not be flipped to false here. And I think this is OK.

I added a test for this.


http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/10910/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@a304
PS1, Line 304: 
> basic question: this changes the test. are inputs expected to change? just 
Not quite sure what you mean. The test is the same as before, except we remove one level of escaping. Before, with four slashes the escaping worked as follows:
1. The first level escaped the Java escaping
2. The second level escaped the FE escaping. So the BE would receive this: "\_"

After my change, the backend receives the same thing because the FE escaping is eliminated.



-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Jul 2018 02:43:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2422: Fix escaping in the LIKE clause

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10910 )

Change subject: IMPALA-2422: Fix escaping in the LIKE clause
......................................................................


Patch Set 2: Code-Review-1

This is stale but I'm -1ing because of the issue I previously mentioned.


-- 
To view, visit http://gerrit.cloudera.org:8080/10910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I553412318525820a36d2f401aa7e93958d22f70e
Gerrit-Change-Number: 10910
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Aug 2018 16:59:05 +0000
Gerrit-HasComments: No