You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Vincent Tran (Code Review)" <ge...@cloudera.org> on 2017/06/13 13:39:44 UTC

[Impala-ASF-CR] IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.

Vincent Tran has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7168

Change subject: IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.
......................................................................

IMPALA-5494: NOT IN predicate shares the same selectivity as
IN predicate.

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicae children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M tests/metadata/test_explain.py
2 files changed, 17 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has uploaded a new patch set (#6).

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................

IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
A fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
2 files changed, 79 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/744/

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has uploaded a new patch set (#2).

Change subject: IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.
......................................................................

IMPALA-5494: NOT IN predicate shares the same selectivity as
IN predicate.

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicae children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M tests/metadata/test_explain.py
2 files changed, 17 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7168/3/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

Line 105:     result = self.execute_query("explain select * from %s.%s where id "
> Instead of this end-to-end test, please add a new FE unit test ExprSelectiv
Agree, e-e test seems like an overkill.


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: Yes

Re: [Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by Alexander Behm <al...@cloudera.com>.
Planner test failed in PlannerTest.testKuduSelectivity. Please fix,

On Wed, Jun 14, 2017 at 9:16 PM, Impala Public Jenkins (Code Review) <
gerrit@cloudera.org> wrote:

> Impala Public Jenkins has posted comments on this change.
>
> Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
> ......................................................................
>
>
> Patch Set 7: Verified-1
>
> Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/731/
>
> --
> To view, visit http://gerrit.cloudera.org:8080/7168
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
> Gerrit-PatchSet: 7
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
> Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
> Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
> Gerrit-Reviewer: Impala Public Jenkins
> Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
> Gerrit-HasComments: No
>
> --
> You received this message because you are subscribed to the Google Groups
> "impala-cr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to impala-cr+unsubscribe@cloudera.com.
> For more options, visit https://groups.google.com/a/cloudera.com/d/optout.
>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 7: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/731/

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7168/6/fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java:

Line 31:     public void verifySelectivityStmt(String stmtStr, double expectedSel)
> we indent 2
Done


Line 33:       SelectStmt stmt = (SelectStmt) ParsesOk(stmtStr);
> simplify with AnalyzesOk()
Done


Line 38:       assertEquals(calculatedSel,expectedSel);
> space after comma
Done


Line 43:       String stmtStr = "select * from functional.alltypes where id " +
> simplify this to "select <predicate> from functional.alltypes"
Done


Line 52:     private double getPredSel(int numPredChild) {
> Very specific to IN predicate. How about we hardcode the expected values fo
Done


Line 69:       verifySel("in (1,3,5,7)", getPredSel(4));
> Seems cleaner to provide the full expr as input and not assume "id". That w
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 3:

(1 comment)

Thanks for finding and fixing this.

http://gerrit.cloudera.org:8080/#/c/7168/3/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

Line 105:     result = self.execute_query("explain select * from %s.%s where id "
Instead of this end-to-end test, please add a new FE unit test ExprSelectivityTest similar to ExprNdvTest. Relying on a full explain seems like overkill, and checking the scan cardinality is really an indirect test.


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

Re: [Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by Alexander Behm <al...@cloudera.com>.
Random failure to start Kudu:
F0616 16:05:10.200038 23162 tablet_server_main.cc:72] Check failed: _s.ok()
Bad status: Service unavailable: Cannot initialize clock: Error reading
clock. Clock considered unsynchronized

Will retry.

On Fri, Jun 16, 2017 at 11:24 AM, Impala Public Jenkins (Code Review) <
gerrit@cloudera.org> wrote:

> Impala Public Jenkins has posted comments on this change.
>
> Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
> ......................................................................
>
>
> Patch Set 8: Verified-1
>
> Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/742/
>
> --
> To view, visit http://gerrit.cloudera.org:8080/7168
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
> Gerrit-PatchSet: 8
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
> Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
> Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
> Gerrit-Reviewer: Impala Public Jenkins
> Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
> Gerrit-HasComments: No
>
> --
> You received this message because you are subscribed to the Google Groups
> "impala-cr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to impala-cr+unsubscribe@cloudera.com.
> For more options, visit https://groups.google.com/a/cloudera.com/d/optout.
>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 8: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/742/

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 7:

One query plan in kudu-selectivity.test had a NOT IN predicate. The actual scan node cardinality was changed from 4 to 5. The order of predicates was also changed (NOT IN has the highest cardinality so was moved to the most right side of the predicate list). Private build looks good now.

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7168/2//COMMIT_MSG
Commit Message:

PS2, Line 7: NOT IN predicate shares the same selectivity as
           : IN predicate.
> Change to what this commit fixes may be? Something like Fix the selectivity
Done


PS2, Line 18: predicae
> typo
Done


http://gerrit.cloudera.org:8080/#/c/7168/2/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

Line 104:     # IN predicate, cardinality should be 7300*(num of children/num of distinct values)
> Not totally sure if there is a better place to add these tests, Alex any id
Maybe I should just specify the expected value as a constant?


Line 109:     # NOT IN predicate, cardinality should be 7300*(1-(num of children/num of distinct values))
> nit: long line.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Reviewed-on: http://gerrit.cloudera.org:8080/7168
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
A fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
3 files changed, 73 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has uploaded a new patch set (#5).

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................

IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
A fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
2 files changed, 79 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/731/

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.
......................................................................


Patch Set 2:

(4 comments)

Patch looks good to me overall, I can +1 it once you fix the nits.

http://gerrit.cloudera.org:8080/#/c/7168/2//COMMIT_MSG
Commit Message:

PS2, Line 7: NOT IN predicate shares the same selectivity as
           : IN predicate.
Change to what this commit fixes may be? Something like Fix the selectivity of NOT IN .... Easier to read.


PS2, Line 18: predicae
typo


http://gerrit.cloudera.org:8080/#/c/7168/2/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

Line 104:     # IN predicate, cardinality should be 7300*(num of children/num of distinct values)
Not totally sure if there is a better place to add these tests, Alex any idea?


Line 109:     # NOT IN predicate, cardinality should be 7300*(1-(num of children/num of distinct values))
nit: long line.


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5494: NOT IN predicate shares the same selectivity as IN predicate.
......................................................................


Patch Set 2:

removed extraneous space characters.

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7168

to look at the new patch set (#8).

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................

IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
A fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
3 files changed, 73 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 8: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has uploaded a new patch set (#4).

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................

IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 8:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/742/

-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has uploaded a new patch set (#3).

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................

IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
M tests/metadata/test_explain.py
2 files changed, 18 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has uploaded a new patch set (#7).

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................

IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
---
M fe/src/main/java/org/apache/impala/analysis/InPredicate.java
A fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
2 files changed, 71 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7168/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 6:

(6 comments)

Thanks for adding the new test!

http://gerrit.cloudera.org:8080/#/c/7168/6/fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprSelectivityTest.java:

Line 31:     public void verifySelectivityStmt(String stmtStr, double expectedSel)
we indent 2


Line 33:       SelectStmt stmt = (SelectStmt) ParsesOk(stmtStr);
simplify with AnalyzesOk()


Line 38:       assertEquals(calculatedSel,expectedSel);
space after comma


Line 43:       String stmtStr = "select * from functional.alltypes where id " +
simplify this to "select <predicate> from functional.alltypes"


Line 52:     private double getPredSel(int numPredChild) {
Very specific to IN predicate. How about we hardcode the expected values for now? It's not clear to me that it makes sense to re-implement the same logic again for testing purposes.


Line 69:       verifySel("in (1,3,5,7)", getPredSel(4));
Seems cleaner to provide the full expr as input and not assume "id". That way this framework should work for arbitrary exprs.


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5494: Fixes the selectivity of NOT IN predicates

Posted by "Vincent Tran (Code Review)" <ge...@cloudera.org>.
Vincent Tran has posted comments on this change.

Change subject: IMPALA-5494: Fixes the selectivity of NOT IN predicates
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7168/3/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

Line 105:     result = self.execute_query("explain select * from %s.%s where id "
> Instead of this end-to-end test, please add a new FE unit test ExprSelectiv
That makes sense. Done.


Line 105:     result = self.execute_query("explain select * from %s.%s where id "
> Agree, e-e test seems like an overkill.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran <vt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vincent Tran <vt...@cloudera.com>
Gerrit-HasComments: Yes