You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zach Amsden (Code Review)" <ge...@cloudera.org> on 2017/04/06 23:43:51 UTC

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6575

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................

IMPALA-5180: Don't balk at non-deterministic exprs

HDFS partition pruning can run afoul of our logic of considering
Exprs which have no SlotRefs as bound by default when it gets a
non-deterministic expression.  The fix is to simply not use such
expressions for considering partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
3 files changed, 21 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 12:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/595/

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/9/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

Line 112:           !conjunct.contains(Expr.IS_NONDETERMINISTIC_BUILTIN_FN_PREDICATE)) {
> Since we ignore these predicates here, they still get attached to the plan 
Yes, I think so:
* dropping the predicate would be wrong
* evaluating the predicate against partitions would arguably change the query semantics based on a physical design choice
* the same query against an unpartitioned table might give very different results

So we are left with evaluating the predicate for every row.


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6575

to look at the new patch set (#10).

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................

IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning.  We consider Exprs which have no
SlotRefs as bound by default, and thus we end up trying to apply
them indisrciminately.  Constant propagation makes this situation
easier to run into and the behavior is rather unexpected.

The fix for now is to explicitly disallow non-deterministic Exprs
in partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
4 files changed, 40 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1052:   public boolean isStrictlyBoundBySlotIds(List<SlotId> slotIds) {
I'm not a fan of this new function. It feels like a quick hack to solve a very narrow problem that nobody has ever complained about.

I'm much more in favor of fixing the isBoundBy() family of functions to behave more appropriately when invoked on constant expressions. The changes I had in mind:
- Must be invoked by a non-empty list of slot/tuple ids
- Returns false for constant expressions
- Maybe require that the expr must be analyzed? This one might be somewhat difficult and maybe not necessary for now


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 9:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/525/

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#2).

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................

IMPALA-5180: Don't balk at non-deterministic exprs

HDFS partition pruning can run afoul of our logic of considering
Exprs which have no SlotRefs as bound by default when it gets a
non-deterministic expression.  The fix is to simply not use such
expressions for considering partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
4 files changed, 52 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6575/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1029:   public boolean isBoundByTupleIds(List<TupleId> tids) {
> I thought about this a little more and I think fixing the isBound() functio
While consistency is nice, I'd rather keep this fix small and contained.


Line 1037:    * Returns true if expr is fully bound by slotIds and contains at least
> Not your change, but can you modify the comment to explain what "bound" mea
Done


Line 1045:     HashSet<SlotId> idSet = Sets.newHashSet();
> idSet -> exprSlotIds
Done


Line 1046:     int members = 0;
> Why is this needed? Do you mean Preconditions.checkState(!exprSlotIds.isEmp
I could just return false in that case, will probably be shorter and more concise.


Line 1047:     this.getIdsHelper(null, idSet);
> remove 'this'
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#4).

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................

IMPALA-5180: Don't balk at non-deterministic exprs

HDFS partition pruning can run afoul of our logic of considering
Exprs which have no SlotRefs as bound by default when it gets a
non-deterministic expression.  The fix is to simply not consider
constant and unbound exprs as bound to slots, which was
counter-intuitive.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
6 files changed, 46 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1052:   public boolean isStrictlyBoundBySlotIds(List<SlotId> slotIds) {
> I'm not a fan of this new function. It feels like a quick hack to solve a v
I just fixed the original function.  It is only called in two places so the change is easy to reason about.

The first requirement turned out to require too much code restructuring - it is simpler to just return false if the list of slot IDs is empty.  Otherwise, we have to duplicate or otherwise mangle the code in both HdfsPartitionPruner and PartitionFilter.

We do require it to be analyzed, as the slotRef must be analyzed to get its slotId.


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1029:   public boolean isBoundByTupleIds(List<TupleId> tids) {
> While consistency is nice, I'd rather keep this fix small and contained.
I don't think consistency is just "nice" here. The isBound() family of functions is a core FE concept that we rely on in many places. It's bad if someone else reading the code cannot understand the meaning or gets confused due to inconsistency. These little inconsistencies have a tendency to cause other problems in the future.

We are introducing this inconsistency for a rather minor bugfix, which I think is the wrong tradeoff.

If the goal of this patch is to fix IMPALA-5180 specifically, then it's simpler to have HdfsPartitionPruner.prunePartitions() check

if (conjunct.isBoundBySlotIds(partitionSlots_) && !conjunct.contains(Expr.IS_NONDETERMINISTIC_BUILTIN_FN_PREDICATE)) {
  // Use conjunct for partition pruning
  ...
}

and not change the meaning of isBoundBySlotIds()


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1029:   public boolean isBoundByTupleIds(List<TupleId> tids) {
> What about this? Our isBound() family of functions should be consistent.
I thought about this a little more and I think fixing the isBound() functions in a consistent way may be quite difficult. My guess is that if you make corresponding changes to this function, then we end up not assigning the rand() predicate in your new tests. Addressing might be hard because our predicate assignment relies on isBound() (yes, it is already broken today in many ways for unbound, non-deterministic predicates).


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................


Patch Set 3:

(8 comments)

Fix looks much better

http://gerrit.cloudera.org:8080/#/c/6575/3//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5180: Don't balk at non-deterministic exprs
Msg is cryptic. Can you summarize what is changed in this patch, as opposed to what bug/behavior is fixed?


Line 9: HDFS partition pruning can run afoul of our logic of considering
Not sure what "run afoul" means, please be specific.


http://gerrit.cloudera.org:8080/#/c/6575/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1029:   public boolean isBoundByTupleIds(List<TupleId> tids) {
What about this? Our isBound() family of functions should be consistent.


Line 1037:    * Returns true if expr is fully bound by slotIds and contains at least
Not your change, but can you modify the comment to explain what "bound" means? The concept has caused a lot of confusion in the past especially to newcomers.
Also, please split up the comment into more sentences.


Line 1045:     HashSet<SlotId> idSet = Sets.newHashSet();
idSet -> exprSlotIds


Line 1046:     int members = 0;
Why is this needed? Do you mean Preconditions.checkState(!exprSlotIds.isEmpty())?


Line 1047:     this.getIdsHelper(null, idSet);
remove 'this'


http://gerrit.cloudera.org:8080/#/c/6575/3/testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
File testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test:

Line 1026: select count(*) from
keep this query as simple as possible, i.e., same as above


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................

IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning.  We consider Exprs which have no
SlotRefs as bound by default, and thus we end up trying to apply
them indisrciminately.  Constant propagation makes this situation
easier to run into and the behavior is rather unexpected.

The fix for now is to explicitly disallow non-deterministic Exprs
in partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
4 files changed, 36 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 12: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning.  We consider Exprs which have no
SlotRefs as bound by default, and thus we end up trying to apply
them indisrciminately.  Constant propagation makes this situation
easier to run into and the behavior is rather unexpected.

The fix for now is to explicitly disallow non-deterministic Exprs
in partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
Reviewed-on: http://gerrit.cloudera.org:8080/6575
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
4 files changed, 40 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 12:

Rebased and ran planner test

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.impala.planner.PlannerTest
Tests run: 53, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.494 sec - in org.apache.impala.planner.PlannerTest

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#3).

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................

IMPALA-5180: Don't balk at non-deterministic exprs

HDFS partition pruning can run afoul of our logic of considering
Exprs which have no SlotRefs as bound by default when it gets a
non-deterministic expression.  The fix is to simply not consider
constant and unbound exprs as bound to slots, which was
counter-intuitive.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
6 files changed, 48 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 9: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/525/

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/9/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

Line 112:           !conjunct.contains(Expr.IS_NONDETERMINISTIC_BUILTIN_FN_PREDICATE)) {
Since we ignore these predicates here, they still get attached to the plan node - resulting in this:

Failed tests: 
  PlannerTest.testHdfs:96->PlannerTestBase.runPlannerTestFile:764->PlannerTestBase.runPlannerTestFile:759 
Expected failure, but query produced PLAN.
Query:
select * from functional.alltypes where rand() > year

PLAN:
PLAN-ROOT SINK
|
00:SCAN HDFS [functional.alltypes]
   partitions=24/24 files=24 size=478.45KB
   predicates: year < rand()


So we can actually evaluate randomized predicates now, they just aren't used for partition filtering.  Do we actually want to do that?


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................


Patch Set 12: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#6).

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................

IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning.  Previously, we considered
Exprs which have no SlotRefs as bound by default, and thus we end
up trying to apply them indisrciminately.  Constant propagation
makes this situation much more easy to run into and the behavior
is rather unexpected.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
6 files changed, 46 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5180: Don't balk at non-deterministic exprs

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5180: Don't balk at non-deterministic exprs
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6575/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 1053:     if (slotIds == null || slotIds.size() == 0) return false;
This was a horrible mistake.  The result was that Hdfs tables with no partitions attached every conjunct to an HdfsPartitionFilter.  This invoked a demon lying deep in the code where expressions involving nested types that get evaluated in the context of a partition filter are all non-constant and cause HdfsPartitionFilter to vomit blood.


-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#8).

Change subject: IMPALA-5180: Don't use non-deterministic exprs in partition pruning
......................................................................

IMPALA-5180: Don't use non-deterministic exprs in partition pruning

Non-deterministic exprs which evaluate as constant should not be
used during HDFS partition pruning.  We consider Exprs which have no
SlotRefs as bound by default, and thus we end up trying to apply
them indisrciminately.  Constant propagation makes this situation
easier to run into and the behavior is rather unexpected.

The fix for now is to explicitly disallow non-deterministic Exprs
in partition pruning.

Change-Id: I91054c6bf017401242259a1eff5e859085285546
---
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
4 files changed, 35 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/6575/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91054c6bf017401242259a1eff5e859085285546
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>