You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2017/02/06 18:49:08 UTC

[Impala-ASF-CR] IMPALA-4731: Crash when sorting on non-deterministic expr

Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5914

Change subject: IMPALA-4731: Crash when sorting on non-deterministic expr
......................................................................

IMPALA-4731: Crash when sorting on non-deterministic expr

When sorting on a non-deterministic expr, eg. 'order by random()',
we lazily evaluate the exprs being sorted on, leading to a different
value being generated for each tuple each time it is compared to
something during the sort.

This leads to violations of assumptions made by our sorting algorithm,
eg. in insertion sort the assumption that the portion that has already
been processed is in fact in sorted order, which leads to DCHECK
failures and crashes.

The fix is to switch from lazy evaluation to eager evaluation by
materializing the exprs into the sort tuple.

Testing:
- Added e2e test in test_sort.py that verifies 'order by random()'
  behaves as expected.

Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
---
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M testdata/workloads/functional-query/queries/QueryTest/sort.test
M tests/query_test/test_sort.py
3 files changed, 46 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/5914/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-4731: Crash when sorting on non-deterministic expr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4731: Crash when sorting on non-deterministic expr
......................................................................


Patch Set 2:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/252/

-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................

IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Previously, exprs used in sorts were evaluated lazily. This can
potentially be bad for performance if the exprs are expensive to
evaluate, and it can lead to crashes if the exprs are
non-deterministic, as this violates assumptions of our sorting
algorithm.

This patch addresses these issues by materializing ordering exprs.
It does so when the expr is non-deterministic (including when it
contains a UDF, which we cannot currently know if they are
non-deterministic), or when its cost exceeds a threshold (or the
cost is unknown).

It also introduces the query option 'materialize_sort' which
overrides the above decision and forces materialization (or
non-materialization).

Testing:
- Added e2e tests in test_sort.py and test_queries.py.

Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/values.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/sort.test
M tests/query_test/test_sort.py
42 files changed, 926 insertions(+), 702 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/5914/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4731: Crash when sorting on non-deterministic expr

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4731: Crash when sorting on non-deterministic expr
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5914/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4731: Crash when sorting on non-deterministic expr
This also may fix IMPALA-4728 or IMPALA-397.

Might be good to add a targeted-perf test to confirm it fixes the perf bug IMPALA-4728. I think Greg Rahn had some examples of analytic functions where lazy materialisation was very slow.


http://gerrit.cloudera.org:8080/#/c/5914/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 194:       } else {
This is a weird corner case case, but I think we should drop LiteralExprs instead of materialising them. It's plausible that real-world queries might have order-by expressions that can be rewritten into constants.


Line 207:     substituteOrderingExprs(substOrderBy, analyzer);
I think we only need to substitute the SlotRefs in the first branch of the if() above. We could just do the substitution there. Otherwise it's a bit hard to reason about materializedOrderingExprs containing a mix of SlotRefs referring to the old and new tuples.


http://gerrit.cloudera.org:8080/#/c/5914/1/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

Line 162:   def test_order_by_random(self):
I think the example in IMPALA-397 would also be a good test - there rand() is in the select list so we can verify that the order is correct.


-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4731: Crash when sorting on non-deterministic expr

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4731: Crash when sorting on non-deterministic expr
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5914/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4731: Crash when sorting on non-deterministic expr
> This also may fix IMPALA-4728 or IMPALA-397.
It does fix IMPALA-397

It doesn't fix IMPALA-4728 as creation of the SortInfo for analytic functions doesn't go through this code path.

Greg said that the main motivation behind IMPALA-4728, to_date() being slow, has been fixed now, so I think we can hold off on dealing with that one for now.


http://gerrit.cloudera.org:8080/#/c/5914/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 194:         // entirely if this eliminates all the ordering exprs.
> This is a weird corner case case, but I think we should drop LiteralExprs i
Done


Line 207:         materializedOrderingExprs.add(materializedRef);
> I think we only need to substitute the SlotRefs in the first branch of the 
Done


http://gerrit.cloudera.org:8080/#/c/5914/1/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

Line 162:   def test_order_by_random(self):
> I think the example in IMPALA-397 would also be a good test - there rand() 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4731: Crash when sorting on non-deterministic expr

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-4731: Crash when sorting on non-deterministic expr
......................................................................

IMPALA-4731: Crash when sorting on non-deterministic expr

When sorting on a non-deterministic expr, eg. 'order by random()',
we lazily evaluate the exprs being sorted on, leading to a different
value being generated for each tuple each time it is compared to
something during the sort.

This leads to violations of assumptions made by our sorting algorithm,
eg. in insertion sort the assumption that the portion that has already
been processed is in fact in sorted order, which leads to DCHECK
failures and crashes.

The fix is to switch from lazy evaluation to eager evaluation by
materializing the exprs into the sort tuple.

This also addresses IMPALA-397.

Testing:
- Added e2e test in test_sort.py that verifies 'order by random()'
  behaves as expected.

Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
---
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M testdata/workloads/functional-query/queries/QueryTest/sort.test
M tests/query_test/test_sort.py
3 files changed, 86 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/5914/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5914/4/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

PS4, Line 226: materialize_sort
Right now this query options can have 3 values: true, false, not set and each one leads to a different behavior. I think that's confusing. I would rename this to "always_materialize_sort" and set it to false by default.


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 272:   public boolean computeIsDeterministic() {
If the function name starts with "compute" that implies to me that it will compute the value and store it somewhere. Instead, this value is returned. Maybe rename it to simply isDeterministic()


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 227:             return new ExprSubstitutionMap();
Why not let it print all nondeterministic expressions? i.e. remove the return here.


-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#4).

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................

IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Previously, exprs used in sorts were evaluated lazily. This can
potentially be bad for performance if the exprs are expensive to
evaluate, and it can lead to crashes if the exprs are
non-deterministic, as this violates assumptions of our sorting
algorithm.

This patch addresses these issues by materializing ordering exprs.
It does so when the expr is non-deterministic (including when it
contains a UDF, which we cannot currently know if they are
non-deterministic), or when its cost exceeds a threshold (or the
cost is unknown).

It also introduces the query option 'materialize_sort' which
overrides the above decision and forces materialization (or
non-materialization).

Testing:
- Added e2e tests in test_sort.py and test_queries.py.
- Updated planner tests.

Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/values.test
M testdata/workloads/functional-planner/queries/PlannerTest/views.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
M testdata/workloads/functional-query/queries/QueryTest/sort.test
M tests/query_test/test_sort.py
42 files changed, 910 insertions(+), 699 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/5914/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 4:

Can you make the design doc public and link to it? It's shared with me but other reviewers may want to look at it.

-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/5914/4/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 257:   // non-deterministic expression.
I don't think we should trust users not to set this to false on non-deterministic expressions - they shouldn't be able to potentially crash an Impala daemon with a bad query option value.


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 60:   private final static List<String> NONDETERMINISTIC_BUILTINS =
https://gerrit.cloudera.org/#/c/5904/ also factored out this list, so when you rebase on top of master we should be able to reuse that.


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 39:   private static final int SORT_MATERIALIZATION_COST_THRESHOLD = 5;
Comment on how it's derived? If it's somewhat arbitrary its good for readers to know.


Line 225:             analyzer.addWarning("Expression '" + orderingExpr.toSql() + " may be " +
I mentioned in an earlier comment - this seems to be unsafe - we shouldn't trust the query option too much.


Line 246:         // LiteralExprs don't affect the sort order. TODO: consider removing the sort node
Do we have a test that covers the case when all ordering exprs are eliminated?


Line 263:         orderingExprsIter.set(materializedRef);
Is this needed? Is this handled by putting it in the substitution map?


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
File fe/src/main/java/org/apache/impala/planner/ExchangeNode.java:

Line 152:         if (mergeInfo_.getIsMaterialized().get(i)) {
Also mentioned on the design doc - I think we should only show this at a higher explain levels. Any maybe spell out "MATERIALIZED" so that it's less cryptic.


http://gerrit.cloudera.org:8080/#/c/5914/4/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

Line 187:         if (info_.getIsMaterialized().get(i)) {
See previous comment


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

Line 785: |  order by: id ASC MAT
I guess it's useful for these planner tests to show it, so maybe it is good to have it at this explain level. Would be interested what others think.


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test:

Line 46: |  order by: int_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST
It's weird that there's no "MAT" here but there are some in the sort below - is that a bug? It seems like we should be showing all plain slotrefs as materialised.


Line 98: |  order by: int_col ASC
Same here


Line 454: |  order by: tinyint_col + 1 ASC NULLS FIRST, double_col / 2 ASC NULLS FIRST, 4 - int_col ASC, 4 * smallint_col ASC
Looks like the cost-based analysis is working here


Line 789: |  order by: double_col ASC NULLS FIRST, tinyint_col ASC NULLS FIRST, int_col ASC
It seems like we're only materialising things in the bottom-most sort for some reason.


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-planner/queries/PlannerTest/order.test
File testdata/workloads/functional-planner/queries/PlannerTest/order.test:

Line 1245: ====
It would be good to also add some basic planner tests for the materialize_sort query option - just as as sanity check that it actually has the intended effect.


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test
File testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test:

Line 1921: # Tests 'order by' on a non-deterministic function, return order isn't guaranteed.
It would also be good to test the materialize_sort option for analytics


http://gerrit.cloudera.org:8080/#/c/5914/4/testdata/workloads/functional-query/queries/QueryTest/sort.test
File testdata/workloads/functional-query/queries/QueryTest/sort.test:

Line 4146: select id from alltypestiny order by random()
It would be good to add a couple of tests for the materialize_sort query option to make sure that it produces the correct results.


Line 4150: select id from alltypestiny order by "AAA"
I think this answers my earlier question about test coverage.


http://gerrit.cloudera.org:8080/#/c/5914/4/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

PS4, Line 158: TestSort
TestRandomSort?


-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has abandoned this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Abandoned

Still working out the design. Doc:
https://docs.google.com/document/d/1tBTOzMqW53w1WSYMnqGGyPWVp1hgZGRyQVDmXsHXtTY/edit?usp=sharing

-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 3:

(1 comment)

A couple of notes on the latest review:

- It has not been tested for performance. That testing will determine the value of 'SORT_MATERIALIZATION_COST_THRESHOLD'
- I'm not super happy with the functional testing, but I'm having a hard time coming up with good tests.
- It doesn't include a query option, as I started looking into it and I think its going to be complicated, so I figured I would submit a follow up patch for it. There's also some design decisions here, which I've laid out in the design doc.

http://gerrit.cloudera.org:8080/#/c/5914/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 173:    * result expressions. Those slot refs in the ordering and result exprs are substituted
> It's important to update this smap according to the new materialization you
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4731: Crash when sorting on non-deterministic expr

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4731: Crash when sorting on non-deterministic expr
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5914/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 173:     ExprSubstitutionMap substOrderBy = new ExprSubstitutionMap();
It's important to update this smap according to the new materialization you are doing below. I'd prefer to keep the original flow where we populate this smap and then substituteOrderingExprs().

For example, one case that I believe will not work as expected is:

select rand() r from t order by r

The reason is that we will not substitute the rand() from the select list with the materialized slot produced in the order by - but we should. Similar arguments apply for something like:

select my_expensive_expr() e from t order by e

We will redundantly evaluate my_expensive_expr in the select list again.


Line 186:     // TODO: it may be better for performance to not materialize some exprs that are
I think we should address this TODO while we are here.

We should materialize:
1. All known non-deterministic functions.
We can add an Expr.isDeterministic() function and implement that for FunctionCallExpr like we implement isConstant().

2. All UDFs. We can check whether an Expr is a UDF.

3. Exprs that are 'expensive' according to some cost threshold. If the cost is unknown, I'd say materialize.


-- 
To view, visit http://gerrit.cloudera.org:8080/5914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5dcda32fc7770d42fc500ce87fc54d58e5b5dc00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes