You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2017/04/03 22:10:37 UTC

[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

Alex Behm has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 2:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 292:     // We currently don't have a way to indicate if a UDF is deterministic, so just always
> Not sure what you mean by 'two separate members". Maybe Expr.containsNonDet
Sorry for the confusion, let me try to clarify:

You added a new member Expr.isDeterministic_ which is set to false if the Expr is not deterministic or a UDF. I'm saying we should not mix the UDF and deterministic concepts. One way to do that is to have separate members like containsNonDeterministicBuiltin_ and containsUDF_. It might be even easier to add a Predicate in Expr like IS_EQ_BINARY_PREDICATE and then use Expr.contains() in the appropriate places.

I agree that FunctionCallExpr.isNondeterministicBuiltinFn() and Expr.isDeterministic() return consistent values, but that seems confusing to be (they return consistent values for non-builtins because we consider non-builtins to be non-deterministic)


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 228:    * exprs that get materialized.
> There's already a lot in this review, and I wasn't sure how easy this would
Good to know that the sort will work. I agree we should tackle this later, probably easiest to just leave the literals alone for now.


http://gerrit.cloudera.org:8080/#/c/6322/2/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 51:   // Original Exprs for any ordering exprs that are materialized. Populated in
Subset of ordering exprs that are materialized. Populated ...


Line 166:    * output by the sort node. Done by materializing slot refs in the order-by and given
update comment


Line 170:    * TODO: We could do something more sophisticated than simply copying input slot refs -
TODO is addressed


Line 174:       List<Expr> resultExprs, Analyzer analyzer) {
do these fit into the previous line?


Line 180:     // substOrderBy is a mapping from slot refs in the sort node's input and ordering
Comment is confusing because of the original 'sort node's input' which is not clear.

I suggest rephrasing to be more explicit about what the left-hand side of the smap contains:
1. materialized ordering Exprs
2. SlotRefs of non-materialized ordering Exprs
3. SlotRefs of resultExprs after substituting materialized ordering exprs

I think point (3) is not yet reflected in the code, but we should do it imo.

Example:

select concat(a, b, c, d) as x from t order by x

The current code will materialize concat(), but also the SlotRefs a, b, c, d.


Line 192:     List<Expr> nonmaterializedOrderingExprs = Lists.newArrayList(orderingExprs_);
nonMaterializedOrderingExprs


Line 204:     // The ordering exprs are still the original exprs and still point to the old slot
Suggest simplifying comment to something like:

The ordering exprs are evaluated against the sort tuple, so must reflect our materialization decision above.


Line 218:    * threshold, or don't have a cost set, by creating SlotRefs for them with
shorter: by creating slots for them in 'sortTupleDesc'

(we are adding slots to the tuple, not SlotRefs)


Line 230:   public void createMaterializedOrderExprs(
Considering all my comments here and elsewhere I think the simplest interface is:

public ExprSubstitutionMap createMaterializedOrderExprs(TupleDescriptor sortTupleDesc, Analyzer analyzer)


Line 240:       if (origOrderingExpr instanceof LiteralExpr) {
Thinking about this again: This behavior complicates the code and function comment above, so I think it would be better to leave literals alone for now.

This is only a partial solution to the literals problem anyway. Consider something like:

select * from (select 1 as col from t) x order by col

We'll ultimately still sort on a literal, so maybe this is not the right place for this check.

Even further, I think this solution will not work through inline views at all, e.g.:

select * from (select uuid() as col from t) x order by col

Let's think about how to address this problem and whether we need to do it now or defer.


http://gerrit.cloudera.org:8080/#/c/6322/2/testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
File testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test:

Line 1: # expensive order exprs get materialized, cheap ones don't
Tests should ideally cover the 3 cases:
1. non-deterministic exprs
2. exprs above/below the cost threshold
3. exprs with an unknown cost (do we have these or are they a "bug"?)


Line 102: ---- DISTRIBUTEDPLAN
no need for this


http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test
File testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test:

Line 19
> I'm not sure how to make that work with PlannerTest, but I added a test to 
I think you should be able to add a function using FrontendTestBase.addTestFunction() for a planner test.

You are right about the UDA, not sure what I was thinking.


-- 
To view, visit http://gerrit.cloudera.org:8080/6322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes