You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Homer <lo...@gmail.com> on 2016/01/22 04:30:49 UTC
Scoping Question
Hi,
When using calcite to generate a plan I am trying to understand scope of
the $n variables
I have a simple table
tab 3 table is a super simple table
i1 integer
i2 integer
t3 text
t4 text
if I do
select i1 as fred from tab3;
I get
LogicalProject(fred=[$0])
LogicalTableScan(table=[[CATALOG, mapd, tab3]])
if I look at my catalog layout I see that i1 is the first column in the
catalog for this table thus the $0. make sense.
If I give calcite a kind of silly query for testing and trying to
understand the scoping of the result it expects me to execute it gets a
little more confusing
select count(*) from tab3 as alias1 where (select max(alias2.i1) from
tab3 alias2, tab3 as alias3 where alias2.i1 = alias3.i1) > alias1.i1;
the execution plan I get back looks like this:
LogicalAggregate(group=[{}], fred=[COUNT()])
LogicalProject($f0=[0])
LogicalFilter(condition=[>($5, $0)])
LogicalJoin(condition=[true], joinType=[left])
LogicalTableScan(table=[[CATALOG, mapd, tab3]])
LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
LogicalProject(i1=[$0])
LogicalFilter(condition=[=($0, $5)])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[CATALOG, mapd, tab3]])
LogicalTableScan(table=[[CATALOG, mapd, tab3]])
My question is about understanding the mapping and scoping of the $n
variables in the plan
The plan uses $0 and $5 in various different scopes here, initially in
this part
LogicalProject(i1=[$0])
LogicalFilter(condition=[=($0, $5)])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[CATALOG, mapd, tab3]])
LogicalTableScan(table=[[CATALOG, mapd, tab3]])
I understand it as the two scans are of the same table, so $0 for
alias2.i1 $5 for the second table alias3.i1.
we then move into my shakey ground
LogicalAggregate(group=[{}], fred=[COUNT()])
LogicalProject($f0=[0])
LogicalFilter(condition=[>($5, $0)])
LogicalJoin(condition=[true], joinType=[left])
LogicalTableScan(table=[[CATALOG, mapd, tab3]])
LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
We see another join here referring to $5 and $0 but these are different
$5 and $0. In this scope I think $5 is referring to the MAX()
projection from below and $0 is referring to alias1.i1
I am wondering what is the best way to consider the scoping of these $n
placeholders. What is the best way to think of this model? At what
point does $0 and $5 move to the new meaning?
thanks
Homer
Re: Scoping Question
Posted by Julian Hyde <jh...@apache.org>.
The scoping rule is very simple - a relational expression can only see its input relational expression(s). It can’t see the input of its input, or its sibling. So $n is the n’th column of the input.
For instance, the DEPTNO field might be $3 coming out of a Scan of the EMP table, would still be $3 after passing through a Filter, and might be $1 after passing through a Project.
A Join has two inputs, so you imagine the input columns laid side by side. If the left has 3 and the right has 4, then the valid references are $0, $1, $2 reading from the left input and $3, $4, $5, $6 reading from the right.
You need to know how each operator maps the fields in the input row(s) to an output row.
* Most operators (Filter, Sort, Exchange, SemiJoin), output a row that has the same fields as the input.
* The set operators (Union, Intersect, Minus) output a row that matches their first input (in fact they require that all inputs have the same row type).
* Join outputs the fields from the left input followed by the fields from the right input. Some fields become nullable if it is a left, right or full outer join. Correlate is similar to Join.
* The leaf operators, Scan and Values, don’t have an input but they just output their data.
* Project outputs one column for each expression.
* Aggregate outputs the group keys, followed optionally by the indicators, followed by the aggregate functions.
Expressions are also allowed to read correlating variables, called $cor0, $cor1 etc. which are assigned by a relational expression elsewhere in the graph. But they are an advanced topic that don’t crop up in many queries.
Julian
> On Jan 21, 2016, at 7:30 PM, Homer <lo...@gmail.com> wrote:
>
> Hi,
>
> When using calcite to generate a plan I am trying to understand scope of the $n variables
>
> I have a simple table
>
> tab 3 table is a super simple table
>
> i1 integer
> i2 integer
> t3 text
> t4 text
>
> if I do
>
> select i1 as fred from tab3;
>
> I get
>
> LogicalProject(fred=[$0])
> LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>
> if I look at my catalog layout I see that i1 is the first column in the catalog for this table thus the $0. make sense.
>
> If I give calcite a kind of silly query for testing and trying to understand the scoping of the result it expects me to execute it gets a little more confusing
>
> select count(*) from tab3 as alias1 where (select max(alias2.i1) from tab3 alias2, tab3 as alias3 where alias2.i1 = alias3.i1) > alias1.i1;
>
> the execution plan I get back looks like this:
>
> LogicalAggregate(group=[{}], fred=[COUNT()])
> LogicalProject($f0=[0])
> LogicalFilter(condition=[>($5, $0)])
> LogicalJoin(condition=[true], joinType=[left])
> LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
> LogicalProject(i1=[$0])
> LogicalFilter(condition=[=($0, $5)])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>
> My question is about understanding the mapping and scoping of the $n variables in the plan
>
> The plan uses $0 and $5 in various different scopes here, initially in this part
>
> LogicalProject(i1=[$0])
> LogicalFilter(condition=[=($0, $5)])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>
> I understand it as the two scans are of the same table, so $0 for alias2.i1 $5 for the second table alias3.i1.
>
> we then move into my shakey ground
>
> LogicalAggregate(group=[{}], fred=[COUNT()])
> LogicalProject($f0=[0])
> LogicalFilter(condition=[>($5, $0)])
> LogicalJoin(condition=[true], joinType=[left])
> LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
>
>
> We see another join here referring to $5 and $0 but these are different $5 and $0. In this scope I think $5 is referring to the MAX() projection from below and $0 is referring to alias1.i1
>
> I am wondering what is the best way to consider the scoping of these $n placeholders. What is the best way to think of this model? At what point does $0 and $5 move to the new meaning?
>
> thanks
> Homer
>
>
>
>
>
>
>