You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Homer <lo...@gmail.com> on 2016/01/22 04:30:49 UTC

Scoping Question

Hi,

When using calcite to generate a plan I am trying to understand scope of 
the $n variables

I have a simple table

tab 3 table is a super simple table

i1 integer
i2 integer
t3 text
t4 text

if I do

select i1 as fred from tab3;

I get

LogicalProject(fred=[$0])
   LogicalTableScan(table=[[CATALOG, mapd, tab3]])

if I look at my catalog layout I see that i1 is the first column in the 
catalog for this table thus the $0. make sense.

If I give calcite a kind of silly query for testing and trying to 
understand the scoping of the result it expects me to execute it gets a 
little more confusing

select count(*) from tab3 as alias1 where (select max(alias2.i1) from 
tab3 alias2, tab3 as alias3 where alias2.i1 = alias3.i1) > alias1.i1;

the execution plan I get back looks like this:

LogicalAggregate(group=[{}], fred=[COUNT()])
   LogicalProject($f0=[0])
     LogicalFilter(condition=[>($5, $0)])
       LogicalJoin(condition=[true], joinType=[left])
         LogicalTableScan(table=[[CATALOG, mapd, tab3]])
         LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
           LogicalProject(i1=[$0])
             LogicalFilter(condition=[=($0, $5)])
               LogicalJoin(condition=[true], joinType=[inner])
                 LogicalTableScan(table=[[CATALOG, mapd, tab3]])
                 LogicalTableScan(table=[[CATALOG, mapd, tab3]])

My question is about understanding the mapping and scoping of the $n 
variables in the plan

The plan uses $0 and $5 in various different scopes here, initially in 
this part

           LogicalProject(i1=[$0])
             LogicalFilter(condition=[=($0, $5)])
               LogicalJoin(condition=[true], joinType=[inner])
                 LogicalTableScan(table=[[CATALOG, mapd, tab3]])
                 LogicalTableScan(table=[[CATALOG, mapd, tab3]])

I understand it as the two scans are of the same table, so $0 for 
alias2.i1 $5 for the second table alias3.i1.

we then move into my shakey ground

LogicalAggregate(group=[{}], fred=[COUNT()])
   LogicalProject($f0=[0])
     LogicalFilter(condition=[>($5, $0)])
       LogicalJoin(condition=[true], joinType=[left])
         LogicalTableScan(table=[[CATALOG, mapd, tab3]])
         LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])


We see another join here referring to $5 and $0 but these are different 
$5 and $0.  In this scope I think $5 is referring to the MAX() 
projection from below and $0 is referring to alias1.i1

I am wondering what is the best way to consider the scoping of these $n 
placeholders.  What is the best way to think of this model?  At what 
point does $0 and $5 move to the new meaning?

thanks
Homer

Re: Scoping Question

Posted by Julian Hyde <jh...@apache.org>.

The scoping rule is very simple - a relational expression can only see its input relational expression(s). It can’t see the input of its input, or its sibling. So $n is the n’th column of the input.

For instance, the DEPTNO field might be $3 coming out of a Scan of the EMP table, would still be $3 after passing through a Filter, and might be $1 after passing through a Project.

A Join has two inputs, so you imagine the input columns laid side by side. If the left has 3 and the right has 4, then the valid references are $0, $1, $2 reading from the left input and $3, $4, $5, $6 reading from the right.

You need to know how each operator maps the fields in the input row(s) to an output row.
* Most operators (Filter, Sort, Exchange, SemiJoin), output a row that has the same fields as the input.
* The set operators (Union, Intersect, Minus) output a row that matches their first input (in fact they require that all inputs have the same row type).
* Join outputs the fields from the left input followed by the fields from the right input. Some fields become nullable if it is a left, right or full outer join. Correlate is similar to Join.
* The leaf operators, Scan and Values, don’t have an input but they just output their data.
* Project outputs one column for each expression.
* Aggregate outputs the group keys, followed optionally by the indicators, followed by the aggregate functions.

Expressions are also allowed to read correlating variables, called $cor0, $cor1 etc. which are assigned by a relational expression elsewhere in the graph. But they are an advanced topic that don’t crop up in many queries.

Julian

> On Jan 21, 2016, at 7:30 PM, Homer <lo...@gmail.com> wrote:
> 
> Hi,
> 
> When using calcite to generate a plan I am trying to understand scope of the $n variables
> 
> I have a simple table
> 
> tab 3 table is a super simple table
> 
> i1 integer
> i2 integer
> t3 text
> t4 text
> 
> if I do
> 
> select i1 as fred from tab3;
> 
> I get
> 
> LogicalProject(fred=[$0])
>  LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> 
> if I look at my catalog layout I see that i1 is the first column in the catalog for this table thus the $0. make sense.
> 
> If I give calcite a kind of silly query for testing and trying to understand the scoping of the result it expects me to execute it gets a little more confusing
> 
> select count(*) from tab3 as alias1 where (select max(alias2.i1) from tab3 alias2, tab3 as alias3 where alias2.i1 = alias3.i1) > alias1.i1;
> 
> the execution plan I get back looks like this:
> 
> LogicalAggregate(group=[{}], fred=[COUNT()])
>  LogicalProject($f0=[0])
>    LogicalFilter(condition=[>($5, $0)])
>      LogicalJoin(condition=[true], joinType=[left])
>        LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>        LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
>          LogicalProject(i1=[$0])
>            LogicalFilter(condition=[=($0, $5)])
>              LogicalJoin(condition=[true], joinType=[inner])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> 
> My question is about understanding the mapping and scoping of the $n variables in the plan
> 
> The plan uses $0 and $5 in various different scopes here, initially in this part
> 
>          LogicalProject(i1=[$0])
>            LogicalFilter(condition=[=($0, $5)])
>              LogicalJoin(condition=[true], joinType=[inner])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>                LogicalTableScan(table=[[CATALOG, mapd, tab3]])
> 
> I understand it as the two scans are of the same table, so $0 for alias2.i1 $5 for the second table alias3.i1.
> 
> we then move into my shakey ground
> 
> LogicalAggregate(group=[{}], fred=[COUNT()])
>  LogicalProject($f0=[0])
>    LogicalFilter(condition=[>($5, $0)])
>      LogicalJoin(condition=[true], joinType=[left])
>        LogicalTableScan(table=[[CATALOG, mapd, tab3]])
>        LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
> 
> 
> We see another join here referring to $5 and $0 but these are different $5 and $0.  In this scope I think $5 is referring to the MAX() projection from below and $0 is referring to alias1.i1
> 
> I am wondering what is the best way to consider the scoping of these $n placeholders.  What is the best way to think of this model?  At what point does $0 and $5 move to the new meaning?
> 
> thanks
> Homer
> 
> 
> 
> 
> 
> 
>