Posted to dev@calcite.apache.org by Johannes Schulte <jo...@gmail.com> on 2016/08/24 21:56:09 UTC

Re: Filters in Subselect

Julian,

Do you have any hints on how to debug this more fruitfully? I spent some
time trying to figure out what's happening and saw that the always-true
joinCondition is created in

org.apache.calcite.rel.rules.SubQueryRemoveRule.apply(RexSubQuery,
Set<CorrelationId>, Logic, RelBuilder, int, int)

when the rule is fired. The push of the project onto the table scan seems
to happen with the ProjectTableScanRule in the simple case. In the subquery
case, this rule is also fired but the successor doesn't seem to be picked
up by later rules (Cant find the rel#70 in any subsequent output)

2016-07-16 09:54:05,516 [main] DEBUG - Transform to: rel#70 via
ProjectScanRule:interpreter
2016-07-16 09:54:05,517 [main] DEBUG - call#487 generated 1 successors:
[rel#70:BindableTableScan.BINDABLE.[](table=[ourSchema,
ourTable],projects=[0])]
2016-07-16 09:54:05,518 [main] DEBUG - PLANNER = org.apache.calcite.plan.
volcano.VolcanoPlanner@31d0e481; TICK = 35/33; PHASE = OPTIMIZE; COST =
{1883.767018598809 rows, 1402.02 cpu, 0.0 io}
2016-07-16 09:54:05,518 [main] DEBUG - Pop match: rule
[EnumerableInterpreterRule] rels
[rel#60:BindableTableScan.BINDABLE.[](table=[ourSchema,
ourTable])]

Is this related to some cost associated with the result?

Thanks,

Johannes
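
One way to check whether a rel such as rel#70 is ever referenced again is
to index the trace by rel id. This is only a minimal sketch, assuming the
trace is available as text: rel_mentions is a hypothetical helper name, and
the sample records are three of those quoted above with the wrapped lines
rejoined.

```python
import re

# Three records copied from the trace above (wrapped lines rejoined).
TRACE = """\
2016-07-16 09:54:05,516 [main] DEBUG - Transform to: rel#70 via ProjectScanRule:interpreter
2016-07-16 09:54:05,517 [main] DEBUG - call#487 generated 1 successors: [rel#70:BindableTableScan.BINDABLE.[](table=[ourSchema, ourTable],projects=[0])]
2016-07-16 09:54:05,518 [main] DEBUG - Pop match: rule [EnumerableInterpreterRule] rels [rel#60:BindableTableScan.BINDABLE.[](table=[ourSchema, ourTable])]
"""

REL_ID = re.compile(r"rel#(\d+)")


def rel_mentions(trace):
    """Map each rel id to the 1-based numbers of the records mentioning it."""
    mentions = {}
    for recno, record in enumerate(trace.splitlines(), start=1):
        # De-duplicate ids within a record so each record is listed once.
        for rel in sorted(set(REL_ID.findall(record)), key=int):
            mentions.setdefault(int(rel), []).append(recno)
    return mentions


mentions = rel_mentions(TRACE)
for rel, records in sorted(mentions.items()):
    print(f"rel#{rel}: records {records}")
```

Run over the full trace, a rel whose last mention is its own "generated
... successors" record was never matched by a later rule call.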


On Sat, Jul 16, 2016 at 10:00 AM, Atri Sharma <at...@apache.org> wrote:

> +1 on that
>
> On 16 Jul 2016 5:57 a.m., "Julian Hyde" <jh...@apache.org> wrote:
>
> > By the way, I'd like one day to be able to optimize
> >
> >   select * from t
> >   where y = (select max(y) from t)
> >
> > to
> >
> >   select * from t order by y desc limit 1
> >
> > if y is unique and not null, or something a little more complex if it
> > is not unique. I don't know whether this optimization would help all
> > of the cases you are looking at. I logged
> > https://issues.apache.org/jira/browse/CALCITE-1317.
> >
> > Julian
> >
> > On Fri, Jul 15, 2016 at 5:04 PM, Julian Hyde <jh...@apache.org> wrote:
> > > Can you turn on tracing and post the final planner state? (If you
> > > like, log a JIRA case, and attach the output if it is large.)
> > >
> > > I would have expected there to be a Project immediately above the
> > > BindableTableScan. (Just as in the first query, there was almost
> > > certainly a Project above a TableScan, and then the projects were
> > > correctly pushed into the TableScan.)
> > >
> > > Why is that Project not present? Maybe it is present, but the cost is
> > > wrong. Or maybe the Project was not pushed through the Join (the fact
> > > that the join condition is [true] is a worrying sign).
> > >
> > > The planner state should help answer these questions.
> > >
> > > Julian
> > >
> > >
> > > On Fri, Jul 15, 2016 at 12:32 PM, Johannes Schulte
> > > <jo...@gmail.com> wrote:
> > >> Hi,
> > >>
> > >> I am trying to build something similar to Drill's way of querying
> > >> directly on multiple files, without the columnar layer. I am using
> > >> the ProjectableFilterableTable to filter out files not part of the
> > >> query. Now I have run into some problems, and I hope I can express
> > >> what I want to say:
> > >>
> > >>
> > >> select max(path) from y
> > >> allows for only checking the path column since it is projected
> > >>
> > >> [EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])
> > >>   EnumerableInterpreter
> > >>     BindableTableScan(table=[[ourSchema, ourTable]], projects=[[0]])
> > >> ]
> > >>
> > >> but when trying to query the maximum path and using it as a filter,
> > >> the projects are gone
> > >>
> > >> select * from y where path = (select max(path) from y).
> > >>
> > >> The explain plan then looks like this:
> > >>
> > >> [EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):VARCHAR(1)
> > >> CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
> > >> expr#4=[CAST($t2):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
> > >> "ISO-8859-1$en_US$primary"], expr#5=[=($t3, $t4)],
> > >> proj#0..1=[{exprs}],
> > >> $condition=[$t5])
> > >>   EnumerableJoin(condition=[true], joinType=[left])
> > >>     EnumerableInterpreter
> > >>       BindableTableScan(table=[[ourSchema, ourTable]])
> > >>     EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])
> > >>       EnumerableInterpreter
> > >>         BindableTableScan(table=[[ourSchema, ourTable]])
> > >> ]]
> > >>
> > >> Is there a way to alter the query so ProjectableFilterableTable can
> > >> still be used? Or is a translatable table the only way to get this?
> > >>
> > >> Thanks so far,
> > >>
> > >> Johannes
> >
>
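
The rewrite Julian describes in the quoted message (CALCITE-1317) can be
checked on a toy table. The sketch below uses SQLite through Python's
sqlite3 module, not Calcite; the table layout t(x, y), the sample rows, and
the helper name run_both are assumptions made for illustration. It runs
both forms of the query, once with unique y values and once with a tie for
the maximum.

```python
import sqlite3


def run_both(rows):
    """Return (subquery_result, order_limit_result) for a table t(x, y)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (x text, y integer)")
    conn.executemany("insert into t values (?, ?)", rows)
    # Original form: filter on the scalar subquery.
    subquery = conn.execute(
        "select * from t where y = (select max(y) from t) order by x"
    ).fetchall()
    # Proposed form: sort descending and keep one row.
    order_limit = conn.execute(
        "select * from t order by y desc limit 1"
    ).fetchall()
    conn.close()
    return subquery, order_limit


unique_rows = [("a", 1), ("b", 3), ("c", 2)]  # y is unique and not null
tied_rows = [("a", 3), ("b", 3), ("c", 2)]    # two rows share the maximum

sub_u, lim_u = run_both(unique_rows)
sub_t, lim_t = run_both(tied_rows)

print("unique y, same result:", sub_u == lim_u)
print("tied y, row counts:", len(sub_t), "vs", len(lim_t))
```

With unique y the two forms return the same single row; with a tie the
subquery form returns every row at the maximum while the limit form returns
only one, which is why the non-unique case needs "something a little more
complex".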