Posted to dev@calcite.apache.org by Kenneth Knowles <kl...@google.com.INVALID> on 2018/06/20 20:10:54 UTC

Re: Best practice for exhaustive planning

Hi all,

Bumping this again because I'd like to be quite sure the answer is "Calcite
doesn't support this". For example, I'd like to reject full cartesian
joins. Currently, all joins can be converted to Beam convention and then
there's some logic later to complain about cross joins. I would prefer to
do this in the rule set, making a cross join just not convertible to Beam
convention, to incentivize finding other plans, but still give a user a
good error message.

What do people actually do in this situation? Possibilities: (a) scrape the
syntax before planning, which misses cases where a transformation might
end up with a viable plan; (b) make an "ErrorRel" with impossibly high cost
so it will only be chosen as the last resort, somewhat like yacc error
productions, though it could be hard to get a decent error message. I don't
particularly like either option.

Kenn
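
Option (b) above can be illustrated without any planner library. The
following is a minimal sketch only: ErrorRelSketch, Candidate, errorPlan and
choose are hypothetical names, not Calcite or Beam APIs, and a real
cost-based planner compares costs in a more involved way. It assumes the
fallback plan carries its own message and is priced so that any real plan
beats it.

```java
import java.util.List;

/** Hypothetical stand-ins; none of these are Calcite or Beam classes. */
final class ErrorRelSketch {

  /** A candidate plan. errorMessage is non-null only for the fallback "error" plan. */
  static final class Candidate {
    final String description;
    final double cost;
    final String errorMessage;

    Candidate(String description, double cost, String errorMessage) {
      this.description = description;
      this.cost = cost;
      this.errorMessage = errorMessage;
    }

    boolean isError() {
      return errorMessage != null;
    }
  }

  /** Always-available fallback, priced so that any real plan is cheaper. */
  static Candidate errorPlan(String message) {
    return new Candidate("ErrorRel", Double.POSITIVE_INFINITY, message);
  }

  /** Picks the cheapest candidate; throws a readable error if only the fallback is left. */
  static Candidate choose(List<Candidate> candidates) {
    if (candidates.isEmpty()) {
      throw new IllegalArgumentException("no candidates");
    }
    Candidate best = candidates.get(0);
    for (Candidate c : candidates) {
      if (c.cost < best.cost) {
        best = c;
      }
    }
    if (best.isError()) {
      // The message travels with the fallback plan, so the user sees it
      // instead of a dump of planner state.
      throw new UnsupportedOperationException(best.errorMessage);
    }
    return best;
  }
}
```

With a real plan in the list, choose returns it; with only the fallback, the
caller gets the fallback's message. The difficulty noted above remains: the
message is fixed when the fallback is created, before it is known which part
of the query actually blocked planning.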

On Wed, May 30, 2018 at 6:10 AM Michael Mior <mm...@apache.org> wrote:

> Unfortunately, I'm not sure of the best way to proceed from here, but
> it seems like you're making progress :)
> --
> Michael Mior
> mmior@apache.org
>
>
>
> On Tue, May 29, 2018 at 18:29, Kenneth Knowles <kl...@google.com.invalid> wrote:
>
> > Thanks Michael,
> >
> > I don't think that applies in our case - we aren't doing a table scan and
> > having Calcite implement the rest, but are translating the whole plan to a
> > Beam pipeline to run on e.g. Flink, Spark, Dataflow.
> >
> > Here's an example:
> >
> >     SELECT * FROM UNNEST (ARRAY ['a', 'b', 'c'])
> >
> > With logical plan:
> >
> >     LogicalProject(EXPR$0=[$0])
> >       Uncollect
> >         LogicalProject(EXPR$0=[ARRAY('a', 'b', 'c')])
> >           LogicalValues(tuples=[[{ 0 }]])
> >
> > And the planner dumps "could not be implemented" when going for Beam's
> > calling convention. So I implement a rel & a rule.
> >
> > Then there's the correlated version exploding an array field from a table:
> >
> >     SELECT f_int, arrElems.f_string FROM main CROSS JOIN UNNEST (main.f_stringArr) AS arrElems(f_string)
> >
> > With logical plan:
> >
> >     LogicalProject(f_int=[$0], f_string=[$2])
> >       LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
> >         BeamIOSourceRel(table=[[beam, main]])
> >         Uncollect
> >           LogicalProject(f_stringArr=[$cor0.f_stringArr_1])
> >             LogicalValues(tuples=[[{ 0 }]])
> >
> > I hacked something together to support this, too. I did not fully implement
> > Correlate; I would love to reject unsupported things in a meaningful way. I
> > would like to have confidence that there are not other permutations of
> > logical plans that we missed. For example for joins we match all joins and
> > translate them, then throw an error at a later stage.
> >
> > Incidentally, when I ran the decorrelation [1] it appeared to have no
> > effect. We probably want to implement it directly in Beam anyhow in this
> > case.
> >
> > Kenn
> >
> > [1]
> > https://calcite.apache.org/apidocs/org/apache/calcite/sql2rel/SqlToRelConverter.html#decorrelate-org.apache.calcite.sql.SqlNode-org.apache.calcite.rel.RelNode-
> >
> > On Tue, May 22, 2018 at 6:39 PM Michael Mior <mm...@uwaterloo.ca> wrote:
> >
> > > For most queries, the only thing you should need to implement is a scan and
> > > the rest can usually be implemented by Calcite. It would be good if you
> > > have a specific example of a query that fails.
> > >
> > > --
> > > Michael Mior
> > > mmior@uwaterloo.ca
> > >
> > >
> > > On Tue, May 22, 2018 at 19:01, Kenneth Knowles <kl...@google.com.invalid> wrote:
> > >
> > > > Bumping this, as it ended up in spam for some people.
> > > >
> > > > Kenn
> > > >
> > > > On Tue, May 15, 2018 at 9:26 AM Kenneth Knowles <kl...@google.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Beam SQL uses Calcite for parsing and (naive) planning. Currently it is
> > > > > pretty easy to write a SQL query that parses and causes a "could not plan"
> > > > > dump when we ask the planner to convert things to the Beam calling
> > > > > convention. One current example is using UNNEST on a column to yield a
> > > > > LogicalCorrelate + Uncollect.
> > > > >
> > > > > There may obviously always be bits we don't support, but we'd like to
> > > > > ensure that the user encounters a careful error message rather than a
> > > > > planner dump. Is there a best practice for ensuring that we have covered
> > > > > all the cases? Is it just "everything named Logical*" or is there something
> > > > > more clever?
> > > > >
> > > > > And if this question demonstrates that we are using Calcite entirely
> > > > > wrong, let us know :-)
> > > > >
> > > > > Kenn
> > > > >
> > > >
> > >
> >
>

Re: Best practice for exhaustive planning

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
Hi Jacques,

Thanks for the tip. Your example is exactly what I am dealing with right
now. Blocking conversion works and makes sure more queries run, but for the
queries that don't work it changes the failure mode from "Beam doesn't
support cartesian join" to "CannotPlanException... <not user friendly dump
of planner state>".

I will try your suggestion of examining the plan in such a situation. Any
example code that does this?

Kenn

On Mon, Jun 25, 2018 at 4:45 PM Jacques Nadeau <ja...@apache.org> wrote:

> My advice: block the transformation to a particular convention. Then, if
> you get a cannot-plan error, examine the plan to determine whether there are
> specific problematic patterns. If there are, make a best guess at the
> particular reason and return it to the user. This covers additional situations
> that wouldn't work with syntax scraping, such as when a user writes this
> query:
>
> select * from a,b
> where a.id = b.id
>
> In this case, with the correct rules, this will get planned. However, a SQL
> scrape might have flagged this as an invalid cartesian join.

Re: Best practice for exhaustive planning

Posted by Jacques Nadeau <ja...@apache.org>.
My advice: block the transformation to a particular convention. Then, if
you get a cannot-plan error, examine the plan to determine whether there are
specific problematic patterns. If there are, make a best guess at the
particular reason and return it to the user. This covers additional situations
that wouldn't work with syntax scraping, such as when a user writes this
query:

select * from a,b
where a.id = b.id

In this case, with the correct rules, this will get planned. However, a SQL
scrape might have flagged this as an invalid cartesian join.
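
The "examine the plan after a failure" step might look roughly like the
sketch below. It is illustrative only and makes several assumptions: Node,
diagnose and userMessage are hypothetical stand-ins rather than Calcite's
RelNode API, a join condition is modelled as plain text, and a literal
"true" condition is taken to mean a cartesian join. In a real setup the
walk would run over the plan left after logical rewrites, inside the
handler for the planner's cannot-plan failure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for a logical plan; not Calcite's RelNode API. */
final class PlanDiagnosisSketch {

  static final class Node {
    final String kind;       // e.g. "Join", "Project", "Scan"
    final String condition;  // join condition as text, or null for non-joins
    final List<Node> inputs;

    Node(String kind, String condition, Node... inputs) {
      this.kind = kind;
      this.condition = condition;
      this.inputs = List.of(inputs);
    }
  }

  /** Walks the plan and collects best-guess reasons why it could not be converted. */
  static List<String> diagnose(Node root) {
    List<String> reasons = new ArrayList<>();
    collect(root, reasons);
    return reasons;
  }

  private static void collect(Node node, List<String> reasons) {
    // Assumption: a join whose condition is literally "true" is a cartesian join.
    if (node.kind.equals("Join") && "true".equals(node.condition)) {
      reasons.add("Cartesian (cross) joins are not supported");
    }
    for (Node input : node.inputs) {
      collect(input, reasons);
    }
  }

  /** Best-guess message for the user, with a generic fallback when no pattern matches. */
  static String userMessage(Node plan) {
    List<String> reasons = diagnose(plan);
    return reasons.isEmpty() ? "Query could not be planned" : String.join("; ", reasons);
  }
}
```

Because the check runs on the plan rather than on the SQL text, the query
above is not flagged once the rules have moved a.id = b.id into the join
condition; only a join that is still unconditioned at that point is reported.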


