You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Gavin Ray <ra...@gmail.com> on 2022/03/22 21:10:21 UTC

A basic, DB-agnostic question about the concept of a "Physical Plan"

I'm on my second pass of the book "How Query Engines Work" by Arrow's own
Andy Grove
(Really great read, huge recommendation:
https://leanpub.com/how-query-engines-work)

Something I'm not sure I'm fully understanding is what qualifies something
as a Physical Plan
A logical plan is straightforward, say I have an expression, "Select name
from users where ID is less than 5"

Then I can represent this as an abstract, logical operation like:

LogicalPlan(
  project = ["name"],
  filter = Filter(LessThan(Column("id"), Literal(5)))
)

Now say I want to give this plan to a database and have it run it.
I need to write an implementation for translating this to an executable
expression
(Probably SQL)

Is the class that implements the translation to SQL that gets executed the
Physical Plan
Or is there no Physical Plan, and that's the database's job to figure out?

Re: A basic, DB-agnostic question about the concept of a "Physical Plan"

Posted by Gavin Ray <ra...@gmail.com>.

Ahhh, thanks all
So if I'm understanding right, a "Physical Plan" would be something like
this?

JdbcToEnumerableConverter
  JdbcProject(fields=["name"])
    JdbcFilter(condition=[<("id", 5)])
      JdbcTableScan(table=[["user"]])

Where this is a plan that says to "physically" use a JDBC source
for the underlying operations

And the translation of the Logical Plan -> SQL and execution against
the DB would be an IMPLEMENTATION DETAIL of these Physical Plan operators?

Something like this?

class JdbcToEnumerableConverter {
    public Result implement() {
        // Visit child expressions, produce SQL,
        // execute against JDBC source, return Enumerable of rows
    }
}




On Wed, Mar 23, 2022 at 7:31 AM Justin Swanhart <gr...@gmail.com> wrote:

> Hi,
>
> Generally a cost-based optimizer chooses physical plans (sometimes with
> help from a rules-based optimizer).  Hints (/*+USE_NL*/ /*+HASH_JOIN*/) etc
> (depends on RDBMS) generally allow the user to override the optimizer and
> choose a physical plan that differs from what the database would pick.
>
> On Wed, Mar 23, 2022 at 8:19 AM Charles Givre <cg...@gmail.com> wrote:
>
> > Hi Gavin,
> > Thanks for the book recommendation.  That looks really solid and I'm
> > definitely going to pick up a copy.  To continue the conversation about
> > physical plans, Drill does allow you to view the logical and physical
> plans
> > from a query AND modify them, or submit your own.  Here's a doc link that
> > explains how: https://drill.apache.org/docs/query-plans/ <
> > https://drill.apache.org/docs/query-plans/>
> > Best,
> > -- C
> >
> > > On Mar 23, 2022, at 5:26 AM, Alessandro Solimando <
> > alessandro.solimando@gmail.com> wrote:
> > >
> > > Hi Gavin,
> > > in a nutshell, a logical plan consists of logical operators (say, a
> > join),
> > > which can be implemented in several ways at the physical level (say,
> > merge
> > > join, hash join, etc.), and are therefore associated with some
> > > corresponding physical operators.
> > >
> > > How the logical vs physical planning is performed depends on the
> > > optimization framework you use.
> > >
> > > SQL is not a plan, it's a declarative language, it only dictates "what"
> > you
> > > will be getting, the logical plan sketches how you will get it, the
> > > physical plan fills the missing details.
> > > Then there is also query compilation which translates to "real
> > > instructions" (be it machine code, a DAG for an execution engine like
> for
> > > Spark or Hive, etc.).
> > >
> > > In federated queries, you will plan and split the work across different
> > > databases, you will pass them either SQL or some native query language
> > (CQL
> > > for Cassandra, for example), but then the DB will do all the steps
> again,
> > > parse, validate, etc., build a logical plan, optimize it and run it as
> it
> > > thinks it's best, which can be radically different from what you
> > envisioned.
> > >
> > > I have never heard of systems where you can directly inject a
> > > physical/execution plan, but I haven't really looked for that.
> > >
> > > HTH,
> > > Alessandro
> > >
> > > On Tue, 22 Mar 2022 at 22:10, Gavin Ray <ra...@gmail.com> wrote:
> > >
> > >> I'm on my second pass of the book "How Query Engines Work" by Arrow's
> > own
> > >> Andy Grove
> > >> (Really great read, huge recommendation:
> > >> https://leanpub.com/how-query-engines-work)
> > >>
> > >> Something I'm not sure I'm fully understanding is what qualifies
> > something
> > >> as a Physical Plan
> > >> A logical plan is straightforward, say I have an expression, "Select
> > name
> > >> from users where ID is less than 5"
> > >>
> > >> Then I can represent this as an abstract, logical operation like:
> > >>
> > >> LogicalPlan(
> > >>  project = ["name"],
> > >>  filter = Filter(LessThan(Column("id"), Literal(5)))
> > >> )
> > >>
> > >> Now say I want to give this plan to a database and have it run it.
> > >> I need to write an implementation for translating this to an
> executable
> > >> expression
> > >> (Probably SQL)
> > >>
> > >> Is the class that implements the translation to SQL that gets executed
> > the
> > >> Physical Plan
> > >> Or is there no Physical Plan, and that's the database's job to figure
> > out?
> > >>
> >
> >
>

Re: A basic, DB-agnostic question about the concept of a "Physical Plan"

Posted by Justin Swanhart <gr...@gmail.com>.

Hi,

Generally a cost-based optimizer chooses physical plans (sometimes with
help from a rules-based optimizer).  Hints (/*+USE_NL*/ /*+HASH_JOIN*/) etc
(depends on RDBMS) generally allow the user to override the optimizer and
choose a physical plan that differs from what the database would pick.

On Wed, Mar 23, 2022 at 8:19 AM Charles Givre <cg...@gmail.com> wrote:

> Hi Gavin,
> Thanks for the book recommendation.  That looks really solid and I'm
> definitely going to pick up a copy.  To continue the conversation about
> physical plans, Drill does allow you to view the logical and physical plans
> from a query AND modify them, or submit your own.  Here's a doc link that
> explains how: https://drill.apache.org/docs/query-plans/ <
> https://drill.apache.org/docs/query-plans/>
> Best,
> -- C
>
> > On Mar 23, 2022, at 5:26 AM, Alessandro Solimando <
> alessandro.solimando@gmail.com> wrote:
> >
> > Hi Gavin,
> > in a nutshell, a logical plan consists of logical operators (say, a
> join),
> > which can be implemented in several ways at the physical level (say,
> merge
> > join, hash join, etc.), and are therefore associated with some
> > corresponding physical operators.
> >
> > How the logical vs physical planning is performed depends on the
> > optimization framework you use.
> >
> > SQL is not a plan, it's a declarative language, it only dictates "what"
> you
> > will be getting, the logical plan sketches how you will get it, the
> > physical plan fills the missing details.
> > Then there is also query compilation which translates to "real
> > instructions" (be it machine code, a DAG for an execution engine like for
> > Spark or Hive, etc.).
> >
> > In federated queries, you will plan and split the work across different
> > databases, you will pass them either SQL or some native query language
> (CQL
> > for Cassandra, for example), but then the DB will do all the steps again,
> > parse, validate, etc., build a logical plan, optimize it and run it as it
> > thinks it's best, which can be radically different from what you
> envisioned.
> >
> > I have never heard of systems where you can directly inject a
> > physical/execution plan, but I haven't really looked for that.
> >
> > HTH,
> > Alessandro
> >
> > On Tue, 22 Mar 2022 at 22:10, Gavin Ray <ra...@gmail.com> wrote:
> >
> >> I'm on my second pass of the book "How Query Engines Work" by Arrow's
> own
> >> Andy Grove
> >> (Really great read, huge recommendation:
> >> https://leanpub.com/how-query-engines-work)
> >>
> >> Something I'm not sure I'm fully understanding is what qualifies
> something
> >> as a Physical Plan
> >> A logical plan is straightforward, say I have an expression, "Select
> name
> >> from users where ID is less than 5"
> >>
> >> Then I can represent this as an abstract, logical operation like:
> >>
> >> LogicalPlan(
> >>  project = ["name"],
> >>  filter = Filter(LessThan(Column("id"), Literal(5)))
> >> )
> >>
> >> Now say I want to give this plan to a database and have it run it.
> >> I need to write an implementation for translating this to an executable
> >> expression
> >> (Probably SQL)
> >>
> >> Is the class that implements the translation to SQL that gets executed
> the
> >> Physical Plan
> >> Or is there no Physical Plan, and that's the database's job to figure
> out?
> >>
>
>

Re: A basic, DB-agnostic question about the concept of a "Physical Plan"

Posted by Charles Givre <cg...@gmail.com>.

Hi Gavin, 
Thanks for the book recommendation.  That looks really solid and I'm definitely going to pick up a copy.  To continue the conversation about physical plans, Drill does allow you to view the logical and physical plans from a query AND modify them, or submit your own.  Here's a doc link that explains how: https://drill.apache.org/docs/query-plans/ <https://drill.apache.org/docs/query-plans/>
Best,
-- C

> On Mar 23, 2022, at 5:26 AM, Alessandro Solimando <al...@gmail.com> wrote:
> 
> Hi Gavin,
> in a nutshell, a logical plan consists of logical operators (say, a join),
> which can be implemented in several ways at the physical level (say, merge
> join, hash join, etc.), and are therefore associated with some
> corresponding physical operators.
> 
> How the logical vs physical planning is performed depends on the
> optimization framework you use.
> 
> SQL is not a plan, it's a declarative language, it only dictates "what" you
> will be getting, the logical plan sketches how you will get it, the
> physical plan fills the missing details.
> Then there is also query compilation which translates to "real
> instructions" (be it machine code, a DAG for an execution engine like for
> Spark or Hive, etc.).
> 
> In federated queries, you will plan and split the work across different
> databases, you will pass them either SQL or some native query language (CQL
> for Cassandra, for example), but then the DB will do all the steps again,
> parse, validate, etc., build a logical plan, optimize it and run it as it
> thinks it's best, which can be radically different from what you envisioned.
> 
> I have never heard of systems where you can directly inject a
> physical/execution plan, but I haven't really looked for that.
> 
> HTH,
> Alessandro
> 
> On Tue, 22 Mar 2022 at 22:10, Gavin Ray <ra...@gmail.com> wrote:
> 
>> I'm on my second pass of the book "How Query Engines Work" by Arrow's own
>> Andy Grove
>> (Really great read, huge recommendation:
>> https://leanpub.com/how-query-engines-work)
>> 
>> Something I'm not sure I'm fully understanding is what qualifies something
>> as a Physical Plan
>> A logical plan is straightforward, say I have an expression, "Select name
>> from users where ID is less than 5"
>> 
>> Then I can represent this as an abstract, logical operation like:
>> 
>> LogicalPlan(
>>  project = ["name"],
>>  filter = Filter(LessThan(Column("id"), Literal(5)))
>> )
>> 
>> Now say I want to give this plan to a database and have it run it.
>> I need to write an implementation for translating this to an executable
>> expression
>> (Probably SQL)
>> 
>> Is the class that implements the translation to SQL that gets executed the
>> Physical Plan
>> Or is there no Physical Plan, and that's the database's job to figure out?
>>

Re: A basic, DB-agnostic question about the concept of a "Physical Plan"

Posted by Alessandro Solimando <al...@gmail.com>.

Hi Gavin,
in a nutshell, a logical plan consists of logical operators (say, a join),
which can be implemented in several ways at the physical level (say, merge
join, hash join, etc.), and are therefore associated with some
corresponding physical operators.

How the logical vs physical planning is performed depends on the
optimization framework you use.

SQL is not a plan, it's a declarative language, it only dictates "what" you
will be getting, the logical plan sketches how you will get it, the
physical plan fills the missing details.
Then there is also query compilation which translates to "real
instructions" (be it machine code, a DAG for an execution engine like for
Spark or Hive, etc.).

In federated queries, you will plan and split the work across different
databases, you will pass them either SQL or some native query language (CQL
for Cassandra, for example), but then the DB will do all the steps again,
parse, validate, etc., build a logical plan, optimize it and run it as it
thinks it's best, which can be radically different from what you envisioned.

I have never heard of systems where you can directly inject a
physical/execution plan, but I haven't really looked for that.

HTH,
Alessandro

On Tue, 22 Mar 2022 at 22:10, Gavin Ray <ra...@gmail.com> wrote:

> I'm on my second pass of the book "How Query Engines Work" by Arrow's own
> Andy Grove
> (Really great read, huge recommendation:
> https://leanpub.com/how-query-engines-work)
>
> Something I'm not sure I'm fully understanding is what qualifies something
> as a Physical Plan
> A logical plan is straightforward, say I have an expression, "Select name
> from users where ID is less than 5"
>
> Then I can represent this as an abstract, logical operation like:
>
> LogicalPlan(
>   project = ["name"],
>   filter = Filter(LessThan(Column("id"), Literal(5)))
> )
>
> Now say I want to give this plan to a database and have it run it.
> I need to write an implementation for translating this to an executable
> expression
> (Probably SQL)
>
> Is the class that implements the translation to SQL that gets executed the
> Physical Plan
> Or is there no Physical Plan, and that's the database's job to figure out?
>