You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Shubham Kumar <sh...@gmail.com> on 2019/09/11 13:42:28 UTC

Is it possible to retrieve materialized view query rewrite from Calcite Plan

Hi Contributors,

I have recently started to experiment with Apache Calcite's materialized
views and I needed a bit of help.

I have explicitly defined materialized views which are stored in data
source by defining it in root.schema.materializations of model file.

Now for a query which should use the Materialized View:

Explain plan for "query" gives a plan which shows that the MV is being
utilized.

However, I wanted the optimized rewrite SQL query if possible, not just the
plan. So far, I tried to use the rel2sql api but when I print
RelNode.toString(), it gives me a plan which involves scanning the raw
tables and hence the SQL generated by rel2sql is not the one which utilized
the materialized view. Any pointers to get the rewrite query.

-- 
Thanks & Regards

Shubham Kumar

Re: Is it possible to retrieve materialized view query rewrite from Calcite Plan

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Shubham,

View based rewriting is performed during planning (check the call hierarchy
of [1] for more details).
If you obtain a plan (RelNode) after the planner then most likely rel2sql
API should give you the right SQL string.
On the other hand if you are just using sql2rel API then rewritings are not
considered.

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/d3c718328d4c83fb24007c8349b31b420dddd187/core/src/main/java/org/apache/calcite/plan/RelOptPlanner.java#L177

On Wed, Sep 11, 2019 at 3:42 PM Shubham Kumar <sh...@gmail.com>
wrote:

> Hi Contributors,
>
> I have recently started to experiment with Apache Calcite's materialized
> views and I needed a bit of help.
>
> I have explicitly defined materialized views which are stored in data
> source by defining it in root.schema.materializations of model file.
>
> Now for a query which should use the Materialized View:
>
> Explain plan for "query" gives a plan which shows that the MV is being
> utilized.
>
> However, I wanted the optimized rewrite SQL query if possible, not just the
> plan. So far, I tried to use the rel2sql api but when I print
> RelNode.toString(), it gives me a plan which involves scanning the raw
> tables and hence the SQL generated by rel2sql is not the one which utilized
> the materialized view. Any pointers to get the rewrite query.
>
> --
> Thanks & Regards
>
> Shubham Kumar
>

Re: Is it possible to retrieve materialized view query rewrite from Calcite Plan

Posted by XING JIN <ji...@gmail.com>.
I think it will work. I once did similar things and was playing around with
VolcanoPlanner and RelOptMaterializations.



Shubham Kumar <sh...@gmail.com> 于2019年9月16日周一 下午8:32写道:

> Hey everyone,
>
> Thank you for the help.
>
> @Stamatis, @Danny : Thanks for pointing out that  the reltosql api should
> be used after volcano planner execution which considers materializations
> and lattices, I understand it now.
>
> @Xing: Yeah, it works perfectly fine, I am able to see simple substitutions
> in action done by Calcite Planner!
>
> Although I guess "*org.apache.calcite.tes*t" API can't be accessed in some
> other project. To extract out the materialized query,  I was playing around
> with Prepare and optimize of Calcite Prepare class[1]. Let me know if I am
> moving in the correct direction.
>
> Since, now I am able to get the rewrite query, I wanted to share the exact
> use case which I am trying to implement and get your insights:
>
> I have explicitly defined materialized views which are stored in data
> source and defining it in *root.schema.materializations *of model file
> (keeping the "*view*" field empty as its already exists). Now, I will be
> completely responsible for maintaining this materialized view table in my
> datasource as various datasources don't have built in Materialized View
> like Mysql etc. Finally, I aim the rewrite query that I retrieve from
> Calcite to execute it using Spark SQL (data of tables will be stored in
> HDFS). So basically query rewrite of Calcite will be a layer before Spark
> SQL and it uses the optimized query using materialized views for making the
> actual Spark plan and executing it in a distributed manner.
>
> I would love to have your insights on this on whether this is will be
> feasible and if it would improve query performance.
>
> Problems that I am facing currently is that while using "
> *org.apache.calcite.adapter.jdbc.JdbcSchema$Factory*" as the factory method
> I get the following exception:
>
>
> *Cannot define materialization; parent schema 'test_calcite' is not a
> SemiMutableSchema*
>
> Currently I am using CloneSchema$Factory for testing things out but I guess
> it maintains in memory tables which eventually I won't be able to use in
> case of large tables. I was wondering if I would have to write my own
> Factory classes and implementations for my use case or is there something
> already present.
>
> [1]
>
>
> https://github.com/apache/calcite/blob/73023148e7f37d494f6caf92b01b090f6dde13cd/core/src/main/java/org/apache/calcite/prepare/Prepare.java#L320
>
> Thanks
> Shubham
>
>
> On Mon, Sep 16, 2019 at 11:32 AM XING JIN <ji...@gmail.com> wrote:
>
> > Hi,  Shubham Kumar
> > If I understand correctly, you want get the optimized(by materialized
> view)
> > SQL String. I wrote a simple test as below, please check if it's helpful
> > for you.
> >
> > @Test public void testDEV() {
> >   final String m = "select \"deptno\", \"empid\", \"name\""
> >       + "from \"emps\" where \"deptno\" = 10";
> >   final String q = "select \"empid\" + 1 as x, \"name\""
> >       + "from \"emps\" where \"deptno\" = 10";
> >
> >   CalciteAssert.that()
> >       .withMaterializations(HR_FKUK_MODEL, "m0", m)
> >       .query(q)
> >       .withHook(Hook.SUB, (Consumer<RelNode>) r ->
> >       {
> >         RelToSqlConverter converter =
> >             new RelToSqlConverter(CalciteSqlDialect.DEFAULT);
> >         final SqlNode sqlNode = converter.visitChild(0, r).asStatement();
> >
> >
> System.out.println(sqlNode.toSqlString(CalciteSqlDialect.DEFAULT).getSql());
> >       })
> >       .enableMaterializations(true)
> >       .explainContains("hr, m0");
> > }
> >
> > The output is as below:
> >
> > SELECT "empid" + 1 AS "X", "name"
> > FROM "hr"."m0"
> >
> >
> > Danny Chan <yu...@gmail.com> 于2019年9月16日周一 下午1:44写道:
> >
> > > Hi, Shubham Kumar ~
> > >
> > > > However, I wanted the optimized rewrite SQL query if possible, not
> just
> > > the
> > > > plan. So far, I tried to use the rel2sql api but when I print
> > > > RelNode.toString(), it gives me a plan which involves scanning the
> raw
> > > > tables and hence the SQL generated by rel2sql is not the one which
> > > utilized
> > > > the materialized view. Any pointers to get the rewrite query.
> > >
> > > Did you try the RelNode.toString() after the volcano planner promotion,
> > it
> > > is not expected to involve scanning the raw tables.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年9月11日 +0800 PM9:42,dev@calcite.apache.org,写道:
> > > >
> > > > However, I wanted the optimized rewrite SQL query if possible, not
> just
> > > the
> > > > plan. So far, I tried to use the rel2sql api but when I print
> > > > RelNode.toString(), it gives me a plan which involves scanning the
> raw
> > > > tables and hence the SQL generated by rel2sql is not the one which
> > > utilized
> > > > the materialized view. Any pointers to get the rewrite query.
> > >
> >
>
>
> --
> Thanks & Regards
>
> Shubham Kumar
>

Re: Is it possible to retrieve materialized view query rewrite from Calcite Plan

Posted by Shubham Kumar <sh...@gmail.com>.
Hey everyone,

Thank you for the help.

@Stamatis, @Danny : Thanks for pointing out that  the reltosql api should
be used after volcano planner execution which considers materializations
and lattices, I understand it now.

@Xing: Yeah, it works perfectly fine, I am able to see simple substitutions
in action done by Calcite Planner!

Although I guess "*org.apache.calcite.tes*t" API can't be accessed in some
other project. To extract out the materialized query,  I was playing around
with Prepare and optimize of Calcite Prepare class[1]. Let me know if I am
moving in the correct direction.

Since, now I am able to get the rewrite query, I wanted to share the exact
use case which I am trying to implement and get your insights:

I have explicitly defined materialized views which are stored in data
source and defining it in *root.schema.materializations *of model file
(keeping the "*view*" field empty as its already exists). Now, I will be
completely responsible for maintaining this materialized view table in my
datasource as various datasources don't have built in Materialized View
like Mysql etc. Finally, I aim the rewrite query that I retrieve from
Calcite to execute it using Spark SQL (data of tables will be stored in
HDFS). So basically query rewrite of Calcite will be a layer before Spark
SQL and it uses the optimized query using materialized views for making the
actual Spark plan and executing it in a distributed manner.

I would love to have your insights on this on whether this is will be
feasible and if it would improve query performance.

Problems that I am facing currently is that while using "
*org.apache.calcite.adapter.jdbc.JdbcSchema$Factory*" as the factory method
I get the following exception:


*Cannot define materialization; parent schema 'test_calcite' is not a
SemiMutableSchema*

Currently I am using CloneSchema$Factory for testing things out but I guess
it maintains in memory tables which eventually I won't be able to use in
case of large tables. I was wondering if I would have to write my own
Factory classes and implementations for my use case or is there something
already present.

[1]

https://github.com/apache/calcite/blob/73023148e7f37d494f6caf92b01b090f6dde13cd/core/src/main/java/org/apache/calcite/prepare/Prepare.java#L320

Thanks
Shubham


On Mon, Sep 16, 2019 at 11:32 AM XING JIN <ji...@gmail.com> wrote:

> Hi,  Shubham Kumar
> If I understand correctly, you want get the optimized(by materialized view)
> SQL String. I wrote a simple test as below, please check if it's helpful
> for you.
>
> @Test public void testDEV() {
>   final String m = "select \"deptno\", \"empid\", \"name\""
>       + "from \"emps\" where \"deptno\" = 10";
>   final String q = "select \"empid\" + 1 as x, \"name\""
>       + "from \"emps\" where \"deptno\" = 10";
>
>   CalciteAssert.that()
>       .withMaterializations(HR_FKUK_MODEL, "m0", m)
>       .query(q)
>       .withHook(Hook.SUB, (Consumer<RelNode>) r ->
>       {
>         RelToSqlConverter converter =
>             new RelToSqlConverter(CalciteSqlDialect.DEFAULT);
>         final SqlNode sqlNode = converter.visitChild(0, r).asStatement();
>
> System.out.println(sqlNode.toSqlString(CalciteSqlDialect.DEFAULT).getSql());
>       })
>       .enableMaterializations(true)
>       .explainContains("hr, m0");
> }
>
> The output is as below:
>
> SELECT "empid" + 1 AS "X", "name"
> FROM "hr"."m0"
>
>
> Danny Chan <yu...@gmail.com> 于2019年9月16日周一 下午1:44写道:
>
> > Hi, Shubham Kumar ~
> >
> > > However, I wanted the optimized rewrite SQL query if possible, not just
> > the
> > > plan. So far, I tried to use the rel2sql api but when I print
> > > RelNode.toString(), it gives me a plan which involves scanning the raw
> > > tables and hence the SQL generated by rel2sql is not the one which
> > utilized
> > > the materialized view. Any pointers to get the rewrite query.
> >
> > Did you try the RelNode.toString() after the volcano planner promotion,
> it
> > is not expected to involve scanning the raw tables.
> >
> > Best,
> > Danny Chan
> > 在 2019年9月11日 +0800 PM9:42,dev@calcite.apache.org,写道:
> > >
> > > However, I wanted the optimized rewrite SQL query if possible, not just
> > the
> > > plan. So far, I tried to use the rel2sql api but when I print
> > > RelNode.toString(), it gives me a plan which involves scanning the raw
> > > tables and hence the SQL generated by rel2sql is not the one which
> > utilized
> > > the materialized view. Any pointers to get the rewrite query.
> >
>


-- 
Thanks & Regards

Shubham Kumar

Re: Is it possible to retrieve materialized view query rewrite from Calcite Plan

Posted by XING JIN <ji...@gmail.com>.
Hi,  Shubham Kumar
If I understand correctly, you want get the optimized(by materialized view)
SQL String. I wrote a simple test as below, please check if it's helpful
for you.

@Test public void testDEV() {
  final String m = "select \"deptno\", \"empid\", \"name\""
      + "from \"emps\" where \"deptno\" = 10";
  final String q = "select \"empid\" + 1 as x, \"name\""
      + "from \"emps\" where \"deptno\" = 10";

  CalciteAssert.that()
      .withMaterializations(HR_FKUK_MODEL, "m0", m)
      .query(q)
      .withHook(Hook.SUB, (Consumer<RelNode>) r ->
      {
        RelToSqlConverter converter =
            new RelToSqlConverter(CalciteSqlDialect.DEFAULT);
        final SqlNode sqlNode = converter.visitChild(0, r).asStatement();
        System.out.println(sqlNode.toSqlString(CalciteSqlDialect.DEFAULT).getSql());
      })
      .enableMaterializations(true)
      .explainContains("hr, m0");
}

The output is as below:

SELECT "empid" + 1 AS "X", "name"
FROM "hr"."m0"


Danny Chan <yu...@gmail.com> 于2019年9月16日周一 下午1:44写道:

> Hi, Shubham Kumar ~
>
> > However, I wanted the optimized rewrite SQL query if possible, not just
> the
> > plan. So far, I tried to use the rel2sql api but when I print
> > RelNode.toString(), it gives me a plan which involves scanning the raw
> > tables and hence the SQL generated by rel2sql is not the one which
> utilized
> > the materialized view. Any pointers to get the rewrite query.
>
> Did you try the RelNode.toString() after the volcano planner promotion, it
> is not expected to involve scanning the raw tables.
>
> Best,
> Danny Chan
> 在 2019年9月11日 +0800 PM9:42,dev@calcite.apache.org,写道:
> >
> > However, I wanted the optimized rewrite SQL query if possible, not just
> the
> > plan. So far, I tried to use the rel2sql api but when I print
> > RelNode.toString(), it gives me a plan which involves scanning the raw
> > tables and hence the SQL generated by rel2sql is not the one which
> utilized
> > the materialized view. Any pointers to get the rewrite query.
>

Re: Is it possible to retrieve materialized view query rewrite from Calcite Plan

Posted by Danny Chan <yu...@gmail.com>.
Hi, Shubham Kumar ~

> However, I wanted the optimized rewrite SQL query if possible, not just the
> plan. So far, I tried to use the rel2sql api but when I print
> RelNode.toString(), it gives me a plan which involves scanning the raw
> tables and hence the SQL generated by rel2sql is not the one which utilized
> the materialized view. Any pointers to get the rewrite query.

Did you try the RelNode.toString() after the volcano planner promotion, it is not expected to involve scanning the raw tables.

Best,
Danny Chan
在 2019年9月11日 +0800 PM9:42,dev@calcite.apache.org,写道:
>
> However, I wanted the optimized rewrite SQL query if possible, not just the
> plan. So far, I tried to use the rel2sql api but when I print
> RelNode.toString(), it gives me a plan which involves scanning the raw
> tables and hence the SQL generated by rel2sql is not the one which utilized
> the materialized view. Any pointers to get the rewrite query.