You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Sarnath <st...@gmail.com> on 2015/11/25 19:42:41 UTC

Lattice, tiles and materialized views

Hi all,

Good morning!

Please bear with my newbie question here. I am still trying to find out
what calcite is all about.

I went through the latest calcite presentation on relational algebra. I
followed all of it well except the last few slides that talk about lattice,
tiles and materialized views.... I don't see the connection between these
and relational algebra. Can someone explain this to me, please?

Thanks a ton,
Best,
Sarnath

Re: Lattice, tiles and materialized views

Posted by Sarnath <st...@gmail.com>.
Thanks for the link. We will definitely check that. I think we are closer
to our objective of implementing a OLAP engine 0.1 than we thought earlier.
Reminds me of what is written on a rear view mirror. Thanks a ton!

Re: Lattice, tiles and materialized views

Posted by Julian Hyde <jh...@apache.org>.
If you want a demo of the lattice optimizer in action, edit core/src/test/resources/hsqldb-foodmart-lattice-model.json and change the line

  algorithm: false,

to

  algorithm: true,

Then connect to that model from sqlline:

$ sqlline
sqlline> !connect jdbc:calcite:model=core/src/test/resources/hsqldb-foodmart-lattice-model.json sa sa
sqlline> !tables

It takes a while to connect because it needs to load the embedded hsqldb data set and run the optimization algorithm. You will see in the output from ‘!tables’ that there are about 20 tables named “m{16, 17, 32}” or similar. These are the summary tables chosen by the algorithm and populated as in-memory tables. Run some queries against the foodmart schema and you will see those tables being used.

The algorithm can of course create tables in external databases, defer population, use tables you have designed manually, and so forth.

Julian


as the model file. When calcite connects, it runs the optimizer and chooses 
> On Nov 25, 2015, at 5:49 PM, Sarnath <st...@gmail.com> wrote:
> 
> Hi Julian,
> Thanks for the education. This makes lot of sense. I was under the
> impression that I will have to implement the summary table optimizer
> myself. But from your answer it looks like Calcite already provided a
> mechanism for this. This is really great! I will check out the docs further
> to see how to implement and register summary tables.
> Calcite is one of the most interesting project that I am recommending my
> colleagues to follow through. Even yesterday, we were recommending to check
> the Calcite-mongo connector for one of our customer's  research prototype.
> Thanks,
> Best,
> Sarnath


Re: Lattice, tiles and materialized views

Posted by Sarnath <st...@gmail.com>.
Hi Julian,
Thanks for the education. This makes lot of sense. I was under the
impression that I will have to implement the summary table optimizer
myself. But from your answer it looks like Calcite already provided a
mechanism for this. This is really great! I will check out the docs further
to see how to implement and register summary tables.
Calcite is one of the most interesting project that I am recommending my
colleagues to follow through. Even yesterday, we were recommending to check
the Calcite-mongo connector for one of our customer's  research prototype.
Thanks,
Best,
Sarnath

Re: Lattice, tiles and materialized views

Posted by Julian Hyde <jh...@apache.org>.
Here’s how relational algebra fit together (at least in my mind).

Relational algebra is the intermediate language for database queries. It has similar expressive power to, say, SQL, but for various reasons it is easier to manipulate expressions in relational algebra than SQL.

Query optimization is one of the main reasons you want to manipulate relational expressions. You transform the expression into another expression that is equivalent (i.e. gives the same result for all possible inputs) and has minimal cost. Manipulations include logical optimizations (e.g. pushing a filter through a project), physical optimizations (choosing to use merge join because the inputs are already sorted) and substitutions of materialized queries.

The last kind is especially important for OLAP. A lot of OLAP queries are aggregations of joins, and these results can be pre-computed in summary tables. Rewriting a query to use a summary table rather than the original table(s) is a really powerful optimization technique, because you can answer a query that might read billions of rows by reading a few rows from a summary table. A summary table is a materialized query. (By the way, since I know you are familiar with Kylin, I’ll point out that Kylin cuboids are basically summary tables.)

The problem with OLAP is that there are many, many possible summary tables. You can’t create them all - it would use much more disk space than the original database. You could in principle define and populate them by hand, but it is better automated. So, a lattice is a space that allows a “summary table optimizer” to create a good set of summary tables.

When a query is being optimized, the lattice isn’t really in play. There are summary tables T1, …, T17 and Calcite will pick the best applicable summary table (or none).

(This is a white lie. Actually lattices are in play at query optimization time. If a summary table belongs to a lattice, Calcite can use a more efficient process to figure out whether it will satisfy the query. But the results would be the same as if it just used the list of summary tables.)

Julian


> On Nov 25, 2015, at 10:42 AM, Sarnath <st...@gmail.com> wrote:
> 
> Hi all,
> 
> Good morning!
> 
> Please bear with my newbie question here. I am still trying to find out
> what calcite is all about.
> 
> I went through the latest calcite presentation on relational algebra. I
> followed all of it well except the last few slides that talk about lattice,
> tiles and materialized views.... I don't see the connection between these
> and relational algebra. Can someone explain this to me, please?
> 
> Thanks a ton,
> Best,
> Sarnath