You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Scott Reynolds <sd...@gmail.com> on 2021/08/10 16:56:41 UTC

Materialized View and Lattice Suggester help

Hi Calcite devs,

Today our team deploys a Calcite service that uses Daily OLAP Cube
tables and fine grain Fact tables. Daily OLAP Cube tables are used to
vastly improve latency when querying for a wide date range but comes
with a restriction – the date range *must* be UTC day granularity. When
a request comes into the Cube endpoint, our service picks the best Cube,
creates a Logical Plan for that query and lets Calcite optimize it into
Enumerable. Our Service has matured to the point where we would like to
*remove* the UTC day granularity restriction and would like to do Query
Rewriting and I am writing to you all to figure out the best way to do
that. Calcite supports Materialized Views and Lattices and I haven't
been able to pull this together. Here is a motivating example:

┌────
│ SELECT "SIM_ID", SUM("BYTES_DOWNLINK") + SUM(BYTES_UPLINK) as "SUM_BYTES"
│ FROM "kudu"."WirelessEvents"
│ WHERE "ACCOUNT_ID" = 18789 AND ("START_TIME" >= TIMESTAMP '2021-07-31
11:30:01' AND "START_TIME" < TIMESTAMP '2021-09-01 03:00:00')
│ GROUP BY "SIM_ID"
│ ORDER BY "SIM_ID" DESC;
└────

This is a query we would like to rewrite to have it use the UTC Daily
OLAP Cube. One possible rewrite of this:

┌────
│ SELECT "SIM_ID", SUM("SUM_BYTES_DOWNLINK") + SUM("SUM_BYTES_UPLINK") AS
"SUM_BYTES"
│
│ FROM (
│ /* Cube Table query with all the filters on dimensions */
│ /* First UTC Day and last UTC Day */
│ SELECT "SIM_ID", "SUM_BYTES_DOWNLINK", SUM_BYTES_UPLINK
│ FROM "kudu"."Wireless-Daily-Aggregation"
│ WHERE "ACCOUNT_ID" = 18789 AND ("START_TIME" >= TIMESTAMP '2021-08-01
00:00:00' AND "START_TIME" < TIMESTAMP '2021-08-31 23:59:59')
│
│ UNION ALL
│
│ /* Fact Table query with all the filters on dimensions */
│ /* Aliasing measure fields into their Cube table values */
│ /* Date filter is from startDate to the first UTC day in query */
│ SELECT "SIM_ID", "BYTES_DOWNLINK" AS "SUM_BYTES_DOWNLINK", "BYTES_UPLINK"
AS "SUM_BYTES_UPLINK"
│ FROM "kudu"."WirelessEvents"
│ WHERE "ACCOUNT_ID" = 18789 AND ("START_TIME" >= TIMESTAMP '2021-07-31
11:30:01' AND "START_TIME" < TIMESTAMP '2021-08-01 00:00:00')
│
│ UNION ALL
│
│ /* Fact Table query with all the filters on dimensions  */
│ /* Aliasing measure fields into their Cube table values */
│ /* Date filter is from last UTC day to endDate of the query */
│ SELECT "SIM_ID", "BYTES_DOWNLINK" AS "SUM_BYTES_DOWNLINK", "BYTES_UPLINK"
AS "SUM_BYTES_UPLINK"
│ FROM "kudu"."WirelessEvents"
│ WHERE "ACCOUNT_ID" = 18789 AND ("START_TIME" >= TIMESTAMP '2021-08-31
23:59:59' AND "START_TIME" < TIMESTAMP '2021-09-01 03:00:00')
│ )
│
│ GROUP BY "SIM_ID"
│ ORDER BY "SIM_ID" DESC;
└────

In this instance, Materialized View rule will need to:
1. Create two scans (or one with a Disjunction) on the ranges outside of
   UTC boundary
2. Create `Projection' that changes the names of the Fact columns to
   match the measure names – `BYTES_UPLINK' becomes `SUM_BYTES_UPLINK'

I am overwhelmed by [Materialized View] rules and [Lattice Suggester] and
hoping you all could describe in broad terms how you would approach this
task. What do I need to configure? What do I need to build?

Thanks so much !


[Materialized View]
https://calcite.apache.org/docs/materialized_views.html

[Lattice Suggester] https://calcite.apache.org/docs/lattice.html