You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Sanjay Raja <sa...@noesyssoftware.com> on 2017/01/23 12:33:39 UTC

Mondrian + Calcite + Druid

Hi,

 

For the MDX Below

 

With 

Member [Measures].[0] as IIF([Measures].[No Of Sessions]>0,1,0) 

Member [Measures].[1] as Aggregate({[Session Date].[Day].[2016-12-10].Lag(4):[Session Date].[Day].[2016-12-10]},[Measures].[0]) 

Member [Measures].[Repeat Users] as COUNT(FILTER([User].[User].Children,[Measures].[1]>1)) 

Select { [Measures].[Repeat Users] } on Columns From [Session Date Cube] Where { { [Session Date].[Day].[2016-12-10] } }

 

 

Setup Scenario 1

I have created a Table in Druid called DimDate which basically holds the Date Dimension.

The resultant Query and Expected result is fine – however

Mondrian generates the following query which Caclite Executes

 

select "dimdate"."QuarterName" as "c0", "dimdate"."Date" as "c1" from "dimdate" as "dimdate" group by "dimdate"."QuarterName", "dimdate"."Date" order by CASE WHEN "dimdate"."QuarterName" IS NULL THEN 0 ELSE 1 END, "dimdate"."QuarterName" ASC;

 

This basically times out against DRUID.

 

{"queryType":"groupBy","dataSource":"DimDate","granularity":"all","dimensions":["QuarterName","Date"],"limitSpec":{"type":"default"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["2000-01-01T00:00:00.000Z/2021-01-01T00:00:00.000Z"]}

 

1)      Why does Sqlline give back an empty result for this.

 

 

Scenario 2 

I tried creating an in-memory Table for the Date Dimension using a CustomTableFactory. Here the DimDate values are returned immediately, however the inner join Query Fails.

The generated query is 

 

select "datasource"."username_0" as "c0", "dimdate"."Date" as "c1", count("datasource"."mobilenumber_2") as "m0" from "datasource" as "datasource", "dimdate" as "dimdate" where "dimdate"."Date" in ('20161208', '20161209', '20161210', '20161211') and "datasource"."sessiondate_3_DateDim" = "dimdate"."Date" group by "datasource"."username_0", "dimdate"."Date"

 

The Error Is 

 

[mondrian.rolap.agg.SegmentCacheManager$sqlExecutor_1] DEBUG mondrian.sql  - 6: , failed (java.sql.SQLException: Error while executing SQL "select "datasource"."username_0" as "c0", "dimdate"."Date" as "c1", count("datasource"."mobilenumber_2") as "m0" from "datasource" as "datasource", "dimdate" as "dimdate" where "dimdate"."Date" in ('20161208', '20161209', '20161210', '20161211') and "datasource"."sessiondate_3_DateDim" = "dimdate"."Date" group by "datasource"."username_0", "dimdate"."Date"": org.apache.calcite.interpreter.Bindables$BindableJoin cannot be cast to org.apache.calcite.adapter.enumerable.EnumerableRel)

56092

 

 

Can someone please help with this

 

Regards,

Sanjay

 

 

 

 


Re: Mondrian + Calcite + Druid

Posted by Julian Hyde <jh...@apache.org>.
This is as far as it got: https://issues.apache.org/jira/browse/CALCITE-1358 <https://issues.apache.org/jira/browse/CALCITE-1358>

The time dimension table would need to exist (but could be an in-memory table in Calcite — after all, 10 years is small, only 3,600 rows). Calcite would need to know that the “the_year” column was equivalent to “extract(year from __time)”, probably using some materialized view magic.

The idea is that references to columns in the date dimension table get rewritten to expressions on “__time”, and then when no columns of the date dimension table are referenced it can be dropped from the query (using the foreign key/primary key semantics that come with lattices). Therefore you end up just using the Druid table.

Julian


> On Jan 23, 2017, at 2:22 PM, Gian Merlino <gi...@imply.io> wrote:
> 
> Do you have any docs or examples or suggestions around how that could be
> done? It does sound like it'd be useful but I don't know where to start.
> 
> Gian
> 
> On Mon, Jan 23, 2017 at 12:19 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> 
>>> On Jan 23, 2017, at 12:14 PM, Gian Merlino <gi...@imply.io> wrote:
>>> 
>>> I bet you could also write some cool Calcite rules to make a phantom
>>> "dimdate" table available, and convert joins on that table into
>> application
>>> of Druid native time functions on the base table. That'd allow the first
>>> SQL query to work in an efficient way.
>> 
>> That would indeed be cool (and useful).
>> 
>> If you define a Calcite lattice of, say, a sales and time_by_day tables,
>> backed by a Druid table, then Calcite will endeavor to convert join queries
>> into queries on the Druid table. Those Druid queries will not be joins.
>> 
>> Julian


Re: Mondrian + Calcite + Druid

Posted by Gian Merlino <gi...@imply.io>.
Do you have any docs or examples or suggestions around how that could be
done? It does sound like it'd be useful but I don't know where to start.

Gian

On Mon, Jan 23, 2017 at 12:19 PM, Julian Hyde <jh...@apache.org> wrote:

>
> > On Jan 23, 2017, at 12:14 PM, Gian Merlino <gi...@imply.io> wrote:
> >
> > I bet you could also write some cool Calcite rules to make a phantom
> > "dimdate" table available, and convert joins on that table into
> application
> > of Druid native time functions on the base table. That'd allow the first
> > SQL query to work in an efficient way.
>
> That would indeed be cool (and useful).
>
> If you define a Calcite lattice of, say, a sales and time_by_day tables,
> backed by a Druid table, then Calcite will endeavor to convert join queries
> into queries on the Druid table. Those Druid queries will not be joins.
>
> Julian

Re: Mondrian + Calcite + Druid

Posted by Julian Hyde <jh...@apache.org>.
> On Jan 23, 2017, at 12:14 PM, Gian Merlino <gi...@imply.io> wrote:
> 
> I bet you could also write some cool Calcite rules to make a phantom
> "dimdate" table available, and convert joins on that table into application
> of Druid native time functions on the base table. That'd allow the first
> SQL query to work in an efficient way.

That would indeed be cool (and useful).

If you define a Calcite lattice of, say, a sales and time_by_day tables, backed by a Druid table, then Calcite will endeavor to convert join queries into queries on the Druid table. Those Druid queries will not be joins.

Julian

Re: Mondrian + Calcite + Druid

Posted by Gian Merlino <gi...@imply.io>.
Date dimension tables don't work well with Druid. Druid works best if you
store the timestamp in Druid's native timestamp column and then use Druid's
native time functions to do date filtering and grouping.

Instead of:

select "datasource"."username_0" as "c0", "dimdate"."Date" as "c1",
count("datasource"."mobilenumber_2") as "m0" from "datasource" as
"datasource", "dimdate" as "dimdate" where "dimdate"."Date" in ('20161208',
'20161209', '20161210', '20161211') and "datasource"."sessiondate_3_DateDim"
= "dimdate"."Date" group by "datasource"."username_0", "dimdate"."Date"

A better query would be:

select "datasource"."username_0" as "c0", FLOOR("datasource"."__time" TO
DAY) as "c1", count("datasource"."mobilenumber_2") as "m0" from
"datasource" as "datasource" where FLOOR("datasource"."__time" TO DAY) IN
(TIMESTAMP '2016-12-08 00:00:00', TIMESTAMP '2016-12-09 00:00:00',
TIMESTAMP '2016-12-10 00:00:00', TIMESTAMP '2016-12-11 00:00:00') group by
"datasource"."username_0", FLOOR("datasource"."__time" TO DAY)

The FLOOR function should get Calcite to generate queries using Druid's
native time functions.

I bet you could also write some cool Calcite rules to make a phantom
"dimdate" table available, and convert joins on that table into application
of Druid native time functions on the base table. That'd allow the first
SQL query to work in an efficient way.

Gian

On Mon, Jan 23, 2017 at 4:33 AM, Sanjay Raja <sa...@noesyssoftware.com>
wrote:

> Hi,
>
>
>
> For the MDX Below
>
>
>
> With
>
> Member [Measures].[0] as IIF([Measures].[No Of Sessions]>0,1,0)
>
> Member [Measures].[1] as Aggregate({[Session Date].[Day].[2016-12-10].Lag(4):[Session
> Date].[Day].[2016-12-10]},[Measures].[0])
>
> Member [Measures].[Repeat Users] as COUNT(FILTER([User].[User].Chi
> ldren,[Measures].[1]>1))
>
> Select { [Measures].[Repeat Users] } on Columns From [Session Date Cube]
> Where { { [Session Date].[Day].[2016-12-10] } }
>
>
>
>
>
> Setup Scenario 1
>
> I have created a Table in Druid called DimDate which basically holds the
> Date Dimension.
>
> The resultant Query and Expected result is fine – however
>
> Mondrian generates the following query which Caclite Executes
>
>
>
> select "dimdate"."QuarterName" as "c0", "dimdate"."Date" as "c1" from
> "dimdate" as "dimdate" group by "dimdate"."QuarterName", "dimdate"."Date"
> order by CASE WHEN "dimdate"."QuarterName" IS NULL THEN 0 ELSE 1 END,
> "dimdate"."QuarterName" ASC;
>
>
>
> This basically times out against DRUID.
>
>
>
> {"queryType":"groupBy","dataSource":"DimDate","granularity":
> "all","dimensions":["QuarterName","Date"],"limitSpec":{"type":"default"},"
> aggregations":[{"type":"longSum","name":"dummy_agg","fieldNa
> me":"dummy_agg"}],"intervals":["2000-01-01T00:00:00.000Z/
> 2021-01-01T00:00:00.000Z"]}
>
>
>
> 1)      Why does Sqlline give back an empty result for this.
>
>
>
>
>
> Scenario 2
>
> I tried creating an in-memory Table for the Date Dimension using a
> CustomTableFactory. Here the DimDate values are returned immediately,
> however the inner join Query Fails.
>
> The generated query is
>
>
>
> select "datasource"."username_0" as "c0", "dimdate"."Date" as "c1",
> count("datasource"."mobilenumber_2") as "m0" from "datasource" as
> "datasource", "dimdate" as "dimdate" where "dimdate"."Date" in ('20161208',
> '20161209', '20161210', '20161211') and "datasource"."sessiondate_3_DateDim"
> = "dimdate"."Date" group by "datasource"."username_0", "dimdate"."Date"
>
>
>
> The Error Is
>
>
>
> [mondrian.rolap.agg.SegmentCacheManager$sqlExecutor_1] DEBUG
> mondrian.sql  - 6: , failed (java.sql.SQLException: Error while executing
> SQL "select "datasource"."username_0" as "c0", "dimdate"."Date" as "c1",
> count("datasource"."mobilenumber_2") as "m0" from "datasource" as
> "datasource", "dimdate" as "dimdate" where "dimdate"."Date" in ('20161208',
> '20161209', '20161210', '20161211') and "datasource"."sessiondate_3_DateDim"
> = "dimdate"."Date" group by "datasource"."username_0", "dimdate"."Date"":
> org.apache.calcite.interpreter.Bindables$BindableJoin cannot be cast to
> org.apache.calcite.adapter.enumerable.EnumerableRel)
>
> 56092
>
>
>
>
>
> Can someone please help with this
>
>
>
> Regards,
>
> Sanjay
>
>
>
>
>
>
>
>
>
>