You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2016/09/12 12:46:20 UTC

[jira] [Comment Edited] (CALCITE-1334) Convert predicates on EXTRACT function calls into date ranges

    [ https://issues.apache.org/jira/browse/CALCITE-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483996#comment-15483996 ] 

Jesus Camacho Rodriguez edited comment on CALCITE-1334 at 9/12/16 12:45 PM:
----------------------------------------------------------------------------

Fixed up in http://git-wip-us.apache.org/repos/asf/calcite/commit/63a7a1a .

Changed {{Calendar.getInstance();}}, which is user default timezone dependent, by {{Calendar.getInstance(TimeZone.getTimeZone("UTC"));}}.


was (Author: jcamachorodriguez):
Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/63a7a1a .

Changed {{Calendar.getInstance();}}, which is user default timezone dependent, by {{Calendar.getInstance(TimeZone.getTimeZone("UTC"));}}.

> Convert predicates on EXTRACT function calls into date ranges
> -------------------------------------------------------------
>
>                 Key: CALCITE-1334
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1334
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>              Labels: druid
>             Fix For: 1.9.0
>
>
> We would like to convert predicates on date dimension columns into date ranges. This is particularly useful for Druid, which has a single timestamp column.
> Consider the case of a materialized view
> {code}
> SELECT sales.*, product.*, time_by_day.*
> FROM sales
> JOIN product USING (product_id)
> JOIN time_by_day USING (time_id)
> {code}
> that corresponds to a Druid table
> {noformat}
> sales_product_time(
>   product_id int not null,
>   time_id int not null,
>   units int not null,
>   the_year int not null,
>   the_quarter int not null,
>   the_month int not null,
>   the_timestamp timestamp not null,
>   product_name varchar(20) not null)
> {noformat}
> And suppose we have the following check constraints:
> * {{CHECK the_year = EXTRACT(YEAR FROM the_timestamp)}}
> * {{CHECK the_month = EXTRACT(MONTH FROM the_timestamp)}}
> Given a query
> {code}
> SELECT product_id, count(*)
> FROM sales
> JOIN product USING (product_id)
> JOIN time_by_day USING (time_id)
> WHERE the_year = 2016
> AND the_month IN (4, 5, 6)
> {code}
> we would like to transform it into the following query to be run against Druid:
> {code}
> SELECT product_id, count(*)
> FROM sales_product_time
> WHERE the_timestamp BETWEEN '2016-04-01' AND '2016-06-30'
> {code}
> Druid can handle timestamp ranges (or disjoint sets of ranges) very efficiently.
> I believe we can write a rule that knows the check constraints and also knows the properties of the {{EXTRACT}} function:
> 1. Apply check constraints to convert {{WHERE year = ...}} to {{WHERE EXTRACT(YEAR FROM the_timestamp) = ...}}, etc.
> 2. {{EXTRACT(YEAR FROM ...)}} is monotonic, therefore we can deduce the range of the_timestamp values such that {{EXTRACT(YEAR FROM the_timestamp)}} returns 2016.
> 3. Then we need to use the fact that {{EXTRACT(MONTH FROM the_timestamp)}} is monotonic if {{the_timestamp}} is bounded within a particular year.
> 4. And we need to merge month ranges somehow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)