You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2015/08/11 04:07:45 UTC

[jira] [Updated] (CALCITE-838) Optimize "join to most recent" queries

     [ https://issues.apache.org/jira/browse/CALCITE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Hyde updated CALCITE-838:
--------------------------------
    Summary: Optimize "join to most recent" queries  (was: Optimize join-to-MAX queries)

> Optimize "join to most recent" queries
> --------------------------------------
>
>                 Key: CALCITE-838
>                 URL: https://issues.apache.org/jira/browse/CALCITE-838
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> A common class of queries joins a fact table to an "current" record in a dimension table. What appears to be a many-to-many join becomes many-to-one because there is only one current record.
> Here the query on orders (fact) and customers (dimension): {code}
> SELECT *
> FROM Orders
> JOIN Customers AS c USING (customerId)
> WHERE NOT EXISTS (
>   SELECT 1
>   FROM Customers
>   WHERE customerId = c.CustomerId 
>   AND effectiveDate > c.effectiveDate){code}
> Same query pushing WHERE into JOIN:{code}
> SELECT *
> FROM Orders AS o
> JOIN Customers AS c
> ON o.customerId = c.customerId
> AND NOT EXISTS (
>   SELECT 1
>   FROM Customers
>   WHERE customerId = c.CustomerId 
>   AND effectiveDate > c.effectiveDate){code}
> The same query re-formulated using LAST_VALUE:{code}
> SELECT *
> FROM Orders AS o
> JOIN (
>   SELECT last_value(customerId) OVER w AS customerId,
>     last_value(customerName) OVER w AS customerName
>   FROM Customers AS c
>   WINDOW w AS (PARTITION BY customerId
>     ORDER BY effectiveDate DESC)){code}
> This formulation avoids the self-join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)