You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2015/08/11 04:07:45 UTC
[jira] [Updated] (CALCITE-838) Optimize "join to most recent"
queries
[ https://issues.apache.org/jira/browse/CALCITE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julian Hyde updated CALCITE-838:
--------------------------------
Summary: Optimize "join to most recent" queries (was: Optimize join-to-MAX queries)
> Optimize "join to most recent" queries
> --------------------------------------
>
> Key: CALCITE-838
> URL: https://issues.apache.org/jira/browse/CALCITE-838
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
>
> A common class of queries joins a fact table to an "current" record in a dimension table. What appears to be a many-to-many join becomes many-to-one because there is only one current record.
> Here the query on orders (fact) and customers (dimension): {code}
> SELECT *
> FROM Orders
> JOIN Customers AS c USING (customerId)
> WHERE NOT EXISTS (
> SELECT 1
> FROM Customers
> WHERE customerId = c.CustomerId
> AND effectiveDate > c.effectiveDate){code}
> Same query pushing WHERE into JOIN:{code}
> SELECT *
> FROM Orders AS o
> JOIN Customers AS c
> ON o.customerId = c.customerId
> AND NOT EXISTS (
> SELECT 1
> FROM Customers
> WHERE customerId = c.CustomerId
> AND effectiveDate > c.effectiveDate){code}
> The same query re-formulated using LAST_VALUE:{code}
> SELECT *
> FROM Orders AS o
> JOIN (
> SELECT last_value(customerId) OVER w AS customerId,
> last_value(customerName) OVER w AS customerName
> FROM Customers AS c
> WINDOW w AS (PARTITION BY customerId
> ORDER BY effectiveDate DESC)){code}
> This formulation avoids the self-join.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)