You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "Billy(Yiming) Liu (JIRA)" <ji...@apache.org> on 2016/09/11 13:48:20 UTC

[jira] [Assigned] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join

     [ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy(Yiming) Liu reassigned KYLIN-1576:
----------------------------------------

    Assignee: Billy(Yiming) Liu  (was: Yiming Liu)

> Support of new join type in the Cube Model - Temporal Join
> ----------------------------------------------------------
>
>                 Key: KYLIN-1576
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1576
>             Project: Kylin
>          Issue Type: New Feature
>          Components: General
>    Affects Versions: Future
>            Reporter: Richard Calaba
>            Assignee: Billy(Yiming) Liu
>            Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios. Typically modeled with granularity 1 day (datefrom, dateto fields of type DATE defining validity time of one master data record). Occasionally you can think of lower granularity so use of TIMESTAMP can be also seen as an valid scenario). Example of such master data definition could be:
> Master Data / Dimension Table:
> =========================
> KEY: PRODUCT_ID, DATE_TO, 
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is assumed that DATE_TO <= DATE_TO and also that there are no overlapping intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= today/now and DATE_FROM <= today/now is efficient way to retrieve 'current' PRODUCT master data (description). The today/now is also being named as 'key date'.
> - now if I have transaction data (FACT table) of product sales, i.e:
> SALES_DATE, PRODUCT_ID, STORE_ID, ....
> I would like to show the Sold Products at Store at certain date and also show the Description of the product at the date of product sale (assuming here that there is product catalog which can be updated independently, but for auditing purposes the original product description used during sale is needed to be displayed/used).
> The SQL for the temporal join would be then:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION 
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P 
> ON S.PRODUCT_ID = P.PRODUCT_ID 
> AND S.SALES_DATE >= P.DATE_FROM AND 
> AND S.SALES_DATE <= P.DATE_TO 
> (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but in this case it won't be the proper join - we need to show the product sales even the description wasn't maintained in product master data)
> (some more details for temporal joins - see i.e. here - http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval )
> This scenario can be supported by Kylin if following enhancement would be done:
> 1) The Cube Model allowing to define special variant of LEFT OUTER and INNER joins (suggesting name temporal (left outer/inner) join) which forces to specify a 'key date' as a expression (column / constant / ...) from the FACT table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP table/ Those 2 validity fields are defining master data record validity period. Supported types for those fields should be DATE, optionally TIMESTAMP is also fine but rarely used in business scenarios. 
> Other option rather then defining new join type is to loosen the join condition and allowing <= and >= operands to be used as part of the LOOKUP join definition.
> 2) The Cube Definition then needs to know the extension of the join type in the cube model and needs to force the additional fields (key-date, valid-from, valid-to) be part of the whole cube structure.  Or alternatively cube definition for derived dimensions can be extended to define a "time-dependent derived lookup" similar way as described in the step 1) for the suggested cube model join type extension. 
> 3) Very often the time-partition field of the cube which is being used for incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be hard-coded this way as this is not true for every scenario. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)