You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "liyang (JIRA)" <ji...@apache.org> on 2017/02/25 08:47:44 UTC
[jira] [Comment Edited] (KYLIN-1875) Snowflake schema support

    [ https://issues.apache.org/jira/browse/KYLIN-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680172#comment-15680172 ] 

liyang edited comment on KYLIN-1875 at 2/25/17 8:47 AM:
--------------------------------------------------------

Initial design, a {{Data Model}} consists of the following:

- one {{Fact Table}}
- multiple {{Lookup Tables}}  (can fit in memory, typically < 300 MB)
- multiple {{Limited Lookup Tables}}  (cannot fit in memory)
- multiple {{Join Relationships}} that connect lookup tables to the fact table directly or indirectly

During cube build

- All tables are joined together and then aggregated to become the {{Cube}}
- Lookup tables, due to the small size, are also saved as {{Snapshots}}
- Limited lookup tables, due to the big size, does not have {{Snapshots}}

Query and limitations

- Lookup tables (excluding the limited) can be queried independently from the snapshots, ensures correct result
- Fact table and limited lookup tables must be queried from the cube, which submits to following limitations.
-- *No raw records* limitation. This is the same as before.
--- For example, {{select * from F}} actually returns {{select * from F group by D1, D2, ..., Dn}}, where F is the fact table, Dn are the dimensions of cube.
--- To support query of raw records, user can add PK as dimension (at the cost of cube size). Or try the [raw measure|http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/] feature.
-- *Enforced joint* limitation. Query on any fact table or limited lookup table will enforce the joins of the whole model, since cube only stores pre-calculated joint result. This limitation also exists in earlier versions.
--- For example, {{select ... from F}} will be executed as {{select ... from F join L1 join L2...}}, where F are the fact tables, Ln are the lookup tables.
--- This limitation IS NOT a problem for left join models and inner join models that has no record loss after inner join.
--- This limitation is a problem for many-to-many relationships. User can work around by creating multiple models, for example let each fact table has its own model.


was (Author: liyang.gmt8@gmail.com):
Initial design, a {{Data Model}} consists of the following:

- one {{Root Fact Table}}
- multiple {{Fact Tables}}
- multiple {{Lookup Tables}}  (can fit in memory, typically < 300 MB)
- multiple {{Join Relationships}} that connect tables to the Root Fact Table directly or indirectly

During cube build

- All tables are joined together and then aggregated as the {{Cube}}
- Lookup tables, due to the small size, are also saved as {{Snapshots}}

Query and limitations

- Lookup tables can be queried independently from the Snapshots, ensures correct result
- Fact tables can only be queried from cube, which enforces the pre-calculated joint and aggregated result
-- *No raw records* limitation. This is the same as before.
--- For example, {{select * from F}} actually returns {{select * from F group by D1, D2, ..., Dn}}, where F is fact table, Dn are dimension columns.
--- To support query of raw records, user can add PK as dimension (at the cost that the cardinality of PK is very high). Or try the [raw measure|http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/] feature.
-- *Enforced joint* limitation. Query on any fact table will enforce the joins of the whole model, since cube only stores pre-calculated joint result. This limitation also exists in earlier versions.
--- For example, {{select ... from F1}} will be executed as {{select ... from F1 join F2 ... join L1 join L2...}}, where Fn are fact tables, Ln are lookup tables.
--- This limitation IS NOT a problem for left join models and inner join models that has no record loss after inner join.
--- This limitation is a problem for many-to-many relationships. User can work around by creating multiple models, for example let each fact table has its own model.

> Snowflake schema support
> ------------------------
>
>                 Key: KYLIN-1875
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1875
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.5.2
>            Reporter: Rahul Choubey
>            Assignee: liyang
>             Fix For: v2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)