You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by "Roberto Tardío Olmos (JIRA)" <ji...@apache.org> on 2017/11/01 13:35:01 UTC

[jira] [Created] (KYLIN-2983) Support for INNER JOINS in any order in the FROM clause

Roberto Tardío Olmos created KYLIN-2983:
-------------------------------------------

             Summary: Support for INNER JOINS in any order in the FROM clause
                 Key: KYLIN-2983
                 URL: https://issues.apache.org/jira/browse/KYLIN-2983
             Project: Kylin
          Issue Type: Improvement
          Components: Query Engine
    Affects Versions: v2.1.0, v2.0.0
            Reporter: Roberto Tardío Olmos
            Assignee: liyang
            Priority: Normal


Kylin versions after Kylin 1.6 (2.0 and 2.1) do not support queries over dimensions with INNER JOINS if you do not put first the fact table in the FROM clause. I think this can be related with new snowflake schema support. 

For example. I defined a data model over a hive star schema. Therefore I defined INNER JOINS between fact table (F_RENDIMIENTO) and dimension tables (D_CURSO_ACADEMICO_VK). I also defined a cube and built it. 
The issue is, the following query goes:

select D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO, sum(CREDITOS)
 from F_RENDIMIENTO JOIN D_CURSO_ACADEMICO_VK ON F_RENDIMIENTO.ID_CURSO_ACADEMICO = D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO
 group by D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO

But susprisingly if I change the INNER JOIN order the following query does not go

select D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO, sum(CREDITOS)
from D_CURSO_ACADEMICO_VK JOIN F_RENDIMIENTO ON F_RENDIMIENTO.ID_CURSO_ACADEMICO = D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO
group by D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO

Error while executing SQL "select D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO, sum(CREDITOS) from D_CURSO_ACADEMICO_VK JOIN F_RENDIMIENTO ON F_RENDIMIENTO.ID_CURSO_ACADEMICO = D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO group by D_CURSO_ACADEMICO_VK.ID_CURSO_ACADEMICO LIMIT 50000": No realization found for rel#7393:OLAPTableScan.OLAP.[](table=[DM_ACAD_KYLIN_ORC, D_CURSO_ACADEMICO_VK],fields=[0, 1]), JoinDesc [type=INNER, primary_key=[ID_CURSO_ACADEMICO], foreign_key=[ID_CURSO_ACADEMICO]]

Since this issue is possible related to snowflake schema support, I suggest to differentiate in metadata and then, kylin query engine, between this two types of schemas, in order to allow this kind of queries over a Star Schema. Another possibility is to check on query engine if there is a INNER JOIN, independently of then kind of Hive Source Schema. What do you think about?

This issue could lead to human errors wrinting ANSI-92 SQL queries. Furthermore some BI tools, specially old ones, can generate this kind of queries not supported by Kylin 2.0 and 2.1.

Thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)