Posted to issues@kylin.apache.org by "Dayue Gao (JIRA)" <ji...@apache.org> on 2016/09/14 02:05:20 UTC

[jira] [Commented] (KYLIN-2013) more robust approach to hive schema changes

    [ https://issues.apache.org/jira/browse/KYLIN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489097#comment-15489097 ] 

Dayue Gao commented on KYLIN-2013:
----------------------------------

Hi [~yimingliu], could you point me to the JIRA this one duplicates? Has the issue already been fixed? I was just about to submit a patch for it.

> more robust approach to hive schema changes
> -------------------------------------------
>
>                 Key: KYLIN-2013
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2013
>             Project: Kylin
>          Issue Type: Bug
>          Components: Metadata, REST Service, Web 
>    Affects Versions: v1.5.3
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>
> Our users occasionally want to change an existing cube, for example by adding, renaming, or removing a dimension. Some of these changes require modifying the cube's source Hive table. So the user changes the table schema and reloads its metadata in Kylin, and then several issues can occur depending on what was changed.
> I ran some schema-change tests on 1.5.3; the results after reloading the table are listed below:
> || type of change || fact table || lookup table ||
> | *minor* | both query and build still work | query can fail or return wrong answers |
> | *major* | related cube fails to load | related cube fails to load |
> {{minor}} changes are those that don't touch columns used in cubes, such as inserting/appending a new column or removing/changing an unused column.
> {{major}} changes are the opposite, such as removing, renaming, or changing the type of a used column.
> As the table shows, reloading a changed table is problematic in certain cases. KYLIN-1536 reports a similar problem.
> So what can we do to support this kind of iterative development process (load -> define cube -> build -> reload -> change cube -> rebuild)?
> My first thought was simply to detect and prohibit reloading a table that is used by a cube. The user should be told which cube is blocking the reload, and could then drop and recreate the cube after reloading. However, defining a cube is not an easy task (consider editing 100 measures), and forcing users to recreate their cubes over and over again will certainly not make them happy.
> A better idea is to keep a cube editable even if it is broken because some of its columns changed after a reload. A broken cube can't be built or queried; it can only be edited or dropped. In fact, the code already has a cube status called {{RealizationStatusEnum.DESCBROKEN}}, but it has never been used. We should take advantage of it.
> An enabled cube shouldn't allow schema changes, otherwise an unintentional reload could make it unavailable. Similarly, a disabled but unpurged cube shouldn't allow schema changes since it still has data in it.
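The minor/major distinction in the issue can be expressed as a comparison of the table's columns before and after the reload, restricted to the columns that cubes actually use. The following is a hypothetical sketch and does not come from Kylin. `SchemaChangeClassifier`, `ChangeType`, and the representation of a schema as a column-name-to-type map are all assumptions. Column position is ignored, and a rename shows up as the removal of the old name.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch (not Kylin code): classify a Hive schema change as
// MINOR or MAJOR using the definitions from KYLIN-2013.
public final class SchemaChangeClassifier {

    public enum ChangeType { NONE, MINOR, MAJOR }

    /**
     * @param oldColumns  column name -> Hive type before the reload (assumed representation)
     * @param newColumns  column name -> Hive type after the reload
     * @param usedColumns columns referenced by any cube built on this table
     */
    public static ChangeType classify(Map<String, String> oldColumns,
                                      Map<String, String> newColumns,
                                      Set<String> usedColumns) {
        for (String col : usedColumns) {
            // A renamed column appears here as a removed column.
            boolean removedOrRenamed = !newColumns.containsKey(col);
            boolean typeChanged = !Objects.equals(oldColumns.get(col), newColumns.get(col));
            if (removedOrRenamed || typeChanged) {
                return ChangeType.MAJOR; // a used column is affected
            }
        }
        // Any remaining difference touches only new or unused columns.
        return oldColumns.equals(newColumns) ? ChangeType.NONE : ChangeType.MINOR;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("id", "bigint", "city", "string", "note", "string");
        Set<String> used = Set.of("id", "city");

        Map<String, String> appended = Map.of("id", "bigint", "city", "string", "note", "string", "age", "int");
        Map<String, String> retyped = Map.of("id", "string", "city", "string", "note", "string");

        System.out.println(classify(before, appended, used)); // MINOR
        System.out.println(classify(before, retyped, used));  // MAJOR
    }
}
```

According to the test results in the issue, a MINOR result is safe only for a fact table. For a lookup table, even a minor change can make queries fail or return wrong answers, so a real check would also need to know the table's role in the cube.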
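The reload rules proposed in the issue can be sketched as a small guard. A reload is refused while any cube that uses the table is enabled or still holds data. The refusal lists the blocking cubes, and a cube whose used columns change is marked broken instead of failing to load. This is a hypothetical illustration, not Kylin's implementation:

- `TableReloadGuard`, `CubeInfo`, and the segment count used as a proxy for "unpurged" are assumptions.
- `CubeStatus` only mirrors constant names. `DESCBROKEN` comes from the issue; `READY` and `DISABLED` are assumed.
- Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not Kylin code) of the reload rules proposed in KYLIN-2013.
public final class TableReloadGuard {

    // Illustrative stand-in; only DESCBROKEN is named in the issue.
    public enum CubeStatus { READY, DISABLED, DESCBROKEN }

    // Assumed view of a cube: status, number of built segments, and the
    // Hive tables (fact and lookup) it reads from.
    public record CubeInfo(String name, CubeStatus status, int segmentCount, Set<String> sourceTables) {}

    /** Returns the names of cubes that block reloading {@code table}; empty means the reload may proceed. */
    public static List<String> blockingCubes(String table, List<CubeInfo> cubes) {
        List<String> blockers = new ArrayList<>();
        for (CubeInfo cube : cubes) {
            if (!cube.sourceTables().contains(table)) {
                continue; // this cube does not depend on the table
            }
            boolean enabled = cube.status() == CubeStatus.READY;
            boolean hasData = cube.segmentCount() > 0; // disabled but not yet purged
            if (enabled || hasData) {
                blockers.add(cube.name());
            }
        }
        return blockers;
    }

    /** After an allowed reload, a cube whose used columns changed becomes DESCBROKEN instead of failing to load. */
    public static CubeStatus statusAfterReload(CubeStatus current, boolean usedColumnsChanged) {
        return usedColumnsChanged ? CubeStatus.DESCBROKEN : current;
    }

    /** A broken cube can only be edited or dropped, so it must not be built. */
    public static boolean mayBuild(CubeStatus status) {
        return status != CubeStatus.DESCBROKEN;
    }

    public static void main(String[] args) {
        String fact = "DEFAULT.FACT_SALES"; // hypothetical table name
        List<CubeInfo> cubes = List.of(
                new CubeInfo("sales_cube", CubeStatus.READY, 3, Set.of(fact)),
                new CubeInfo("draft_cube", CubeStatus.DISABLED, 0, Set.of(fact)),
                new CubeInfo("old_cube", CubeStatus.DISABLED, 2, Set.of(fact)));
        System.out.println(blockingCubes(fact, cubes)); // [sales_cube, old_cube]
        System.out.println(statusAfterReload(CubeStatus.DISABLED, true)); // DESCBROKEN
    }
}
```

Returning the list of blockers, rather than a bare boolean, reflects the issue's point that the user should be told which cube is preventing the reload.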



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)