Posted to issues@kylin.apache.org by "Shaofeng SHI (JIRA)" <ji...@apache.org> on 2016/11/01 06:42:58 UTC

[jira] [Commented] (KYLIN-1826) kylin support more than one hive based on different hadoop claster

    [ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624548#comment-15624548 ] 

Shaofeng SHI commented on KYLIN-1826:
-------------------------------------

Hi Yu, I reviewed the patch; here are some comments:

1. This patch introduces the new "external hive" concept, but some places use "default hive" and "local hive" instead (three terms in total), which confused me (see ExecutableConstants.java). It looks like "local hive" = "external hive"; if so, I suggest making the terminology consistent ("external hive" is better). For a normal deployment (a single hive only), none of these terms should appear.

2. The "getSourceType()" method in TableDesc.java only returns ID_HIVE and ID_EXTERNAL_HIVE, but we also have "streaming" tables whose type is "ID_STREAMING"; that case is missing.

3. "mvn test" failed with an error in "RangeKeyDistributionJobTest.testJob", but I don't see how the patch broke it...

4. The "hive" attribute appears in both ProjectInstance.java and TableDesc.java; this seems redundant and may lead to inconsistency (e.g. a user updates the project but not the table).

5. Hive has deprecated the "hive" CLI and recommends using "beeline", so in the long run beeline should be supported. In my mind beeline should be easier to use than "hive" in such a scenario (you set a target Hive JDBC server when running the command); did you investigate that?
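
To illustrate point 2, here is a minimal sketch of a getSourceType() that also covers streaming tables. It is not the code from the patch: the class TableDescSketch, its two fields, and the numeric values of the constants are placeholders assumed for illustration only; the real constants and the real TableDesc fields should be used instead.

```java
class SourceTypeSketch {
    // Placeholder values, assumed for this sketch only; use the real constants.
    static final int ID_HIVE = 0;
    static final int ID_STREAMING = 1;
    static final int ID_EXTERNAL_HIVE = 6;

    // Hypothetical stand-in for TableDesc; field names are assumptions.
    static class TableDescSketch {
        private final boolean streaming; // true for a streaming table
        private final String hiveName;   // null or empty means the default hive

        TableDescSketch(boolean streaming, String hiveName) {
            this.streaming = streaming;
            this.hiveName = hiveName;
        }

        int getSourceType() {
            // Streaming tables must keep their own type.
            if (streaming) {
                return ID_STREAMING;
            }
            // A table bound to a named hive is treated as external.
            if (hiveName != null && !hiveName.isEmpty()) {
                return ID_EXTERNAL_HIVE;
            }
            // Everything else is the single default hive.
            return ID_HIVE;
        }
    }

    public static void main(String[] args) {
        System.out.println(new TableDescSketch(true, null).getSourceType());
        System.out.println(new TableDescSketch(false, "hive_b").getSourceType());
        System.out.println(new TableDescSketch(false, null).getSourceType());
    }
}
```

The streaming check comes first so that adding the external hive type cannot change the type reported for existing streaming tables.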

Items 1~3 are high priority and should be fixed before merging to trunk; 4~5 are optional and can be revisited later.
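
For reference on point 5, a rough sketch of how a per-hive beeline command could be assembled, one JDBC URL per target hive. The class and method names, the host, and the user below are all hypothetical; only the beeline options -u (JDBC URL), -n (user name) and -e (query) are existing beeline flags. The sketch only builds the argument list and does not run anything.

```java
import java.util.ArrayList;
import java.util.List;

class BeelineCommandSketch {
    // Builds the argument list for one beeline call against a given HiveServer2.
    // jdbcUrl selects the target hive; user may be null if no user name is needed.
    static List<String> buildCommand(String jdbcUrl, String user, String sql) {
        List<String> cmd = new ArrayList<>();
        cmd.add("beeline");
        cmd.add("-u");
        cmd.add(jdbcUrl);
        if (user != null && !user.isEmpty()) {
            cmd.add("-n");
            cmd.add(user);
        }
        cmd.add("-e");
        cmd.add(sql);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical host and user, for illustration only.
        List<String> cmd = buildCommand(
                "jdbc:hive2://hive-a.example.com:10000/default", "kylin", "show tables");
        System.out.println(String.join(" ", cmd));
    }
}
```

With this shape, switching hives is only a matter of passing a different JDBC URL, with no separate client directory needed per hive.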

> kylin support more than one hive based on different hadoop claster
> ------------------------------------------------------------------
>
>                 Key: KYLIN-1826
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1826
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Environment 
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch
>
>
> Currently, Kylin only supports one hive, which is run via the 'hive' command. However, when source data is located in more than one hive, we have to deploy more Kylin instances and more than one metastore, which is difficult to manage and may cause conflicts.
> I have been working on this recently. In our cluster, there are several hive clients (with different metastores) based on different hadoop clusters. I added a new hive source type called 'external hive' in kylin 1.5.x.
> Thanks to Kylin's plug-in architecture in 2.x, this work was easier. The main modifications are:
> 1. Add a hive root directory to the hive config file; the external hive clients live in this directory, and each hive is named by its directory name.
> 2. Add the hive-site.xml file while loading hive tables.
> 3. Store the hive name in the project; one project can take only one hive as its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)