Posted to issues@kylin.apache.org by "Shaofeng SHI (JIRA)" <ji...@apache.org> on 2016/11/09 08:46:58 UTC

[jira] [Comment Edited] (KYLIN-1826) kylin support more than one hive based on different hadoop cluster

    [ https://issues.apache.org/jira/browse/KYLIN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650353#comment-15650353 ] 

Shaofeng SHI edited comment on KYLIN-1826 at 11/9/16 8:46 AM:
--------------------------------------------------------------

Hi Yu, I got many conflicts when merging the patches into the latest master branch. Besides, some of the design makes me less confident (for example, the 'external hive' isn't another source type like Kafka). Today I discussed this with Yang; we want to implement your requirement in a more extensible way:

1. Add a config in KylinConfig, "kylin.hive.home", which points to a local Hive installation folder (if empty, the default is used);
2. As you know, Kylin already allows overriding the KylinConfig at cube level, so a user can specify a different Hive home for different cubes. We can do the same for projects, so that each project can also bind to a Hive installation. The properties overridden in a project can be inherited by its cubes in the cube wizard;
3. Create a HiveClient by passing in the KylinConfig instance; it will check the config and then know how to load the metadata and execute the command.
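
To illustrate how points 1 to 3 could fit together, here is a minimal sketch. It is not the code in branch KYLIN-1826-2: the class and method names (HiveHomeSketch, LayeredConfig, hiveCommand) are hypothetical, the paths are invented, and resolving the executable as "<home>/bin/hive" assumes the usual Hive installation layout. Only the property name "kylin.hive.home" and the cube-over-project-over-global override order come from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names are Kylin's actual classes.
final class HiveHomeSketch {

    static final String HIVE_HOME_KEY = "kylin.hive.home";

    /** One layer of properties that falls back to its parent layer when a key is unset or empty. */
    static final class LayeredConfig {
        private final Map<String, String> overrides = new HashMap<>();
        private final LayeredConfig parent; // null for the global layer

        LayeredConfig(LayeredConfig parent) {
            this.parent = parent;
        }

        LayeredConfig set(String key, String value) {
            overrides.put(key, value);
            return this;
        }

        Optional<String> get(String key) {
            String value = overrides.get(key);
            if (value != null && !value.isEmpty()) {
                return Optional.of(value);
            }
            return parent == null ? Optional.empty() : parent.get(key);
        }
    }

    /**
     * Picks the Hive executable for a config layer.
     * Assumption: a Hive home contains bin/hive; with no home set, plain "hive" from PATH is used.
     */
    static String hiveCommand(LayeredConfig config) {
        return config.get(HIVE_HOME_KEY)
                .map(home -> home + "/bin/hive")
                .orElse("hive");
    }

    public static void main(String[] args) {
        LayeredConfig global = new LayeredConfig(null);
        LayeredConfig project = new LayeredConfig(global).set(HIVE_HOME_KEY, "/opt/hive-cluster-a");
        LayeredConfig cube = new LayeredConfig(project); // no override, inherits from the project

        System.out.println(hiveCommand(global)); // prints "hive"
        System.out.println(hiveCommand(cube));   // prints "/opt/hive-cluster-a/bin/hive"
    }
}
```

The point of the layering is that a client built from a cube's config never needs to know at which level the Hive home was set.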

I made an initial commit in branch KYLIN-1826-2, which extends the CLIHiveClient and adds the copy step in HiveMRInput; you can check it out and have a look. This way, the impact will be small, and the change will be easier to understand and maintain.

I don't have time to merge all the changes there; if you think it is a good idea, please continue there. For the REST API (TableController), please keep the project name parameter optional, so that existing users don't need to change their client-side code (many users have integrated Kylin with their apps, so we need to keep the API as stable as possible).
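
As a rough sketch of what "optional" means for existing clients: a request that omits the project parameter should behave exactly as it does today. The code below is not the real TableController; the class name, the "project" and "tables" parameter keys, the project name and the path are all made up for illustration.

```java
import java.util.Map;

// Hypothetical sketch; this is not Kylin's TableController.
final class OptionalProjectParamSketch {

    /** Hive home per project; the project name and path are invented for the example. */
    static final Map<String, String> PROJECT_HIVE_HOME = Map.of("sales", "/opt/hive-cluster-a");

    /**
     * Returns the Hive home a table-load request would use, or "default" for the default Hive.
     * A request without a "project" parameter keeps the old behaviour.
     */
    static String resolveHive(Map<String, String> requestParams) {
        String project = requestParams.get("project");
        if (project == null || project.isEmpty()) {
            return "default"; // old clients never send a project
        }
        return PROJECT_HIVE_HOME.getOrDefault(project, "default");
    }

    public static void main(String[] args) {
        // Existing client: no project parameter.
        System.out.println(resolveHive(Map.of("tables", "default.kylin_sales")));
        // New client: names the project whose Hive should be used.
        System.out.println(resolveHive(Map.of("tables", "default.kylin_sales", "project", "sales")));
    }
}
```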

Thanks.


> kylin support more than one hive based on different hadoop cluster
> ------------------------------------------------------------------
>
>                 Key: KYLIN-1826
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1826
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Environment 
>    Affects Versions: v1.5.2
>            Reporter: fengYu
>            Assignee: fengYu
>         Attachments: 0001-KYLIN-1826-add-external-hive-interface-project-table.patch, 0002-KYLIN-1826-add-and-modify-cube-source-job-for-extern.patch, 0003-KYLIN-1826-unify-hive-concept-forbid-modify-hive-nam.patch
>
>
> Currently, Kylin supports only one Hive, which is run through the 'hive' command. However, when the source data is located in more than one Hive, we have to deploy more Kylin instances and more than one metastore, which is difficult to manage and may cause conflicts.
> I have been working on this recently. In our cluster, there are several Hive clients (with different metastores) based on different Hadoop clusters. I added a new Hive source type called 'external hive' in Kylin 1.5.x.
> Kylin's plug-in architecture in 2.x makes this work easier. The main modifications are:
> 1. Add a Hive root directory to the Hive config file; the external Hive clients live in this directory, and each Hive is named after its directory.
> 2. Add the hive-site.xml file while loading Hive tables.
> 3. Store the Hive name in the project; one project can take only one Hive as its source.
> 4. Change and add some jobs to support job building.
> I will upload my patch once I finish all my tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)