Posted to dev@kylin.apache.org by Yiming Liu <li...@gmail.com> on 2016/07/10 11:35:09 UTC

Data source management(Hive and Kafka)

Hi Kylin developers,

Currently, Kylin loads the Hive configuration from the HIVE_DEPENDENCY
classpath, which means Kylin supports only one Hive source. KYLIN-1826 aims
to support multiple Hive data sources, but its design is somewhat
complicated because it introduces the EXTERNAL HIVE concept.

Data source management becomes trickier when multiple Hive clusters
and multiple Kafka clusters are needed. I am just raising the question today
without a specific solution yet; all suggestions are welcome. I think it
would be very useful if Kylin supported data source management.

It should have the following features:
1. Define a Hive cluster or Kafka cluster as a data source factory under a
Project.
2. One Project can have more than one Hive/Kafka/SparkSQL cluster
definition.
3. On "Load Table/Streaming", Kylin can load the table (Hive) and
topic (Kafka) definitions directly from Hive/Kafka.
4. The subsequent model design and cube build remain the same as before.
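
To make features 1 to 3 concrete, here is a minimal sketch of a
per-project data source registry. It is only an illustration under stated
assumptions, not Kylin code: the names DataSource, Project, add_source,
sources and load, as well as the property keys and host names, are all
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DataSource:
    """One cluster definition under a project (hypothetical model, not a Kylin class)."""
    name: str                      # unique within the project, e.g. "hive_a"
    kind: str                      # "hive" or "kafka"
    properties: Dict[str, str] = field(default_factory=dict)


class Project:
    """Holds any number of named data sources (features 1 and 2)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._sources: Dict[str, DataSource] = {}
        self.loaded: List[Tuple[str, str]] = []   # (source name, table or topic)

    def add_source(self, source: DataSource) -> None:
        if source.name in self._sources:
            raise ValueError(f"duplicate data source: {source.name}")
        self._sources[source.name] = source

    def sources(self, kind: Optional[str] = None) -> List[DataSource]:
        """Return all sources, optionally filtered by kind."""
        return [s for s in self._sources.values() if kind is None or s.kind == kind]

    def load(self, source_name: str, entity: str) -> Tuple[str, str]:
        """Record which source a table or topic comes from (feature 3).

        A real implementation would fetch the definition from the cluster;
        this stub only validates the source name and remembers the pairing.
        """
        if source_name not in self._sources:
            raise KeyError(f"unknown data source: {source_name}")
        ref = (source_name, entity)
        self.loaded.append(ref)
        return ref


# Hypothetical usage: two Hive clusters and one Kafka cluster in one project.
project = Project("demo")
project.add_source(DataSource("hive_a", "hive", {"metastore": "thrift://hive-a:9083"}))
project.add_source(DataSource("hive_b", "hive", {"metastore": "thrift://hive-b:9083"}))
project.add_source(DataSource("kafka_a", "kafka", {"brokers": "kafka-a:9092"}))
project.load("hive_b", "default.sales")
project.load("kafka_a", "clicks")
```

Keying every loaded table or topic by its source name is what would let the
later model design and cube build stay unchanged (feature 4): they only see
a table reference, whichever cluster it came from.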

I know it's not a critical requirement, but maybe someone wants it too.

Thanks a lot.

-- 
With Warm regards

Yiming Liu (刘一鸣)

Re: Data source management(Hive and Kafka)

Posted by Luke Han <lu...@gmail.com>.
Thanks Yiming


Best Regards!
---------------------

Luke Han

On Tue, Jul 12, 2016 at 9:18 AM, Yiming Liu <li...@gmail.com> wrote:

> Thanks Luke. https://issues.apache.org/jira/browse/KYLIN-1873 is filed for
> this.

Re: Data source management(Hive and Kafka)

Posted by Yiming Liu <li...@gmail.com>.
Thanks Luke. https://issues.apache.org/jira/browse/KYLIN-1873 is filed for
this.

2016-07-12 0:02 GMT+08:00 Luke Han <lu...@gmail.com>:

> Data source management is a feature we should bring into the current design,
> as traditional BI applications have.
>
> Would you mind opening a JIRA for this?
>
> Thanks.



-- 
With Warm regards

Yiming Liu (刘一鸣)

Re: Data source management(Hive and Kafka)

Posted by Luke Han <lu...@gmail.com>.
Data source management is a feature we should bring into the current design,
as traditional BI applications have.

Would you mind opening a JIRA for this?

Thanks.



Best Regards!
---------------------

Luke Han

On Mon, Jul 11, 2016 at 3:01 AM, Li Yang <li...@apache.org> wrote:

> Setting the classpath for each individual datasource is a good idea.
>
> However, convenience for the 80% case must not be compromised. E.g. a
> meaningful default Hive datasource is still very important in my opinion.

Re: Data source management(Hive and Kafka)

Posted by Li Yang <li...@apache.org>.
Setting the classpath for each individual datasource is a good idea.

However, convenience for the 80% case must not be compromised. E.g. a
meaningful default Hive datasource is still very important in my opinion.
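
One way to read the "meaningful default" point is as a lookup that falls
back to a default source when none is named. The sketch below only
illustrates that idea under stated assumptions: DEFAULT_SOURCE,
resolve_source, the "classpath" key and the paths are hypothetical and do
not come from Kylin.

```python
from typing import Dict, Optional

DEFAULT_SOURCE = "default"   # hypothetical name for the implicit Hive source


def resolve_source(sources: Dict[str, dict], requested: Optional[str] = None) -> dict:
    """Return the requested source, or the default one when none is named."""
    name = requested if requested is not None else DEFAULT_SOURCE
    if name not in sources:
        raise KeyError(f"unknown data source: {name}")
    return sources[name]


# Hypothetical configuration: the default keeps today's single-Hive behaviour,
# while an extra source carries its own classpath.
sources = {
    DEFAULT_SOURCE: {"kind": "hive", "classpath": "from HIVE_DEPENDENCY"},
    "hive_b": {"kind": "hive", "classpath": "/path/to/hive_b/conf"},
}

common_case = resolve_source(sources)            # no name given: default applies
explicit_case = resolve_source(sources, "hive_b")
```

With this shape, users who have a single Hive cluster never need to name a
data source at all, and only multi-cluster setups pay the extra cost.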


Yang
