Posted to user@kylin.apache.org by 163 <am...@163.com> on 2015/10/07 09:57:53 UTC

Re: Some questions about using Kylin

Hi Luke

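Thanks for your quick reply, and happy holiday.

You said a single fact table is enough, but I don't quite understand. If I
have two dimensions, time (hour, day, month) and space (province, city),
what should the structure of this fact table look like? Would it also have
two dimension columns of type Integer, named "time" and "space", with each
time granularity represented by a different code (for example hour as 1,
day as 2, month as 3), and likewise for space?

By efficiency I mainly mean how efficiently Kylin builds cubes. As for
query latency, the queries are mostly exact lookups and simple aggregations
on HBase, which are generally fast.

Thanks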

On 2015/9/29 23:52, Luke Han wrote:
> Hi there,
>     Your case should be fine; it is a typical OLAP scenario.
>     Please try to generate a star schema in Hive: one fact table and
> some lookup tables (you do not need to create 18 fact tables ;-), and
> create a cube based on this data model. Remember to leverage hierarchy
> and derived dimensions as much as possible to avoid generating a huge
> cube.
>
>     For performance, are you asking about cube processing or query latency?
>
>     If you have any issues, please feel free to ask here.
>     Thanks.
> Luke
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
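> 2015-09-29 23:10 GMT+08:00 163 <am...@163.com>:
>
>     Hi all
>
>     I have some questions about using Kylin. We currently have the
>     dimensions listed below, and I am not sure whether Kylin can handle
>     them, what adjustments would be needed, or whether efficiency would
>     be a problem.
>
>     At present we mainly have time (hour, day, month), space (province,
>     city), business (level-1 business, level-2 business), terminal (user
>     terminal brand, model), and user (user number) dimensions. Of these,
>     the user dimension has a relatively large data volume.
>
>     I now want to use Kylin for OLAP. Do I need to generate
>     3x2x2x2x1=18 fact tables, and therefore build 18 cubes in Kylin? Our
>     daily input is currently around 20 TB, with about 50 million
>     records. How efficient is Kylin at processing this?
>
>     Thanks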
>
>
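
For the question above about what a single fact table looks like with time and space dimensions, here is a minimal sketch of a star schema. It uses Python's built-in sqlite3 module only so that it runs anywhere; it is not Hive or Kylin DDL. All table names, column names, and sample rows (dim_time, dim_region, fact_usage, traffic_bytes) are hypothetical. The idea it illustrates is that the fact table stores rows at the finest grain (hour, city), and the coarser levels (day, month, province) come from lookup tables rather than from separate fact tables or integer codes for each granularity.

```python
import sqlite3

# Hypothetical star schema: one fact table at the finest grain plus two lookup tables.
SCHEMA = """
CREATE TABLE dim_time (
    hour_key  TEXT PRIMARY KEY,   -- e.g. '2015-10-07 09'
    day_key   TEXT NOT NULL,      -- e.g. '2015-10-07'
    month_key TEXT NOT NULL       -- e.g. '2015-10'
);
CREATE TABLE dim_region (
    city_id  INTEGER PRIMARY KEY,
    city     TEXT NOT NULL,
    province TEXT NOT NULL
);
CREATE TABLE fact_usage (
    hour_key      TEXT    NOT NULL REFERENCES dim_time(hour_key),
    city_id       INTEGER NOT NULL REFERENCES dim_region(city_id),
    traffic_bytes INTEGER NOT NULL
);
"""

TIME_COLUMNS = {"hour_key", "day_key", "month_key"}
REGION_COLUMNS = {"city", "province"}


def demo_connection():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [
        ("2015-10-07 09", "2015-10-07", "2015-10"),
        ("2015-10-07 10", "2015-10-07", "2015-10"),
        ("2015-10-08 09", "2015-10-08", "2015-10"),
    ])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?, ?)", [
        (1, "Hangzhou", "Zhejiang"),
        (2, "Ningbo", "Zhejiang"),
        (3, "Nanjing", "Jiangsu"),
    ])
    conn.executemany("INSERT INTO fact_usage VALUES (?, ?, ?)", [
        ("2015-10-07 09", 1, 100),
        ("2015-10-07 10", 1, 50),
        ("2015-10-07 09", 2, 30),
        ("2015-10-08 09", 3, 70),
    ])
    return conn


def rollup(conn, time_col, region_col):
    """Aggregate the single fact table at any (time, region) granularity."""
    if time_col not in TIME_COLUMNS or region_col not in REGION_COLUMNS:
        raise ValueError("unknown dimension column")
    sql = (
        f"SELECT t.{time_col}, r.{region_col}, SUM(f.traffic_bytes) "
        "FROM fact_usage f "
        "JOIN dim_time t ON f.hour_key = t.hour_key "
        "JOIN dim_region r ON f.city_id = r.city_id "
        f"GROUP BY t.{time_col}, r.{region_col} "
        "ORDER BY 1, 2"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    print(rollup(conn, "month_key", "province"))
    print(rollup(conn, "day_key", "city"))
```

The same fact table answers both the month/province and the day/city query, which is why one fact table with lookup tables can stand in for a separate table per combination of granularities.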

Re: Some questions about using Kylin

Posted by Luke Han <lu...@gmail.com>.

You are welcome. Please feel free to raise further questions here; we are
happy to help :)

Thanks.


Best Regards!
---------------------

Luke Han

On Thu, Oct 8, 2015 at 9:33 AM, 163 <am...@163.com> wrote:

> Hi Luke
>
> OK. I will read the references you mentioned above, and then run some
> tests on my data using Kylin.
>
> Many thanks.

Re: Some questions about using Kylin

Posted by 163 <am...@163.com>.

Hi Luke

OK. I will read the references you mentioned above, and then run some
tests on my data using Kylin.

Many thanks.

On 2015/10/7 19:42, Luke Han wrote:
> Here are some references on how to build a star-schema data warehouse:
> http://learndatamodeling.com/blog/designing-star-schema/
>
> http://www.ibmbigdatahub.com/blog/building-fast-data-warehouse-schemas-part-1
>
> Cube build performance depends heavily on your data schema,
> cardinality, cluster computing resources, network traffic, and other
> factors. The best approach is to build a sample cube with sample data
> and then estimate the duration of the full build from that.
>
> Thanks.
>
>
> Best Regards!
> ---------------------
>
> Luke Han

Re: Some questions about using Kylin

Posted by Luke Han <lu...@gmail.com>.

Here are some references on how to build a star-schema data warehouse:
http://learndatamodeling.com/blog/designing-star-schema/

http://www.ibmbigdatahub.com/blog/building-fast-data-warehouse-schemas-part-1

Cube build performance depends heavily on your data schema, cardinality,
cluster computing resources, network traffic, and other factors. The best
approach is to build a sample cube with sample data and then estimate the
duration of the full build from that.
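
One reason the schema matters for build time is the number of dimension combinations (cuboids) a cube may have to materialize. The sketch below is a rough illustration only. It assumes the common rule of thumb that N independent dimension columns give 2^N combinations, while a strict hierarchy of k levels contributes only k + 1 combinations (none, level 1, levels 1-2, and so on). The level counts are taken from the dimensions described in this thread; the function names are hypothetical, and Kylin's actual cuboid pruning may differ.

```python
from math import prod

# Number of levels per dimension, as described in the thread.
LEVELS = {
    "time": 3,      # hour, day, month
    "space": 2,     # province, city
    "business": 2,  # level-1 business, level-2 business
    "terminal": 2,  # brand, model
    "user": 1,      # user number
}


def cuboids_without_hierarchy(levels):
    """Every column is treated as independent: 2^N subsets of N columns."""
    return 2 ** sum(levels.values())


def cuboids_with_hierarchy(levels):
    """Each k-level hierarchy allows only k + 1 prefixes of its levels."""
    return prod(k + 1 for k in levels.values())


if __name__ == "__main__":
    print(cuboids_without_hierarchy(LEVELS))  # 2^10 = 1024
    print(cuboids_with_hierarchy(LEVELS))     # 4 * 3 * 3 * 3 * 2 = 216
```

Under these assumptions, declaring the hierarchies cuts the combinations from 1024 to 216, which is why a build on sample data with the real schema is a better basis for an estimate than record counts alone.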

Thanks.


Best Regards!
---------------------

Luke Han

2015-10-07 15:57 GMT+08:00 163 <am...@163.com>:

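> Hi Luke
>
> Thanks for your quick reply, and happy holiday.
>
> You said a single fact table is enough, but I don't quite understand. If I
> have two dimensions, time (hour, day, month) and space (province, city),
> what should the structure of this fact table look like? Would it also have
> two dimension columns of type Integer, named "time" and "space", with each
> time granularity represented by a different code (for example hour as 1,
> day as 2, month as 3), and likewise for space?
>
> By efficiency I mainly mean how efficiently Kylin builds cubes. As for
> query latency, the queries are mostly exact lookups and simple aggregations
> on HBase, which are generally fast.
>
> Thanks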