Posted to user@kylin.apache.org by Yaqian Zhang <Ya...@126.com> on 2019/09/01 08:02:45 UTC

Re: Details about “Extract Fact Table Distinct Columns and Build Dimension Dictionary”

Hi Johnson:
	In this step, Kylin calculates the cardinality of each dimension column and builds a dictionary for it.
	To save space and improve efficiency, Kylin encodes and compresses dimension values, using dictionary encoding by default. Dictionary encoding builds a mapping table from string to int for all the values of a dimension, then serializes and saves the dictionary, which greatly reduces storage size. The dictionary preserves order: if string A is greater than string B, the encoded value of A is greater than the encoded value of B. This allows the encoded values to be used directly in HBase queries without decoding.
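For illustration only, here is a minimal Python sketch of order-preserving dictionary encoding as described above. It is not Kylin's actual implementation; the function name `build_dictionary` and the sample city values are hypothetical.

```python
def build_dictionary(values):
    """Map each distinct string to an int ID assigned in sorted order.

    Because IDs follow the sort order of the strings, comparing two IDs
    gives the same result as comparing the original strings.
    """
    distinct_sorted = sorted(set(values))
    return {value: code for code, value in enumerate(distinct_sorted)}


# Hypothetical dimension column with repeated values.
column = ["Shanghai", "Beijing", "Shenzhen", "Beijing", "Hangzhou"]

city_dict = build_dictionary(column)
encoded = [city_dict[v] for v in column]

# A range filter such as city >= "Hangzhou" can be evaluated on the
# encoded ints alone, without decoding them back to strings.
threshold = city_dict["Hangzhou"]
matching_codes = [code for code in encoded if code >= threshold]

print(city_dict)
print(encoded)
print(matching_codes)
```

Storing the small ints instead of the strings is what saves space, and the sorted ID assignment is what lets a comparison run on the encoded form.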
	However, because dictionary encoding requires maintaining a mapping table, you need to consider the dimension's cardinality, that is, the number of distinct values in the dimension column. If the cardinality is very high, the dictionary becomes very large and is not suitable for loading into memory. In that case, another encoding method should be chosen. By default, the maximum cardinality that Kylin allows for dictionary encoding is 5 million, which is configured by the parameter kylin.dictionary.max.cardinality.
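As a rough sketch of the cardinality check described above, the Python below counts distinct values per column and compares each count to a limit. It uses exact in-memory sets, which is an assumption made for readability and not how Kylin computes cardinality on a cluster. The names `column_cardinalities` and `choose_encoding`, the "other" label, and the inclusive comparison against the limit are all hypothetical.

```python
# Default limit quoted in the message above (kylin.dictionary.max.cardinality).
MAX_DICT_CARDINALITY = 5_000_000


def column_cardinalities(rows, columns):
    """Return the number of distinct values seen in each listed column."""
    distinct = {column: set() for column in columns}
    for row in rows:
        for column in columns:
            distinct[column].add(row[column])
    return {column: len(values) for column, values in distinct.items()}


def choose_encoding(cardinality, limit=MAX_DICT_CARDINALITY):
    """Pick dictionary encoding only when the cardinality is within the limit."""
    return "dict" if cardinality <= limit else "other"


# Hypothetical fact table rows.
rows = [
    {"city": "Beijing", "user_id": "u1"},
    {"city": "Shanghai", "user_id": "u2"},
    {"city": "Beijing", "user_id": "u3"},
]

cardinalities = column_cardinalities(rows, ["city", "user_id"])

# A tiny limit is used here only so that the sample data shows both outcomes.
choices = {col: choose_encoding(n, limit=2) for col, n in cardinalities.items()}

print(cardinalities)
print(choices)
```

With the real default of 5 million, both sample columns would qualify for dictionary encoding; the limit of 2 is only there to make the branch visible.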

> On Aug 30, 2019, at 8:29 PM, Johnson <it...@163.com> wrote:
> 
> Hi all:
> I want to know the details of these two steps: Extract Fact Table Distinct Columns and Build Dimension Dictionary. What do these steps do, and how do they do it?
> Looking forward to your reply.
> 
> ----------------------
> Best wishes,
> Johnson


Re: Details about “Extract Fact Table Distinct Columns and Build Dimension Dictionary”

Posted by ShaoFeng Shi <sh...@apache.org>.
This article can help, to some extent:

https://kylin.apache.org/docs/howto/howto_optimize_build.html

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




ITzhangqiang <IT...@163.com> wrote on Mon, Sep 2, 2019 at 10:23 AM:

> Hi Yaqian:
>
>        Thanks for your reply!
>
> I understand what you said, but I would like to know more details.
>
>
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

Re: Details about “Extract Fact Table Distinct Columns and Build Dimension Dictionary”

Posted by ITzhangqiang <IT...@163.com>.
Hi Yaqian:
	Thanks for your reply!
	I understand what you said, but I would like to know more details.

Sent from Mail for Windows 10