Posted to user@kylin.apache.org by zhong zhang <zz...@gmail.com> on 2016/01/08 01:54:17 UTC

ultra high cardinality

Hi All,

The cube we are trying to build includes several ultra high-cardinality
(UHC) columns, with cardinalities of over 50 million.
This link
<http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin?next_slideshow=5>
says:
Avoid UHC as much as possible.
- if it's used as indicator, then put the indicator in cube.
- try categorize values or derive features from the UHC rather than putting
the original value in cube.

I'm sorry, I'm new to Kylin and cubes. Could anyone explain the two
suggestions above in a bit more detail?

Best regards,
Zhong

Re: ultra high cardinality

Posted by Li Yang <li...@apache.org>.
With carefully designed dimensions and aggregation groups, Kylin can work
with ultra high-cardinality columns.  That requires understanding the
analysis scenario first.

As for the specific lines, they are basically saying to consider whether
the UHC column can be replaced with a low-cardinality column. E.g. some
data set has a URL column which is UHC, but in real analysis only the
domain name is really useful. In such a case, ETL can pre-process the URLs
into domain names, so the cube doesn't have to deal with the UHC column
directly.
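
A minimal sketch of that kind of ETL step, in Python, just to illustrate the
idea. The function name url_to_domain and the sample rows are made up for the
example, and nothing here is a Kylin API; it would run upstream, before the
data reaches the table the cube is built from. Note that it keeps the full
host name (e.g. www.example.com), which may still need trimming depending on
the analysis.

```python
from urllib.parse import urlsplit


def url_to_domain(url):
    """Return the lower-cased host of a URL, or None if none can be found.

    Hypothetical helper for illustration only.
    """
    # urlsplit only recognises the host when a scheme or a leading '//' is
    # present, so retry with '//' prepended for bare values such as
    # 'example.com/page'.
    for candidate in (url, "//" + url):
        try:
            host = urlsplit(candidate).hostname
        except ValueError:
            # Malformed input (e.g. a broken IPv6 literal).
            return None
        if host:
            return host
    return None


# Made-up sample rows: many distinct URLs collapse into a few domains.
urls = [
    "http://example.com/a?id=1",
    "http://example.com/b?id=2",
    "https://Kylin.Apache.org:443/docs",
]
domains = {url_to_domain(u) for u in urls}
print(len(set(urls)), "distinct URLs ->", len(domains), "distinct domains")
```

The cube would then use the derived domain column as the dimension, while the
raw URL stays in the source table for cases that need drill-down outside the
cube.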

On Fri, Jan 8, 2016 at 8:54 AM, zhong zhang <zz...@gmail.com> wrote:

> Hi All,
>
> The cube we are trying to build include several ultra high-cardinality
> columns.
> They are over 50 million cardinality.
> From this link
> <http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin?next_slideshow=5>,
> it says:
> Avoid UHC as much as possible.
> - if it's used as indicator, then put the indicator in cube.
> - try categorize values or derive features from the UHC rather than
> putting the original value in cube.
>
> I'm sorry that I'm a newbie to the Kylin and Cube things. Can anyone
> give a little bit more detailed explanation for the above two suggestions?
>
> Best regards,
> Zhong
>