
Posted to dev@kylin.apache.org by huawang <10...@qq.com> on 2016/04/21 05:02:31 UTC

The performance of cube with large dimension table

Hi, I found that if a dimension of the cube has a large number of candidates, the query is very slow. Is there any solution for this?

Re: The performance of cube with large dimension table

Posted by hongbin ma <ma...@apache.org>.

By "a large number of candidates", do you mean the dimension has very high
cardinality? I assume so.

The way to optimize a cube with a high-cardinality dimension depends on your
query pattern:

When queries filter on the high-cardinality dimension, it's best to put that
dimension at the beginning of the row key, so that the filter can quickly
skip unwanted cube rows.
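A toy Java sketch of why the row key order matters, under a deliberately simplified model: cube rows are kept sorted by row key (as in HBase), and the row key is just two dimension values joined by '|'. The dimension names (user IDs, countries), the string key format, and all method names are hypothetical; Kylin's real row keys use fixed-length, dictionary-encoded columns. When the filtered high-cardinality dimension leads the key, the filter becomes one contiguous key range. When it trails, matching rows are scattered and every row has to be read.

```java
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RowKeyOrderDemo {

    // Hypothetical two-column row key joined by '|'. Real Kylin row keys use
    // fixed-length dictionary-encoded columns; a string key is a simplification.
    static String rowKey(String leading, String trailing) {
        return leading + "|" + trailing;
    }

    // Build a key-sorted table (HBase keeps rows sorted by row key) with one row per
    // (user, country) combination. userFirst decides which dimension leads the key.
    static NavigableMap<String, Long> build(boolean userFirst, int users, String[] countries) {
        NavigableMap<String, Long> table = new TreeMap<>();
        for (int u = 0; u < users; u++) {
            String user = String.format(Locale.ROOT, "user%05d", u);
            for (String country : countries) {
                table.put(userFirst ? rowKey(user, country) : rowKey(country, user), 1L);
            }
        }
        return table;
    }

    // User leads the key: the filter is the contiguous range [user|, user}).
    // '}' is the character right after '|', so only matching rows are read.
    static List<String> prefixScan(NavigableMap<String, Long> table, String user) {
        return List.copyOf(table.subMap(user + "|", true, user + "}", false).keySet());
    }

    // User trails the key: matching rows are scattered, so every row is read.
    // Returns {rowsRead, rowsMatched}.
    static int[] fullScan(NavigableMap<String, Long> table, String user) {
        int read = 0, matched = 0;
        for (String key : table.keySet()) {
            read++;
            if (key.endsWith("|" + user)) {
                matched++;
            }
        }
        return new int[] {read, matched};
    }

    public static void main(String[] args) {
        String[] countries = {"CN", "DE", "US"};
        String target = "user04242";

        // In the prefix case every row in the range matches, so read == matched.
        List<String> hits = prefixScan(build(true, 10_000, countries), target);
        System.out.println("high-cardinality dim first: read " + hits.size()
                + " rows, matched " + hits.size());

        int[] r = fullScan(build(false, 10_000, countries), target);
        System.out.println("high-cardinality dim last:  read " + r[0]
                + " rows, matched " + r[1]);
    }
}
```

Both layouts find the same 3 matching rows out of 30,000, but with the high-cardinality dimension first only those 3 rows are read, while with it last all 30,000 rows are read. The gap grows with the dimension's cardinality, which is the effect described above.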

If your queries filter on other dimensions and group by the high-cardinality
dimension, they can easily return a massive number of result rows. This
scenario used to be a weakness of Kylin; however, we're currently working on
multiple improvements for similar scenarios (see
https://issues.apache.org/jira/browse/KYLIN-1428). After these improvements
are released, I'll write a blog post explaining all of them.

You can also check
http://apache-kylin.74782.x6.nabble.com/How-to-use-kylin-with-high-cardinality-dimensions-td3661.html;
it might give you some ideas.


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone