Posted to user@kylin.apache.org by Yoni Amir <Yo...@niceactimize.com> on 2018/04/26 17:18:06 UTC

a couple of questions before starting a PoC

Hello,
Sorry for the newbie question. I am about to start a PoC with Kylin, and I have a few questions about performance and my use case, to check whether I am heading in the right direction.

First, I was wondering whether there is any publicly available information on Kylin's performance and benchmarks. I read a couple of the eBay articles, such as this one: https://www.ebayinc.com/stories/blogs/tech/cube-planner-build-an-apache-kylin-olap-cube-efficiently-and-intelligently/. While this article boasts sub-second latency with high concurrency, it doesn't go into the details of the hardware that was used to achieve this.

Secondly, I was wondering how large a cube dimension can be, meaning how many distinct values it can have. As a basic example, let's say that I have a fact table of transactions, and each transaction is related to, among other things, an account. I have billions of transactions related to several million accounts. If I understand correctly, I can have the account as a dimension in a hierarchy if the accounts are grouped together, e.g. by region. Then I can aggregate the transactions by region and so forth.
What about using the account as a "stand-alone" dimension, even if it has millions of values? The reason is that I want a query to fetch the transaction data related to a specific account (together with a few other, smaller dimensions, e.g. the transaction date and type). I want to fetch that data, not aggregate it. Does it make sense to do this with Kylin, or is this the wrong use case for the tool?
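
To make the two access patterns concrete, here is a minimal sketch that uses an in-memory SQLite table rather than Kylin. Everything in it is an assumption made for illustration: the table name `transactions`, the columns `account_id`, `region`, `tx_date`, `tx_type` and `amount`, and the sample rows. It says nothing about how Kylin would store or serve either query; it only shows the difference between aggregating by region and fetching the raw rows of one account.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory fact table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "account_id INTEGER, region TEXT, tx_date TEXT, "
        "tx_type TEXT, amount REAL)"
    )
    rows = [
        (1, "EU", "2018-04-01", "debit", 10.0),
        (1, "EU", "2018-04-02", "credit", 25.0),
        (2, "EU", "2018-04-01", "debit", 5.0),
        (3, "US", "2018-04-03", "debit", 40.0),
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def totals_by_region(conn):
    """Pattern 1: aggregate transactions by a low-cardinality dimension."""
    cur = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM transactions "
        "GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


def rows_for_account(conn, account_id):
    """Pattern 2: fetch the detail rows of one account, no aggregation."""
    cur = conn.execute(
        "SELECT tx_date, tx_type, amount FROM transactions "
        "WHERE account_id = ? ORDER BY tx_date",
        (account_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_by_region(conn))
    print(rows_for_account(conn, 1))
```

The second function is the case I am asking about: a filter on a dimension with millions of distinct values that returns detail rows instead of a rolled-up measure.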

Thanks in advance,
Yoni


Re: a couple of questions before starting a PoC

Posted by ShaoFeng Shi <sh...@apache.org>.
Hello Yoni,

Many Kylin users have run benchmarks for their own scenarios. Kyligence Inc (the
commercial company behind Kylin) has published SSB and TPC-H benchmarks for
Apache Kylin; you can check the related GitHub projects:

https://github.com/Kyligence/ssb-kylin
https://github.com/Kyligence/kylin-tpch

For more information, we can talk offline.


-- 
Best regards,

Shaofeng Shi 史少锋