Posted to user@kylin.apache.org by Mars J <xu...@gmail.com> on 2016/03/31 04:20:04 UTC

Discussion: About Time Consumption of Kylin Cubing

Hello All,

      I'd like to start a discussion about the time spent in each step of a
cube build.

      Our team tested Kylin's performance to check whether Kylin suits our
requirements. Our environment is as follows:
                 Hadoop 2.7.2 (HDFS replication factor 2)
                 Hive 1.2
                 HBase 1.1.3
                 Kylin 1.3-HBase 1.1.3
                 OS CentOS 6.7

       We tested Kylin in two separate ways:
               1. varying the number of dimensions from 4 to 10 (in steps of 2);
               2. varying the number of cluster nodes from 3 to 5.
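
One likely reason the dimension count matters so much: assuming a full cube with no pruning (no aggregation groups, hierarchy, or mandatory dimensions), Kylin precomputes one cuboid per subset of the dimensions, so the cuboid count doubles with every added dimension. A minimal sketch of that arithmetic; the function names are illustrative, not Kylin APIs:

```python
from itertools import combinations


def full_cube_cuboids(num_dims: int) -> int:
    """Cuboids in an unpruned full cube: one per subset of dimensions, i.e. 2^N."""
    return 2 ** num_dims


def enumerate_cuboids(dims: list[str]) -> list[tuple[str, ...]]:
    """List every cuboid as a tuple of dimension names.

    Ordered from the base cuboid (all dimensions) down to the apex
    cuboid (the empty tuple, i.e. the grand total).
    """
    cuboids: list[tuple[str, ...]] = []
    for k in range(len(dims), -1, -1):
        cuboids.extend(combinations(dims, k))
    return cuboids


if __name__ == "__main__":
    # Dimension counts from the test, plus 12, which the results mention.
    for n in range(4, 13, 2):
        print(f"{n:>2} dims -> {full_cube_cuboids(n):>5} cuboids")
    print(enumerate_cuboids(["a", "b"]))  # [('a', 'b'), ('a',), ('b',), ()]
```

For 4, 6, 8, 10 and 12 dimensions this gives 16, 64, 256, 1024 and 4096 cuboids, so each step of 2 dimensions quadruples the number of cuboids to compute (the actual data volume also depends on column cardinality).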

       We got some interesting results:
               1. After adding nodes (without rebalancing data), build time
dropped noticeably at 10 and 12 dimensions, but changed little at 4, 6
and 8 dimensions.
               2. After adding nodes and rebalancing data, build time was
mostly the same as without rebalancing, and sometimes even longer with
more dimensions (e.g. 12).
               3. Is our test method the right one?
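
One way to make question 3 more measurable is to compare each timing against ideal linear scaling, per build step rather than only for the whole job. A minimal sketch, assuming you record the wall-clock time of each step on both cluster sizes; the function names, step names and numbers are hypothetical:

```python
def speedup(t_before: float, t_after: float) -> float:
    """How many times faster a run became: t_before / t_after."""
    return t_before / t_after


def ideal_time(t_before: float, n_before: int, n_after: int) -> float:
    """Build time expected under perfectly linear scaling with node count."""
    return t_before * n_before / n_after


def scaling_efficiency(t_before: float, n_before: int,
                       t_after: float, n_after: int) -> float:
    """Measured speedup divided by the ideal speedup n_after / n_before.

    1.0 means linear scaling; a value equal to n_before / n_after
    (0.6 for 3 -> 5 nodes) means the extra nodes bought nothing.
    """
    return speedup(t_before, t_after) / (n_after / n_before)


def per_step_efficiency(before: dict[str, float], after: dict[str, float],
                        n_before: int, n_after: int) -> dict[str, float]:
    """Scaling efficiency for every build step timed on both cluster sizes.

    Keys are whatever step names you record; steps missing from
    either run are skipped.
    """
    return {
        step: scaling_efficiency(before[step], n_before, after[step], n_after)
        for step in before
        if step in after
    }


if __name__ == "__main__":
    # Hypothetical per-step timings in minutes, 3 nodes vs 5 nodes.
    before = {"step_a": 30.0, "step_b": 90.0}
    after = {"step_a": 30.0, "step_b": 54.0}
    print(per_step_efficiency(before, after, 3, 5))
```

With these made-up numbers, step_a scores about 0.6 (no gain from the extra nodes) and step_b about 1.0 (linear), which points the analysis at the steps that do not scale.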


       To understand these results, we want to analyze the source code.
Since I have little experience reading source code, and the code has few
comments, I am starting this discussion.

       We are starting from the engine-mr-steps source code...

       By the way, what is the purpose of the invertedindex in Kylin?

Re: Discussion: About Time Consumption of Kylin Cubing

Posted by Li Yang <li...@apache.org>.
Please ignore the invertedindex. It is experimental only and does not
currently participate in cube building or querying.

As for the scalability of the MR jobs, it is mostly a matter of MR tuning.
Basically, you want enough mappers and reducers for parallelism, and
correct memory and VM parameters for the tasks. The expectation is linear
scaling; if that is not the case, the MR job should be analyzed.
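
A rough model of why more nodes may not help every cube: if a step has fewer tasks than the cluster can run at once, it already finishes in a single wave and the extra nodes sit idle. The sketch below shows that wave arithmetic plus per-task memory settings. It assumes a fixed number of concurrent task slots per node (a simplification, since YARN actually schedules by memory and vcores); the mapreduce.* keys are standard Hadoop 2.x properties, while the 0.8 heap ratio is a common rule of thumb rather than a Kylin recommendation, and the helper names are hypothetical:

```python
import math


def task_waves(num_tasks: int, nodes: int, slots_per_node: int) -> int:
    """Rounds of tasks needed when at most nodes * slots_per_node run at once."""
    return math.ceil(num_tasks / (nodes * slots_per_node))


def heap_opts(container_mb: int, heap_fraction: float = 0.8) -> str:
    """JVM -Xmx option sized to a fraction of the task's container memory.

    heap_fraction=0.8 is an assumed rule of thumb, not a Kylin default: it
    leaves headroom for non-heap JVM memory so the container is less likely
    to be killed for exceeding its memory limit.
    """
    return f"-Xmx{int(container_mb * heap_fraction)}m"


def task_memory_conf(map_mb: int, reduce_mb: int) -> dict[str, str]:
    """Standard Hadoop 2.x MapReduce memory properties for one job."""
    return {
        "mapreduce.map.memory.mb": str(map_mb),
        "mapreduce.map.java.opts": heap_opts(map_mb),
        "mapreduce.reduce.memory.mb": str(reduce_mb),
        "mapreduce.reduce.java.opts": heap_opts(reduce_mb),
    }


if __name__ == "__main__":
    # Hypothetical task counts, 4 concurrent tasks per node.
    for tasks in (10, 40):
        print(tasks, "tasks:", task_waves(tasks, 3, 4), "waves on 3 nodes,",
              task_waves(tasks, 5, 4), "waves on 5 nodes")
    print(task_memory_conf(map_mb=2048, reduce_mb=4096))
```

With 4 slots per node, a step with 10 tasks takes one wave on both 3 and 5 nodes, while 40 tasks drop from 4 waves to 2. That is consistent with (though it does not prove) the observation that only the larger cubes got faster after adding nodes.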
