Posted to dev@kylin.apache.org by 吕卓然 <lv...@fosun.com> on 2017/04/21 10:20:10 UTC

A problem in cube size

Hi all,

Currently I am using Kylin 1.6.1 and I have a question about cube size. In the web GUI, my cube size is around 700MB with 800,000,000 records. However, when I run kylin.sh org.apache.kylin.engine.mr.common.CubeStatsReader in a terminal, the "Total estimated size(MB)" is reported as 8709.30625014305. I checked HBase and the cube size is indeed 700MB. I am really confused by this. Please correct me if I have made a mistake.

Thanks a lot!
Zhuoran

Re: A problem in cube size

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi zhuoran,

The 700MB cube size is the serialized, compressed size on storage. At query
time, Kylin needs to read and decompress dimensions and measures into memory,
so the size will be larger than the stored one. Because a HyperLogLog measure
is much bigger than a normal measure (usually 1KB to 64KB each, depending on
the precision you select), its scan limit is much lower than normal.

You're correct: the threshold is computed from the memory budget and the size
of each row.
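
As a rough check of that arithmetic, here is a minimal Python sketch. It
assumes that "3GB" means 3 * 1024^3 bytes and that the row threshold is simply
the budget divided by the per-row size, rounded down. The function names are
made up for illustration and this is not Kylin's actual code. Under those
assumptions, a 16KB row would give a far higher limit than the 49140 rows
reported in the log, and the logged figure implies a row of roughly 64KB, the
upper end of the HyperLogLog range.

```python
GIB = 1024 ** 3  # assumption: "GB" in the thread means 2^30 bytes


def rows_for_budget(budget_bytes, row_bytes):
    """Hypothetical helper: how many rows of a given size fit in the budget."""
    return budget_bytes // row_bytes


def implied_row_bytes(budget_bytes, rows):
    """Hypothetical helper: per-row size implied by a budget and a row limit."""
    return budget_bytes / rows


budget = 3 * GIB

# Row limit if each row really were 16KB.
print(rows_for_budget(budget, 16 * 1024))

# Per-row size implied by the logged limit of 49140 rows.
print(round(implied_row_bytes(budget, 49140)))
```

The first value is 196608 and the second is 65552 bytes, which is 64KB plus
16 bytes.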

If the cube is well designed, the run-time scan and aggregation should be
minor. To avoid this error/warning, you should analyze the query and
optimize the cube design.

2017-04-24 15:55 GMT+08:00 zhuoranlyu <lv...@fosun.com>:

> BTW, if I set kylin.query.memory-budget-bytes to 6GB, there is no error
> anymore.



-- 
Best regards,

Shaofeng Shi 史少锋

Re: A problem in cube size

Posted by zhuoranlyu <lv...@fosun.com>.
BTW, if I set kylin.query.memory-budget-bytes to 6GB, there is no error
anymore.
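
For reference, a small Python sketch of the value involved. It assumes the
property takes a plain byte count (as the "-bytes" suffix suggests), that
"6GB" means 6 * 1024^3 bytes, and that the setting is written as a key=value
line in kylin.properties. The helper name is made up for illustration.

```python
def budget_property(gib):
    """Hypothetical helper: format the memory budget setting for a size in GiB."""
    return f"kylin.query.memory-budget-bytes={gib * 1024 ** 3}"


# Prints the property line for a 6GB budget.
print(budget_property(6))
```

This prints kylin.query.memory-budget-bytes=6442450944.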

--
View this message in context: http://apache-kylin.74782.x6.nabble.com/A-problem-in-cube-size-tp7737p7752.html
Sent from the Apache Kylin mailing list archive at Nabble.com.

Re: A problem in cube size

Posted by zhuoranlyu <lv...@fosun.com>.
Hi Shaofeng,

Glad to hear from you, and thank you for the information. Another quick
question: I understand that the estimated size is calculated before the cube
is built. However, when I tried to use count distinct (HyperLogLog), the query
failed with "The coprocessor thread stopped itself due to scan timeout or scan
threshold(check region server log), failing current query." I looked into
this error, and it seems to happen because the query used too much memory.
I had set "kylin.query.memory-budget-bytes" to 3GB. I was wondering why this
happens, because the cube size is only 700MB.

I checked the log and found "gtrecord.GTCubeStorageQueryBase:343 :
Memory budget is set to 49140 rows". I think this number is calculated as
3GB/eachRowSize(16KB). Is that correct?

Thanks,
Zhuoran

--
View this message in context: http://apache-kylin.74782.x6.nabble.com/A-problem-in-cube-size-tp7737p7751.html
Sent from the Apache Kylin mailing list archive at Nabble.com.

Re: A problem in cube size

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi zhuoran,

The number in CubeStatsReader is an estimate made before the cube is built.
Kylin needs it to estimate the number of reducers, the number of HBase
regions, and so on.

The cube size you see in the web GUI is the real size (taken from the MR
counters). It may differ from the estimated size, but this is normal and you
don't need to worry about it.



-- 
Best regards,

Shaofeng Shi 史少锋

Re: A problem in cube size

Posted by ShaoFeng Shi <sh...@apache.org>.
H




-- 
Best regards,

Shaofeng Shi 史少锋