Posted to dev@kylin.apache.org by 吴钰彬 <wu...@baixing.com> on 2016/05/31 07:43:38 UTC

Reply: how to extend the threshold for kylin query?(from baixing.com)

Hi, Kylin developers

We investigated our query further. When I change the SQL query to remove the count(distinct), it works fine, and the query returns only 231 records.
Adding the count(distinct) measure should not increase the number of returned records, but I don't know how Kylin calculates count(distinct) internally. Does it need to scan many more records to return the result?
How can we solve this problem?

P.S. We are currently using Kylin version 1.3.
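
As a general illustration of the question above, here is a minimal Python sketch of why distinct counts behave differently from plain counts when rows are rolled up. It is not a description of Kylin's internals and does not establish what happened in this particular query; the rows, the column values, and the function names are all made up for the example. The point is only that distinct counts are not additive: each pre-aggregated row has to carry a set (or a compact sketch of one), and those have to be merged at query time.

```python
# Hypothetical pre-aggregated rows: (day, city) -> set of user ids seen.
# The data and names are illustrative only.
rows = {
    ("2016-05-30", "Shanghai"): {"u1", "u2", "u3"},
    ("2016-05-30", "Beijing"): {"u2", "u4"},
    ("2016-05-31", "Shanghai"): {"u1", "u5"},
}


def count_rows_summed(rows):
    """Naive roll-up: add the per-row distinct counts (over-counts)."""
    return sum(len(users) for users in rows.values())


def count_distinct_merged(rows):
    """Correct roll-up: merge the per-row sets, then count once."""
    merged = set()
    for users in rows.values():
        merged |= users
    return len(merged)


print(count_rows_summed(rows))      # 7: u1 and u2 are each counted twice
print(count_distinct_merged(rows))  # 5: u1 through u5
```

Because the merge needs every contributing row's set, a roll-up over many fine-grained rows can touch far more data than a plain sum or count would.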

[Inline image image003.jpg: not preserved in this plain-text archive]


B.R
Austin.Woo

From: 吴钰彬
Sent: May 31, 2016 10:35
To: 'dev@kylin.apache.org' <de...@kylin.apache.org>
Cc: 'dev-subscribe@kylin.apache.org' <de...@kylin.apache.org>; 李欣 <li...@baixing.com>; 亓庆国 <qi...@baixing.com>
Subject: how to extend the threshold for kylin query?(from baixing.com)

Hi, Kylin developers.

This is Austin, DW team lead at Baixing.com. Thanks for reading this mail.

We are currently evaluating Kylin as the big data query engine for our website click and event data.

When we build some cubes and try to query them, we hit the issue shown below.

[Inline image image004.png: not preserved in this plain-text archive]


But when we check the configuration in /conf/kylin.properties, it is set as below:

- Kylin.query.scan.threshold=40000000

Could you advise whether we are missing anything here?

Looking forward to your reply, and many thanks for your time.


B.R
Austin.Woo

Re: Reply: Reply: how to extend the threshold for kylin query?(from baixing.com)

Posted by Li Yang <li...@apache.org>.
It seems your query reads a lot of records out of HBase, which is not a
normal case by design. Normally the data should be highly aggregated, and only
a few thousand records are read. I guess the query response time is slow too.
Your cube could use some optimization to match the use case.

For the properties:

kylin.query.scan.threshold=10000000
#default is 3M
- This controls the maximum number of records read from HBase. It is a safety
valve to keep Kylin from being overloaded by bad queries.

kylin.query.mem.budget=64424509440
#default is 3G
- This controls the memory cap each query can use in the query server. The
memory is used for final aggregation before returning the correct result.

kylin.query.cube.visit.timeout.times=3
#default is 1
- A timeout for waiting for the HBase scan to return.
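
To put the values in this thread in perspective, here is a small Python sketch that parses the three settings quoted above and prints them in readable units. The parser and the names in it are hypothetical helpers written for this example. The "default" figures are simply the ones given in the comments above (3M, 3G, 1), read here as 3,000,000 records and 3 GiB; that reading is an assumption and has not been checked against the Kylin source.

```python
# Settings exactly as quoted in this thread.
PROPERTIES_TEXT = """
kylin.query.scan.threshold=10000000
kylin.query.mem.budget=64424509440
kylin.query.cube.visit.timeout.times=3
"""

# Defaults as stated in the thread's comments ("3M", "3G", "1").
# Reading them as 3,000,000 and 3 GiB is an assumption.
STATED_DEFAULTS = {
    "kylin.query.scan.threshold": 3_000_000,
    "kylin.query.mem.budget": 3 * 1024**3,
    "kylin.query.cube.visit.timeout.times": 1,
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


settings = parse_properties(PROPERTIES_TEXT)
for key, value in settings.items():
    ratio = value / STATED_DEFAULTS[key]
    print(f"{key} = {value} ({ratio:g}x the stated default)")

# Convert the memory budget from bytes to GiB for readability.
mem_gib = settings["kylin.query.mem.budget"] / 1024**3
print(f"memory budget = {mem_gib:g} GiB")
```

Under those assumptions, the memory budget in this thread works out to 60 GiB, 20 times the stated default, so it is worth comparing against the memory actually available to the query server.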

Cheers
Yang


On Tue, Jun 7, 2016 at 10:10 AM, 吴钰彬 <wu...@baixing.com> wrote:

> Hi, Liyang
>
> Many thanks for replying to this mail. After looking into the source code,
> I have added the 3 parameters below to kylin.properties, which somewhat
> solves the problem.
>
> By the way, could you explain more about the purpose of the parameters below?
>
> kylin.query.scan.threshold=10000000
> #default is 3M
> kylin.query.mem.budget=64424509440
> #default is 3G
> kylin.query.cube.visit.timeout.times=3
> #default is 1
>
>
>
> B.R
> Austin.Woo

Reply: Reply: how to extend the threshold for kylin query?(from baixing.com)

Posted by 吴钰彬 <wu...@baixing.com>.
Hi, Liyang

Many thanks for replying to this mail. After looking into the source code, I have added the 3 parameters below to kylin.properties, which somewhat solves the problem.

By the way, could you explain more about the purpose of the parameters below?

kylin.query.scan.threshold=10000000
#default is 3M
kylin.query.mem.budget=64424509440
#default is 3G
kylin.query.cube.visit.timeout.times=3
#default is 1



B.R
Austin.Woo

-----Original Message-----
From: Li Yang [mailto:liyang@apache.org]
Sent: June 7, 2016 8:55
To: dev@kylin.apache.org
Cc: 李欣 <li...@baixing.com>; 亓庆国 <qi...@baixing.com>
Subject: Re: Reply: how to extend the threshold for kylin query?(from baixing.com)

Hi Austin,

Note that the image didn't get through the mailing list, so it was not displayed.

So we don't quite understand your issue yet. Could you try describing it again? You can use a file hosting service to share attachments.

Also, it's always better to adopt the latest version. If you are early in the pilot stage, the shift should be easy.

Cheers
Yang


Re: Reply: how to extend the threshold for kylin query?(from baixing.com)

Posted by Li Yang <li...@apache.org>.
Hi Austin,

Note that the image didn't get through the mailing list, so it was not
displayed.

So we don't quite understand your issue yet. Could you try describing it
again? You can use a file hosting service to share attachments.

Also, it's always better to adopt the latest version. If you are early in
the pilot stage, the shift should be easy.

Cheers
Yang
