Posted to dev@kylin.apache.org by Vadim Semenov <_...@databuryat.com> on 2015/08/21 15:15:54 UTC

Queries with filters and coprocessors high cpu usage

Hi,

I've been experimenting with Kylin for some time, and I ran into a difficult problem:

I have a cube (total size ~150GB, ~1.1B source records) with the following dimensions and cardinalities (as they are defined in the aggregation group):
date 10
dim0 250 STRING
dim1 60 STRING
dim2 3000 INT
dim3 7000 INT
dim4 30 INT
dim5 20 INT
dim6 30 INT
dim7 10 INT

When I execute queries like the following (accept partial = false):

SELECT dim1, SUM(m0), SUM(m1), … FROM fact
WHERE date BETWEEN … AND
dim0 IN (10 values) AND
dim2 IN (10 values)
GROUP BY dim1 LIMIT 10;

SELECT dim7, SUM(m0), SUM(m1), … FROM fact
WHERE date BETWEEN … AND
dim0 IN (10 values) AND
dim2 IN (10 values) AND
dim3 IN (10 values)
GROUP BY dim7 LIMIT 10;

SELECT dim7, SUM(m0), SUM(m1), … FROM fact
WHERE date BETWEEN … AND
dim0 IN (10 values) AND
dim2 IN (10 values) AND
dim3 IN (10 values) AND
dim4 IN (10 values) AND
dim6 IN (10 values)
GROUP BY dim7 LIMIT 10;


Coprocessors consume 100% CPU on some of the region servers and never finish.
I tried to profile a region server and got the following:
http://i.imgur.com/yrKnDc1.png

I tried disabling the fuzzy key feature using backdoorToggles and got much better results: the coprocessors no longer get stuck and I always get a response. Response times suffered a bit, but overall responsiveness is much better.
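
For reference, here is roughly how that toggle can be passed per query through Kylin's REST API. This is a minimal sketch, not something from the thread: the host, project name, and the exact toggle key (DEBUG_TOGGLE_DISABLE_FUZZY_KEY) are assumptions and should be checked against the BackdoorToggles class of the Kylin version in use.

# Minimal sketch: submit a query to Kylin's REST endpoint with a backdoor
# toggle that disables the fuzzy-key scan path. Host, project, and the
# toggle key are assumptions -- verify them against your Kylin version
# (org.apache.kylin.common.debug.BackdoorToggles).
import requests

KYLIN_HOST = "http://kylin-host:7070"  # hypothetical host and port

payload = {
    "sql": "SELECT dim1, SUM(m0) FROM fact "
           "WHERE dim0 IN ('a', 'b') GROUP BY dim1 LIMIT 10",
    "project": "my_project",           # hypothetical project name
    "acceptPartial": False,            # same as "accept partial = false" above
    "backdoorToggles": {"DEBUG_TOGGLE_DISABLE_FUZZY_KEY": "true"},
}

resp = requests.post(
    KYLIN_HOST + "/kylin/api/query",
    json=payload,
    auth=("ADMIN", "KYLIN"),           # default credentials; replace in practice
)
resp.raise_for_status()
print(resp.json())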

Query times I get for the queries (accept partial = false):
1. 5-10 seconds
2. 30-100 seconds
3. 180-300 seconds

So my questions are:
1. Are there ways to improve query times for these kinds of queries?
2. Why do the coprocessors consume 100% CPU and never finish when the fuzzy key feature is enabled?

Thanks.

Re: Re: Re: Queries with filters and coprocessors high cpu usage

Posted by hongbin ma <ma...@apache.org>.
If you agree that this problem is covered by the issue
https://issues.apache.org/jira/browse/KYLIN-740,
which I'm working on, please add any additional requirements to that ticket.




-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Re: Re: Queries with filters and coprocessors high cpu usage

Posted by vipul jhawar <vi...@gmail.com>.
Yup, we thought about that option, but it limits us in the case where we
have to use all the dimensions, which is the worst case. We had cubes with
4-5 dimension groups before; we moved to Kylin and built a multi-dimensional
dashboard so that we could leverage 9-12 dimensions in a single cube.

Thanks


Re: Re: Queries with filters and coprocessors high cpu usage

Posted by Huang Hua <hu...@mininglamp.com>.
Adding more machines should help to some extent.

Before considering adding more machines, is it possible to divide the query dimensions into different small groups according to business requirements?
If so, you can build multiple cubes, each corresponding to a group of certain dimensions, and Kylin itself does a good job of auto-routing queries to one of those cubes with a best-matching algorithm. More importantly, with a reduced number of dimensions, you can probably still maintain a very responsive dashboard.

But if the dashboard application is meant to allow queries with arbitrary combinations of dimensions, then the above approach won't work.
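
To make the "best-matching" idea above concrete, here is a toy sketch of cube routing. It is not Kylin's actual routing logic, and the cube names and dimension sets are hypothetical: among the cubes whose dimensions cover the query, pick the one with the fewest dimensions.

# Toy sketch (not Kylin's routing code): pick the smallest cube whose
# dimension set covers all dimensions referenced by the query.
def route_query(query_dims, cubes):
    """cubes: mapping of cube name -> set of dimensions it contains."""
    candidates = {name: dims for name, dims in cubes.items()
                  if query_dims <= dims}
    if not candidates:
        return None                 # no cube can answer this query
    return min(candidates, key=lambda name: len(candidates[name]))

cubes = {
    "cube_small": {"date", "dim0", "dim1", "dim2"},
    "cube_geo":   {"date", "dim3", "dim4", "dim5"},
    "cube_full":  {"date", "dim0", "dim1", "dim2", "dim3",
                   "dim4", "dim5", "dim6", "dim7"},
}

print(route_query({"date", "dim0", "dim2"}, cubes))  # -> cube_small
print(route_query({"date", "dim2", "dim4"}, cubes))  # -> cube_full (only cover)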




Re: Re: Queries with filters and coprocessors high cpu usage

Posted by vipul jhawar <vi...@gmail.com>.
Just to add to Vadim's query: we want to leverage a Kylin cube with many
dimensions for a very responsive dashboard that allows selecting different
values across the dimensions as filters, which get set in the IN clause, so
we would not want to compromise on this feature. What would be the possible
strategies to overcome some of these issues? Could this be solved by scaling
horizontally, i.e. throwing more hardware at the cluster?
Any tips on sizing would be appreciated.


Re: Queries with filters and coprocessors high cpu usage

Posted by Huang Hua <hu...@mininglamp.com>.
I suspect that the reason is most likely related to the "IN" clauses.

As far as I know, the current scan algorithm for "IN" clauses uses the minimum and maximum values from the "IN" value list to derive the HBase scan range. In the worst case, that range can be very large. For example, if the "IN" clause looks like "IN (1, 2, 3, 1000000)", Kylin will scan the range [1, 1000000] to get back the results, which is sometimes equivalent to a full table scan.

And I am guessing that you were generating the queries randomly, which would tend to produce "IN" clauses with large ranges and therefore poor performance.
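
A toy illustration of this point (not Kylin's actual code): deriving a single scan range from the minimum and maximum of the "IN" list forces the region server to walk far more keys than the filter actually selects. The key encoding and numbers below are made up.

# Toy illustration (not Kylin code): a min/max-based planner turns a sparse
# "IN" list into one huge scan range.
def scan_range_from_in(values):
    """Return the (start, stop) range a min/max-based planner would scan."""
    return min(values), max(values)

in_values = [1, 2, 3, 1000000]      # e.g. dictionary-encoded dimension IDs
start, stop = scan_range_from_in(in_values)

keys_scanned = stop - start + 1     # keys the range scan must visit
keys_wanted = len(in_values)        # keys the filter actually keeps

print("scan range: [%d, %d] -> %d keys scanned for %d wanted values"
      % (start, stop, keys_scanned, keys_wanted))
# scan range: [1, 1000000] -> 1000000 keys scanned for 4 wanted values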
 