
Posted to user@kylin.apache.org by Phong Pham <ph...@gmail.com> on 2017/01/17 11:37:37 UTC

Problem with limit and joint aggregation

Hi all,
    I defined some dimensions, for example A, B and C, as joint dimensions
in an aggregation group. When I executed this query:

SELECT A,B,C, SUM(metrics) as metrics
FROM table1
WHERE DateStats <= x and DateStats >= x
GROUP BY A,B,C
LIMIT 250

The query is very fast, but the metrics value (from SUM(metrics)) only sums
the data within the limit (250 rows). If I use ORDER BY A, the results are
correct, but performance is very poor (when Total Scan Count is over 2-3
million). Please explain this problem to me.

Thanks.
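One plausible reading of the symptom, assumed here and not confirmed anywhere in the thread, is that the LIMIT is applied to raw rows before they are fully aggregated, so each group's SUM only covers the rows that made it under the limit. The sketch below contrasts the two orders of operations in plain Python; the rows and the function names (group_sum, aggregate_then_limit, limit_then_aggregate) are hypothetical and only illustrate the semantics, not Kylin's query engine.

```python
from collections import defaultdict
from itertools import islice

# Hypothetical raw rows: (A, B, C, metrics). Not data from the thread.
rows = [
    ("a1", "b1", "c1", 10),
    ("a2", "b1", "c1", 5),
    ("a1", "b1", "c1", 7),
    ("a2", "b1", "c1", 3),
    ("a1", "b1", "c1", 1),
]

def group_sum(rows):
    """GROUP BY A, B, C with SUM(metrics); groups keep first-seen order."""
    sums = defaultdict(int)
    for a, b, c, m in rows:
        sums[(a, b, c)] += m
    return dict(sums)

def aggregate_then_limit(rows, limit):
    # Expected SQL semantics: aggregate every row, then keep `limit` groups.
    return dict(islice(group_sum(rows).items(), limit))

def limit_then_aggregate(rows, limit):
    # Assumed faulty behaviour: only the first `limit` raw rows are
    # aggregated, so every SUM is partial.
    return group_sum(islice(rows, limit))

print(aggregate_then_limit(rows, 2))  # {('a1', 'b1', 'c1'): 18, ('a2', 'b1', 'c1'): 8}
print(limit_then_aggregate(rows, 2))  # {('a1', 'b1', 'c1'): 10, ('a2', 'b1', 'c1'): 5}
```

Both calls return the same groups, but only the first returns complete sums, which matches the "fast but wrong" versus "correct but slow" trade-off described above.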

Re: Problem with limit and joint aggregation

Posted by Alberto Ramón <a....@gmail.com>.

Joint dimensions should be used for:
- Grouping dimensions with *very* low cardinality. Example: IdCurrency (most
of a bank's transactions use fewer than 10 currencies)
- Columns that have the same cardinality, such as Country_ID and Country_txt

Check Kylin's TopN feature to precalculate SUM with ORDER BY.
You can also allocate more memory to the Kylin instance (for the ORDER BY
processing).
Please read the links I shared with you in the other question; they contain
some useful tips and examples.
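For reference, the query pattern that TopN targets is "aggregate SUM over all rows, order by that SUM, keep the first N". Kylin's TopN measure precalculates this at cube build time so the query does not have to scan and sort millions of rows. The sketch below only shows the query semantics computed on the fly; top_n_by_sum and the sample rows are hypothetical and say nothing about how Kylin actually stores TopN data.

```python
import heapq
from collections import defaultdict

def top_n_by_sum(rows, n):
    """Equivalent of: GROUP BY A, B, C ORDER BY SUM(metrics) DESC LIMIT n.

    Every row is aggregated first; only then are the n largest sums kept.
    """
    sums = defaultdict(int)
    for a, b, c, m in rows:
        sums[(a, b, c)] += m
    # nlargest avoids a full sort of all groups when n is small.
    return heapq.nlargest(n, sums.items(), key=lambda kv: kv[1])

# Hypothetical rows: (A, B, C, metrics).
rows = [
    ("a1", "b1", "c1", 10),
    ("a2", "b1", "c1", 5),
    ("a1", "b1", "c1", 7),
    ("a3", "b2", "c1", 20),
    ("a2", "b1", "c1", 3),
]

print(top_n_by_sum(rows, 2))  # [(('a3', 'b2', 'c1'), 20), (('a1', 'b1', 'c1'), 17)]
```

Computing this at query time still touches every row, which is why precalculating it in the cube is attractive when the scan count reaches millions.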
