Posted to user@kylin.apache.org by Phong Pham <ph...@gmail.com> on 2017/01/17 11:37:37 UTC
Problem with limit and joint aggregation
Hi all,
I defined some dimensions, for example A, B, C, as a joint aggregation.
When I executed this query:
SELECT A,B,C, SUM(metrics) as metrics
FROM table1
WHERE DateStats <= x and DateStats >= x
GROUP BY A,B,C
LIMIT 250
The query is very fast, but the metrics value (from SUM(metrics)) only sums
the data within the limit (250 rows). If I add ORDER BY <A>, the results are
correct, but performance is very bad (when Total Scan Count is over 2-3 million).
Please explain this problem to me.
Thanks.
Re: Problem with limit and joint aggregation
Posted by Alberto Ramón <a....@gmail.com>.
Joint should be used for:
- Grouping dims with *very* low cardinality. Example: IdCurrency (most of
a bank's transactions use < 10 currencies)
- Columns with the same cardinality: Country_ID and Country_txt
Check Kylin's TopN feature to precalculate the sum with ORDER BY.
You can also allocate more memory to the Kylin instance (for the ORDER BY processing).
Please read the links I shared with you in the other question; there are some
useful tips and examples.
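
As a rough illustration of the TopN suggestion above, a query shaped like the following is the kind that a TopN measure is intended to serve. This is only a sketch under assumptions: it assumes the cube defines a TopN measure over SUM(metrics) with A as its group-by column, and it reuses the table, column, and placeholder names from the original question. Whether a TopN measure can cover all three columns A, B, C together depends on the Kylin version and the cube design, so that is left out here.

```sql
-- Sketch only: assumes a TopN measure on SUM(metrics) grouped by A
-- exists in the cube; x is the same date placeholder as in the question.
SELECT A, SUM(metrics) AS metrics
FROM table1
WHERE DateStats <= x AND DateStats >= x
GROUP BY A
ORDER BY SUM(metrics) DESC
LIMIT 250
```

The point of the shape is that ORDER BY uses the aggregated measure itself, so the ranking can be answered from precalculated data instead of sorting millions of scanned rows at query time.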