Posted to issues@kylin.apache.org by GitBox <gi...@apache.org> on 2019/04/23 14:05:08 UTC
[GitHub] [kylin] zhaojintaozhao edited a comment on issue #612: KYLIN-3961
Optimize TopNCounter's merge function to reduce TopNCounter's error size.
URL: https://github.com/apache/kylin/pull/612#issuecomment-485816100
> Hi jintao, I understand your change.
>
> The current algorithm is based on the paper "A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution", which is linked from the reference list of:
>
> https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
>
> It makes the aggregated value larger than the actual value, but it does so to keep each item's position relatively close to its actual position.
Hi shaofeng:
I built 3 cubes and tested the query performance and accuracy of TopN.
The first is a normal cube without a TopN measure. The second cube has a TopN measure built with the current code. The third cube has a TopN measure built with my modified code.
The TopN measure in both the second and third cubes is top 100. The amount of source data is 22 million. I built each cube over 30 days of data and ran the same SQL query against all of them.
I found that both TopN cubes (the second and third) show a few errors compared with the actual result, but both are much faster than the first, normal cube.
In terms of position relative to the actual ranking, the third cube has fewer errors than the second. The aggregated values in the second cube are larger than the actual values, while the third cube shows no error in the aggregated values.
I think the third cube, built with the optimized code, is better than the second cube, built with the current code.
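
To make the overestimation mentioned in the quoted reply concrete, here is a minimal sketch of the merge step as described in the parallel space saving paper. It is not Kylin's actual TopNCounter code and not the modified code from this PR; the class name TopNMergeSketch, the methods minIfFull and merge, and the use of plain maps as summaries are all assumptions made for illustration. The idea is that an item missing from a full summary may still have occurred there up to that summary's smallest count, so the merge adds that smallest count as an upper bound.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: summaries are plain item -> count maps.
public class TopNMergeSketch {

    // Smallest count in a full summary; 0 if the summary still has free slots,
    // because then every item seen so far is tracked exactly.
    static double minIfFull(Map<String, Double> summary, int capacity) {
        if (summary.isEmpty() || summary.size() < capacity) {
            return 0.0;
        }
        double min = Double.MAX_VALUE;
        for (double count : summary.values()) {
            min = Math.min(min, count);
        }
        return min;
    }

    // Merge two summaries and keep the 'capacity' largest counters.
    static Map<String, Double> merge(Map<String, Double> a, Map<String, Double> b, int capacity) {
        double m1 = minIfFull(a, capacity);
        double m2 = minIfFull(b, capacity);

        Set<String> items = new HashSet<>(a.keySet());
        items.addAll(b.keySet());

        List<Map.Entry<String, Double>> merged = new ArrayList<>();
        for (String item : items) {
            // An item absent from one side is charged that side's minimum:
            // this is the upper bound that inflates the aggregated value.
            double fromA = a.containsKey(item) ? a.get(item) : m1;
            double fromB = b.containsKey(item) ? b.get(item) : m2;
            merged.add(Map.entry(item, fromA + fromB));
        }

        // Sort by count, largest first, then truncate to the capacity.
        merged.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < Math.min(capacity, merged.size()); i++) {
            result.put(merged.get(i).getKey(), merged.get(i).getValue());
        }
        return result;
    }
}
```

For example, with a capacity of 3, merging {x: 10, y: 7, w: 2} with {x: 8, z: 6, v: 3} gives x = 18, y = 10, z = 8. Item y was only observed 7 times, but it is reported as 10 because the second summary's minimum (3) is added to it.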