Posted to issues@kylin.apache.org by GitBox <gi...@apache.org> on 2019/04/23 14:05:08 UTC

[GitHub] [kylin] zhaojintaozhao edited a comment on issue #612: KYLIN-3961 Optimize TopNCounter's merge function to reduce TopNCounter's error size.

zhaojintaozhao edited a comment on issue #612: KYLIN-3961 Optimize TopNCounter's merge function to reduce TopNCounter's error size.
URL: https://github.com/apache/kylin/pull/612#issuecomment-485816100
 
 
   > Hi jintao, I understand your change.
   > 
   > The current algorithm is based on the paper "A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution", which is linked from the reference list of:
   > 
   > https://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
   > 
   > It makes the aggregated value larger than the actual value, but this ensures that the ranking position stays relatively close to the actual one.
   
   Hi shaofeng:
   
   I built 3 cubes and tested the query performance and accuracy of TopN.  
   The first is a normal cube without a TopN measure. The second cube has a TopN measure built with the current code. The third cube has a TopN measure built with my modified code.
   The TopN measure in both the second and third cubes is top 100. The amount of source data is 22 million. I built the cubes for 30 days of data and ran the same SQL query against each.
   
   I find that both the second and third (TopN) cubes have a few errors compared to the actual results, but both are much faster than the first (normal) cube.
   The third cube has a smaller error than the second cube in ranking position relative to the actual results. The aggregated value of the second cube is larger than the actual value, but the third cube shows no error in the aggregated value.
   I think the third cube (optimized code) is better than the second cube (current code).
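
   To make the overestimation discussed above concrete, here is a minimal Java sketch of merging two space-saving summaries. It is not Kylin's TopNCounter code and not the code in this pull request; the class name, method names, and the `padMissing` flag are all hypothetical. With `padMissing = true`, an item missing from one summary is credited with that summary's minimum count (the upper bound on what it could have had there), which is how the parallel space-saving merge inflates aggregated values. With `padMissing = false`, the missing side contributes 0. Whether that variant matches the modified merge is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not Kylin's TopNCounter. All names here are hypothetical.
final class TopNMergeSketch {

    // Smallest count held by a full summary; 0 if the summary still has
    // spare capacity (then every item it saw is tracked exactly).
    static long minCount(Map<String, Long> summary, int capacity) {
        if (summary.isEmpty() || summary.size() < capacity) {
            return 0L;
        }
        return Collections.min(summary.values());
    }

    // Merge two summaries and keep the `capacity` largest counters.
    static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b,
                                   int capacity, boolean padMissing) {
        long padA = padMissing ? minCount(a, capacity) : 0L;
        long padB = padMissing ? minCount(b, capacity) : 0L;

        Map<String, Long> combined = new HashMap<>();
        for (Map.Entry<String, Long> e : a.entrySet()) {
            Long other = b.get(e.getKey());
            // Item absent from b: add b's pad instead of a real count.
            combined.put(e.getKey(), e.getValue() + (other != null ? other : padB));
        }
        for (Map.Entry<String, Long> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                // Item absent from a: add a's pad instead of a real count.
                combined.put(e.getKey(), e.getValue() + padA);
            }
        }

        // Sort by count descending, then by key so ties are deterministic.
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(combined.entrySet());
        sorted.sort((x, y) -> {
            int byCount = Long.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });

        Map<String, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < Math.min(capacity, sorted.size()); i++) {
            result.put(sorted.get(i).getKey(), sorted.get(i).getValue());
        }
        return result;
    }
}
```

   For example, merging {x=10, y=4} with {x=7, z=5} at capacity 2 gives {x=17, y=9} with padding and {x=17, z=5} without it, so the choice affects both the reported values and which items survive the cut.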
