Posted to dev@kylin.apache.org by "zhao jintao (JIRA)" <ji...@apache.org> on 2019/04/17 10:28:00 UTC

[jira] [Created] (KYLIN-3961) Optimize TopN measure merge function to reduce mistakes

zhao jintao created KYLIN-3961:
----------------------------------

             Summary: Optimize TopN measure merge function to reduce mistakes
                 Key: KYLIN-3961
                 URL: https://issues.apache.org/jira/browse/KYLIN-3961
             Project: Kylin
          Issue Type: Improvement
          Components: Measure - TopN
    Affects Versions: v2.5.2
         Environment: Huawei FusionInsight
            Reporter: zhao jintao
            Assignee: zhao jintao


Hi Team:

I use the "Top-N" measure for queries such as "select sum(AAA) from BBB group by CCC,DDD". It performs much better than a cube without "Top-N".

In my system, Kylin takes only 0.2s to answer such a query on a cube with the "Top-N" measure; without the "Top-N" measure it can take 10s.

However, I find that the Top-N measure could be optimized to reduce mistakes in its results.

I used the Kylin demo to test "TopN".

I built two cubes on "KYLIN_SALES". The first cube has three dimensions, "SELLER_ID", "BUYER_ID" and "PART_DT", and one measure, "SUM(PRICE)". The second cube has one dimension, "PART_DT", and two measures, "SUM(PRICE)" and "TOPN(10)". For "TOPN(10)", the "ORDER|SUM by Column" is "PRICE", the "Group by Column" is "SELLER_ID" and "BUYER_ID", and the "Return Type" is "Top 10". Then I built the cubes from "2012-01-01" to "2014-01-01".

I ran the same SQL against both cubes and found a large discrepancy between their results.

The top 5 "SUM PRICE" values from the first cube (without "TopN") are "167.7269", "99.9908", "99.9888", "99.9865", "99.978".

The top 5 "SUM PRICE" values from the second cube (with "TopN") are "179.27699...", "167.6320...", "167.3050...", "167.2069...", "166.7429...".
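
One possible way such a gap can arise is sketched below. This is only an illustration under stated assumptions, not Kylin's actual code: it assumes a Space-Saving-style merge, where each partial result keeps a bounded number of entries and a key missing from a full counter is credited with that counter's smallest retained value. The functions "truncate" and "merge", the keys "s1".."s5", and all numbers are hypothetical.

```python
from collections import Counter

def truncate(counts, capacity):
    """Keep only the `capacity` largest entries, as a bounded top-N counter would."""
    return dict(Counter(counts).most_common(capacity))

def merge(a, b, capacity):
    """Space-Saving-style merge of two bounded counters (an assumed scheme).

    A key missing from a full counter may have been evicted from it, so it is
    credited with that counter's minimum retained value, which is an upper bound.
    """
    min_a = min(a.values()) if len(a) >= capacity else 0
    min_b = min(b.values()) if len(b) >= capacity else 0
    merged = {key: a.get(key, min_a) + b.get(key, min_b)
              for key in a.keys() | b.keys()}
    return truncate(merged, capacity)

# Hypothetical per-segment sums (for example, one segment per build range).
segment_1 = {"s1": 100, "s2": 60, "s3": 50}
segment_2 = {"s4": 90, "s5": 70, "s1": 5}
CAPACITY = 2

# Exact answer: add up the full data, then take the top entries.
exact_top = truncate(Counter(segment_1) + Counter(segment_2), CAPACITY)

# Approximate answer: truncate each segment first, then merge.
approx_top = merge(truncate(segment_1, CAPACITY),
                   truncate(segment_2, CAPACITY),
                   CAPACITY)

print("exact: ", exact_top)
print("approx:", approx_top)
```

In this toy data the same two keys come out on top either way, but every merged value is inflated by the other segment's minimum. That is the same kind of upward shift as seen in the two result lists above.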

Has anyone met the same problem?

Best regards.