You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/12/25 11:16:00 UTC

[jira] [Commented] (KYLIN-4260) When using server side PreparedStatement cache, the query result are not match on TopN scenario

    [ https://issues.apache.org/jira/browse/KYLIN-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003225#comment-17003225 ] 

ASF subversion and git services commented on KYLIN-4260:
--------------------------------------------------------

Commit a7c78bc5172a51e3cfd1908781b5dbdc6ebfffeb in kylin's branch refs/heads/master from mawu2
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a7c78bc ]

KYLIN-4260 When using server side PreparedStatement cache, the query result are not match on TopN scenario


> When using server side PreparedStatement cache, the query result are not match on TopN scenario
> -----------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4260
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4260
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v3.0.0-alpha2, v2.6.4
>            Reporter: Marc Wu
>            Assignee: Marc Wu
>            Priority: Major
>             Fix For: Future, v2.6.5
>
>         Attachments: image-2019-11-18-15-55-00-312.png, image-2019-11-18-15-55-11-906.png, image-2019-11-18-19-29-34-489.png, image-2019-11-18-19-29-42-721.png
>
>
> Hi Kylin team,
> I found an issue while server side PreparedStatement enabled. The second time query and after's result will be different from the first when query TopN, and the result is not right.
> Part of Cube info:
>  
>  Dimensions
>  TRANS_ID
>  PART_DT
>  SELLER_ID
>  BUYER_ID
> Measures:
>  SUM(PRICE)
>  MAX(PRICE)
>  TOPN(PRICE) Group By:KYLIN_SALES.SELLER_ID,KYLIN_SALES.BUYER_ID
>  
> SQL:
> {code:java}
> {"sql":"select seller_id, buyer_id, sum(PRICE) from glaucus.kylin_sales where PART_DT >= ? and PART_DT <= ? group by seller_id, buyer_id order by sum(PRICE) desc limit 20","project":"DDTFORTEST_Analytics", "params":[{"className": "java.lang.String","value": "2012-01-01"},{"className": "java.lang.String","value": "2012-01-10"}]}
> {code}
> The First query result:
> !image-2019-11-18-15-55-00-312.png!
> The Second and after:
> !image-2019-11-18-15-55-11-906.png|width=2046,height=1096!
>  -----------------------------------------------
> h2. -Root Cause-
> Cached preparedContext is changed when doing preparedStatement.executeQuery, and losing groupByColumns. So the first execution result is correct, the second and the after will be incorrect.
> !image-2019-11-18-19-29-34-489.png!
> !image-2019-11-18-19-29-42-721.png!
> h2. Real Root Cause
> The first time we entered PreparedStatement logic, we'll try to borrow preparedContext from cache pool, of course there isn't any, but the cache pool will execute create method to create a new preparedContext, and then loaned it to us.
> I didn't figure out how adjustSqlDigest works before, and try to remove code
> {code:java}
> sqlDigest.groupbyColumns.removeAll(topnLiteralCol){code}
> but it's not right. Top-N isn't like some other measures, the dimensions aren't as part of row key, they stored in measures in design, so it's why the adjustSqlDigest matters, especially those codes.
> {code:java}
> sqlDigest.groupbyColumns.removeAll(topnLiteralCol);
> sqlDigest.metricColumns.addAll(topnLiteralCol);
> {code}
> The root cause for this issue is because the create sql digest is execute again after we store it in cache, so the digest changed.
> {code:java}
> # This is from the first time
> fact table GLAUCUS.KYLIN_SALES,group by [],filter on [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=TOP_N, parameter=GLAUCUS.KYLIN_SALES.PRICE,GLAUCUS.KYLIN_SALES.SELLER_ID,GLAUCUS.KYLIN_SALES.BUYER_ID, returnType=topn(5000,8)]].
> # This is the second one
> fact table GLAUCUS.KYLIN_SALES,group by [],filter on [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=SUM, parameter=GLAUCUS.KYLIN_SALES.PRICE, returnType=decimal(19,4)]].
> {code}
> So the second time and after we execute the same query or same pattern, the expression will be changed to SUM instead of TOPN, that's why the strange result show up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)