You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "Xiaoxiang Yu (Jira)" <ji...@apache.org> on 2020/07/31 12:10:00 UTC
[jira] [Closed] (KYLIN-4260) When using server side
PreparedStatement cache, the query result are not match on TopN scenario
[ https://issues.apache.org/jira/browse/KYLIN-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoxiang Yu closed KYLIN-4260.
-------------------------------
Resolved in release 3.1.0 (2020-07-03)
> When using server side PreparedStatement cache, the query result are not match on TopN scenario
> -----------------------------------------------------------------------------------------------
>
> Key: KYLIN-4260
> URL: https://issues.apache.org/jira/browse/KYLIN-4260
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v3.0.0-alpha2, v2.6.4
> Reporter: Marc Wu
> Assignee: Marc Wu
> Priority: Major
> Fix For: v3.1.0, v3.0.1
>
> Attachments: image-2019-11-18-15-55-00-312.png, image-2019-11-18-15-55-11-906.png, image-2019-11-18-19-29-34-489.png, image-2019-11-18-19-29-42-721.png
>
>
> Hi Kylin team,
> I found an issue while server side PreparedStatement enabled. The second time query and after's result will be different from the first when query TopN, and the result is not right.
> Part of Cube info:
>
> Dimensions
> TRANS_ID
> PART_DT
> SELLER_ID
> BUYER_ID
> Measures:
> SUM(PRICE)
> MAX(PRICE)
> TOPN(PRICE) Group By:KYLIN_SALES.SELLER_ID,KYLIN_SALES.BUYER_ID
>
> SQL:
> {code:java}
> {"sql":"select seller_id, buyer_id, sum(PRICE) from glaucus.kylin_sales where PART_DT >= ? and PART_DT <= ? group by seller_id, buyer_id order by sum(PRICE) desc limit 20","project":"DDTFORTEST_Analytics", "params":[{"className": "java.lang.String","value": "2012-01-01"},{"className": "java.lang.String","value": "2012-01-10"}]}
> {code}
> The First query result:
> !image-2019-11-18-15-55-00-312.png!
> The Second and after:
> !image-2019-11-18-15-55-11-906.png|width=2046,height=1096!
> -----------------------------------------------
> h2. -Root Cause-
> Cached preparedContext is changed when doing preparedStatement.executeQuery, and losing groupByColumns. So the first execution result is correct, the second and the after will be incorrect.
> !image-2019-11-18-19-29-34-489.png!
> !image-2019-11-18-19-29-42-721.png!
> h2. Real Root Cause
> The first time we entered PreparedStatement logic, we'll try to borrow preparedContext from cache pool, of course there isn't any, but the cache pool will execute create method to create a new preparedContext, and then loaned it to us.
> I didn't figure out how adjustSqlDigest works before, and try to remove code
> {code:java}
> sqlDigest.groupbyColumns.removeAll(topnLiteralCol){code}
> but it's not right. Top-N isn't like some other measures, the dimensions aren't as part of row key, they stored in measures in design, so it's why the adjustSqlDigest matters, especially those codes.
> {code:java}
> sqlDigest.groupbyColumns.removeAll(topnLiteralCol);
> sqlDigest.metricColumns.addAll(topnLiteralCol);
> {code}
> The root cause for this issue is because the create sql digest is execute again after we store it in cache, so the digest changed.
> {code:java}
> # This is from the first time
> fact table GLAUCUS.KYLIN_SALES,group by [],filter on [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=TOP_N, parameter=GLAUCUS.KYLIN_SALES.PRICE,GLAUCUS.KYLIN_SALES.SELLER_ID,GLAUCUS.KYLIN_SALES.BUYER_ID, returnType=topn(5000,8)]].
> # This is the second one
> fact table GLAUCUS.KYLIN_SALES,group by [],filter on [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=SUM, parameter=GLAUCUS.KYLIN_SALES.PRICE, returnType=decimal(19,4)]].
> {code}
> So the second time and after we execute the same query or same pattern, the expression will be changed to SUM instead of TOPN, that's why the strange result show up.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)