Posted to issues@kylin.apache.org by "zhimin wu (Jira)" <ji...@apache.org> on 2021/04/11 09:31:00 UTC

[jira] [Commented] (KYLIN-4969) Query results may be incorrect if the query filter condition contains derived dimensions

    [ https://issues.apache.org/jira/browse/KYLIN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318709#comment-17318709 ] 

zhimin wu commented on KYLIN-4969:
----------------------------------

*Root Cause:*

A filter condition in a query can take effect at three points during query execution:
Step 1. Coarse-grained pruning by the Segment Pruner and Shard Pruner, which filters data before it is read.
Step 2. Filter pushdown while the data is being read. The stored data contains only the normal dimensions, not the derived dimensions, so a filter on a derived dimension must first be converted into a filter on the corresponding normal dimension using the snapshot of the lookup table. When an exact conversion is not possible, a coarse-grained filter is pushed down instead.
Step 3. After the data is read, it is processed by Calcite's query plan, which applies the precise filter conditions.
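To make Step 2 concrete, here is a minimal Python sketch (not Kylin's implementation) of translating a filter on a derived dimension into an IN-list on the stored foreign key by scanning a lookup-table snapshot. The REGION snapshot contents, the `TranslatedFilter` type and its `exact` flag, and both helper functions are hypothetical; the `exact` flag stands for the information later stages need in order to decide whether they may rely on the pushed-down filter alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Hypothetical snapshot of the REGION lookup table: R_REGIONKEY -> R_NAME.
REGION_SNAPSHOT: Dict[int, str] = {0: "AFRICA", 1: "AMERICA", 2: "ASIA"}


@dataclass(frozen=True)
class TranslatedFilter:
    """An IN-list on the stored foreign-key column, derived from a filter on R_NAME."""

    keys: FrozenSet[int]  # foreign-key values that may satisfy the original filter
    exact: bool  # True only if the IN-list admits exactly the matching rows


def translate_derived_filter(predicate: Callable[[str], bool]) -> TranslatedFilter:
    """Exact translation of a predicate that depends on R_NAME alone.

    Each snapshot row can be tested on its own, so the resulting key set is precise.
    """
    keys = frozenset(k for k, name in REGION_SNAPSHOT.items() if predicate(name))
    return TranslatedFilter(keys=keys, exact=True)


def coarse_fallback() -> TranslatedFilter:
    """Used when no exact translation exists: admit every key and flag it as inexact."""
    return TranslatedFilter(keys=frozenset(REGION_SNAPSHOT), exact=False)
```

For example, `translate_derived_filter(lambda n: n == "ASIA")` yields the key set `{2}` marked exact, whereas `coarse_fallback()` lets every key through and must be marked inexact so that the precise filter is still applied later.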

The `stream aggregate` runs between Step 2 and Step 3, and this is the root cause of the incorrect query results (refer to https://issues.apache.org/jira/browse/KYLIN-2501).
If Step 2 does not achieve exact filtering, the stream aggregate performed before Step 3 must add the columns referenced by the filter conditions that were not applied exactly to its aggregation group, so that the data is still accurate when Step 3 applies the precise filter.
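A simplified Python model of this failure follows. It assumes hypothetical fact rows that store only the foreign keys, a two-row REGION snapshot, and a stream aggregate that keeps only its group-by columns. As a modeling assumption (not a description of Kylin's code), a pre-aggregated row that has lost the filter columns cannot be checked by the Step 3 filter and passes through unfiltered.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

# Hypothetical snapshot of the REGION lookup table: R_REGIONKEY -> R_NAME.
REGION = {0: "AFRICA", 1: "ASIA"}

# Hypothetical fact rows: only the normal dimensions (foreign keys) are stored.
FACT: List[Row] = [
    {"year": 2021, "seller_key": 0, "buyer_key": 0, "price": 10},
    {"year": 2021, "seller_key": 0, "buyer_key": 1, "price": 20},
    {"year": 2021, "seller_key": 1, "buyer_key": 1, "price": 30},
]


def stream_aggregate(rows: Sequence[Row], group_cols: Sequence[str]) -> List[Row]:
    """Pre-aggregate SUM(price) by group_cols; every other column is dropped."""
    sums: Dict[Tuple, int] = defaultdict(int)
    for r in rows:
        sums[tuple(r[c] for c in group_cols)] += r["price"]
    return [dict(zip(group_cols, k), price=v) for k, v in sums.items()]


def same_region(r: Row) -> bool:
    """The precise Step 3 filter: SALES_REGION.R_NAME = BUY_REGION.R_NAME."""
    return REGION[r["seller_key"]] == REGION[r["buyer_key"]]


def query(filter_marked_exact: bool) -> int:
    """SUM(price) GROUP BY year, restricted to rows where both regions match."""
    group_cols = ["year"]
    if not filter_marked_exact:
        # Keep the columns that the precise filter still needs in Step 3.
        group_cols += ["seller_key", "buyer_key"]
    # Step 2 pushed down only a coarse filter here, so every fact row survives.
    pre = stream_aggregate(FACT, group_cols)
    # Step 3: the precise filter can only be evaluated on rows that kept its columns.
    kept = [r for r in pre if "seller_key" not in r or same_region(r)]
    return sum(r["price"] for r in kept)
```

In this model `query(filter_marked_exact=False)` returns 40 (the rows priced 10 and 30), while `query(filter_marked_exact=True)` returns 60, because the mixed-region row was merged into the aggregate before Step 3 could reject it.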

In the SQL from the issue description, the filter condition SALES_REGION.R_NAME = BUY_REGION.R_NAME compares two columns and cannot be converted without the actual column data, but here it was incorrectly converted into a coarse-grained filter. Moreover, the system mistakenly treated the converted filter as exact, so the corresponding columns were not added to the aggregation group of the subsequent stream aggregate. As a result, the data entering Step 3 was already wrong, and the query returned incorrect results.
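A short sketch of why this particular condition is only coarse after translation. It assumes that SALES_REGION and BUY_REGION are two joins of the same lookup table sharing one hypothetical snapshot, and that Step 2 can push down only an independent IN-list per foreign-key column; `translate_column_equality` is a hypothetical helper, not a Kylin API. Independent IN-lists can only describe the Cartesian product of their key sets, which is generally larger than the set of key pairs that actually satisfy the column-to-column equality.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple


def translate_column_equality(
    snapshot: Dict[int, str],
) -> Tuple[FrozenSet[int], FrozenSet[int], bool]:
    """Translate SALES_REGION.R_NAME = BUY_REGION.R_NAME into per-column IN-lists.

    Returns (seller_keys, buyer_keys, exact).
    """
    # Key pairs that really satisfy the condition, evaluated against the snapshot.
    satisfying = {
        (s, b) for s, b in product(snapshot, repeat=2) if snapshot[s] == snapshot[b]
    }
    seller_keys = frozenset(s for s, _ in satisfying)
    buyer_keys = frozenset(b for _, b in satisfying)
    # Two independent IN-lists admit every combination of their keys, so the
    # translation is exact only if that product equals the satisfying pairs.
    exact = set(product(seller_keys, buyer_keys)) == satisfying
    return seller_keys, buyer_keys, exact
```

With a snapshot `{0: "AFRICA", 1: "ASIA"}` both IN-lists become `{0, 1}`, which also admits the mismatched pairs (0, 1) and (1, 0), so the result is inexact and the filter columns must stay in the stream aggregate's group.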

> Query results may be incorrect if the query filter condition contains derived dimensions
> ----------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4969
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4969
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v3.1.1
>            Reporter: zhimin wu
>            Priority: Major
>         Attachments: image-2021-04-11-16-11-03-058.png, image-2021-04-11-16-14-10-698.png, image-2021-04-11-16-15-52-352.png, image-2021-04-11-16-17-45-890.png, image-2021-04-11-16-21-53-243.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> There are three tables:
> !image-2021-04-11-16-11-03-058.png|width=431,height=280!
> The model is as follows:
> !image-2021-04-11-16-14-10-698.png|width=824,height=328!
> The cube is as follows:
> !image-2021-04-11-16-15-52-352.png|width=765,height=425!
> The query result on Kylin:
> !image-2021-04-11-16-17-45-890.png|width=762,height=450!
> The result on Hive:
> !image-2021-04-11-16-21-53-243.png|width=1207,height=913!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)