Posted to dev@kylin.apache.org by "Yaguang Jia (Jira)" <ji...@apache.org> on 2023/04/26 07:13:00 UTC

[jira] [Created] (KYLIN-5536) Kylin query optimization, by limiting the data range of max query, improve query efficiency

Yaguang Jia created KYLIN-5536:
----------------------------------

             Summary: Kylin query optimization, by limiting the data range of max query, improve query efficiency
                 Key: KYLIN-5536
                 URL: https://issues.apache.org/jira/browse/KYLIN-5536
             Project: Kylin
          Issue Type: Improvement
          Components: Query Engine
    Affects Versions: 5.0-alpha
            Reporter: Yaguang Jia
            Assignee: Yaguang Jia
             Fix For: 5.0-beta


h2. Dev design

1. Add the configuration {{kylin.query.max-measure-segment-pruner-before-days}}
This setting limits the time range scanned by the query. The default value is -1, which turns the optimization off. When it is set to 0, no data is scanned. When the value is invalid (e.g. 0.1), the optimization stays off. The setting can be defined at three levels, in decreasing order of priority: model, project, and system.
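
The value rules above can be sketched as follows. This is only an illustration, not Kylin's implementation: the class {{PrunerDaysConfig}} and its methods are hypothetical, a null argument stands for "not set at that level", and it assumes that any negative value behaves like the default -1 and that an invalid value at the highest-priority level disables the optimization rather than falling through to the next level.

```java
// Hypothetical helper, not part of Kylin: resolves the effective value of
// kylin.query.max-measure-segment-pruner-before-days.
final class PrunerDaysConfig {
    static final int DISABLED = -1;

    // Parse one raw setting. Non-integer text such as "0.1" is treated as disabled.
    // Assumption: any negative value behaves like the default -1.
    static int parse(String raw) {
        if (raw == null) {
            return DISABLED;
        }
        try {
            int days = Integer.parseInt(raw.trim());
            return days < 0 ? DISABLED : days;
        } catch (NumberFormatException e) {
            return DISABLED;
        }
    }

    // Model overrides project, and project overrides system.
    // A null argument means the setting is absent at that level.
    static int resolve(String model, String project, String system) {
        String raw = model != null ? model : (project != null ? project : system);
        return parse(raw);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "7", "30"));   // prints 7 (project wins over system)
        System.out.println(resolve("0.1", "7", "30"));  // prints -1 (invalid value, optimization off)
        System.out.println(resolve(null, null, null));  // prints -1 (default)
    }
}
```
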
2. Where is the optimization applied?
In the segment pruner, at org.apache.kylin.query.routing.RealizationPruner#pruneSegments
3. Which queries are optimized?
select <max(partDT)> from T [where xxx]
The query must select max() of the time partition column. The where condition is optional, and there must be no group by column.
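
A minimal sketch of that eligibility check, under stated assumptions: {{MaxQueryShape}} is a hypothetical, simplified view of a parsed query (Kylin's real query context is richer), and column names are compared case-insensitively for convenience.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified description of a parsed query.
final class MaxQueryShape {
    final String aggFunction;    // e.g. "MAX"
    final String aggColumn;      // column inside the aggregate
    final List<String> groupBy;  // empty when the query has no GROUP BY
    final boolean hasWhere;      // WHERE is optional, so it does not affect eligibility

    MaxQueryShape(String aggFunction, String aggColumn, List<String> groupBy, boolean hasWhere) {
        this.aggFunction = aggFunction;
        this.aggColumn = aggColumn;
        this.groupBy = groupBy;
        this.hasWhere = hasWhere;
    }

    // Eligible only for MAX over the time partition column with no GROUP BY.
    static boolean isEligible(MaxQueryShape q, String timePartitionColumn) {
        return "MAX".equalsIgnoreCase(q.aggFunction)
                && timePartitionColumn.equalsIgnoreCase(q.aggColumn)
                && q.groupBy.isEmpty();
    }

    public static void main(String[] args) {
        MaxQueryShape plain = new MaxQueryShape("MAX", "PART_DT", Collections.<String>emptyList(), true);
        MaxQueryShape grouped = new MaxQueryShape("MAX", "PART_DT", Arrays.asList("REGION"), false);
        System.out.println(isEligible(plain, "PART_DT"));    // prints true
        System.out.println(isEligible(grouped, "PART_DT"));  // prints false
    }
}
```
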
4. When the configuration is set, which segments are selected to answer the query?
Starting from the last (newest) segment, segments are selected according to the configured time range.
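
One possible reading of this selection rule is sketched below. It is an assumption, not the logic in RealizationPruner#pruneSegments: the {{Seg}} type is hypothetical, each segment covers [start, end), and a segment is kept when its end falls after "latest segment end minus the configured days". Under that reading a value of 0 keeps nothing, which matches the "no data is scanned" behaviour described for the configuration, and a negative value keeps every segment.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SegmentWindowPruner {
    // Hypothetical segment covering the date range [start, end).
    static final class Seg {
        final String name;
        final LocalDate start;
        final LocalDate end;

        Seg(String name, LocalDate start, LocalDate end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    // Keep only segments that reach into the last `beforeDays` days of data.
    // A negative value means the optimization is off, so all segments are kept.
    static List<Seg> select(List<Seg> segments, int beforeDays) {
        if (beforeDays < 0 || segments.isEmpty()) {
            return new ArrayList<>(segments);
        }
        LocalDate latestEnd = segments.get(0).end;
        for (Seg s : segments) {
            if (s.end.isAfter(latestEnd)) {
                latestEnd = s.end;
            }
        }
        LocalDate cutoff = latestEnd.minusDays(beforeDays);
        List<Seg> kept = new ArrayList<>();
        for (Seg s : segments) {
            if (s.end.isAfter(cutoff)) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Convenience for printing and checking results.
    static List<String> names(List<Seg> segments) {
        List<String> out = new ArrayList<>();
        for (Seg s : segments) {
            out.add(s.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Seg> segs = Arrays.asList(
                new Seg("A", LocalDate.of(2023, 1, 1), LocalDate.of(2023, 2, 1)),
                new Seg("B", LocalDate.of(2023, 2, 1), LocalDate.of(2023, 3, 1)),
                new Seg("C", LocalDate.of(2023, 3, 1), LocalDate.of(2023, 4, 1)));
        System.out.println(names(select(segs, 7)));   // prints [C]
        System.out.println(names(select(segs, 40)));  // prints [B, C]
        System.out.println(names(select(segs, 0)));   // prints []
        System.out.println(names(select(segs, -1)));  // prints [A, B, C]
    }
}
```
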


--
This message was sent by Atlassian Jira
(v8.20.10#820010)