Posted to issues@kylin.apache.org by "Dayue Gao (JIRA)" <ji...@apache.org> on 2017/02/08 13:56:41 UTC

[jira] [Created] (KYLIN-2438) replace scan threshold with max scan bytes

Dayue Gao created KYLIN-2438:
--------------------------------

             Summary: replace scan threshold with max scan bytes
                 Key: KYLIN-2438
                 URL: https://issues.apache.org/jira/browse/KYLIN-2438
             Project: Kylin
          Issue Type: Improvement
          Components: Query Engine, Storage - HBase
    Affects Versions: v1.6.0
            Reporter: Dayue Gao
            Assignee: Dayue Gao


To guard against bad queries that consume too much memory and crash the Kylin / HBase server, Kylin limits the maximum number of rows a query can scan. The limit is determined by two configs:
# *kylin.query.scan.threshold* is used if the query doesn't contain memory-hungry metrics.
# Otherwise, *kylin.query.mem.budget* / estimated_row_size is used as the per-region maximum.
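The two rules above can be sketched in Java roughly as follows. This is a hypothetical illustration, not Kylin's actual code: the class and method names are invented, and how Kylin decides that a metric is "memory-hungry" is not modeled, so it is passed in as a boolean.

```java
/**
 * Hypothetical sketch of the row-count rule (not Kylin's actual code).
 * Class, method and parameter names are illustrative assumptions.
 */
final class RowThresholdSketch {

    private RowThresholdSketch() {}

    /**
     * Returns the maximum number of rows a scan may read.
     *
     * @param hasMemoryHungryMetrics whether the query contains memory-hungry metrics
     *                               (assumed to be decided elsewhere)
     * @param scanThreshold          value of kylin.query.scan.threshold
     * @param memBudgetBytes         value of kylin.query.mem.budget, in bytes
     * @param estimatedRowSizeBytes  estimated size of one row, in bytes (must be > 0)
     */
    static long maxRowsToScan(boolean hasMemoryHungryMetrics,
                              long scanThreshold,
                              long memBudgetBytes,
                              long estimatedRowSizeBytes) {
        if (!hasMemoryHungryMetrics) {
            // Plain metrics: the configured row threshold applies directly.
            return scanThreshold;
        }
        if (estimatedRowSizeBytes <= 0) {
            throw new IllegalArgumentException("estimatedRowSizeBytes must be positive");
        }
        // Memory-hungry metrics: derive a per-region row limit from the memory budget.
        return memBudgetBytes / estimatedRowSizeBytes;
    }

    public static void main(String[] args) {
        long budget = 3L * 1024 * 1024 * 1024; // hypothetical 3 GB budget
        System.out.println(maxRowsToScan(false, 10_000_000L, budget, 200)); // 10000000
        System.out.println(maxRowsToScan(true, 10_000_000L, budget, 200));  // 16106127
    }
}
```

The sketch makes the weakness concrete: the derived row limit is only as good as estimated_row_size. If real rows holding variable-length metrics are ten times larger than the estimate, the limit admits roughly ten times the intended memory.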

However, this approach has several deficiencies:
* It doesn't work well with complex, variable-length metrics. The estimated threshold can be either too small or too large: if it's too small, good queries are killed; if it's too large, bad queries are not blocked.
* Row count doesn't correspond to memory consumption, so it's difficult to decide what value the scan threshold should be set to.
* *kylin.query.scan.threshold* can't be overridden at the cube level.

In this JIRA, I propose to replace the current row-count-based threshold with a more intuitive size-based threshold:
* KYLIN-2437 will collect the number of bytes scanned at both the region and the query level.
* A new configuration, *kylin.query.max-scan-bytes*, will be added to limit the total number of bytes a query can scan.
* *kylin.query.mem.budget* will be renamed to *kylin.storage.hbase.coprocessor-max-scan-bytes*, which limits the number of bytes scanned at the region level.
* The old *kylin.query.scan.threshold* will be deprecated.
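A minimal sketch of how the two proposed byte limits could interact, assuming per-region byte counts are reported as scanning progresses (the exact mechanism KYLIN-2437 uses is not described here). The class, method and exception names are hypothetical. In a real deployment the region check would run inside each HBase coprocessor and the query check on the Kylin query server; this single-process sketch folds both into one object for clarity.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of size-based scan limits (not Kylin's implementation).
 * regionLimit plays the role of kylin.storage.hbase.coprocessor-max-scan-bytes;
 * queryLimit plays the role of kylin.query.max-scan-bytes.
 */
final class ScanBytesGuard {

    /** Thrown when a scan exceeds one of the byte limits. */
    static final class ScanLimitExceededException extends RuntimeException {
        ScanLimitExceededException(String message) {
            super(message);
        }
    }

    private final long regionLimit;
    private final long queryLimit;
    private final Map<String, Long> bytesPerRegion = new HashMap<>();
    private long totalBytes;

    ScanBytesGuard(long regionLimit, long queryLimit) {
        if (regionLimit <= 0 || queryLimit <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.regionLimit = regionLimit;
        this.queryLimit = queryLimit;
    }

    /** Records bytes scanned in a region and throws once either limit is exceeded. */
    void onBytesScanned(String region, long bytes) {
        long regionTotal = bytesPerRegion.merge(region, bytes, Long::sum);
        totalBytes += bytes;
        if (regionTotal > regionLimit) {
            throw new ScanLimitExceededException(
                "region " + region + " scanned " + regionTotal + " bytes, limit " + regionLimit);
        }
        if (totalBytes > queryLimit) {
            throw new ScanLimitExceededException(
                "query scanned " + totalBytes + " bytes, limit " + queryLimit);
        }
    }

    long totalBytes() {
        return totalBytes;
    }

    public static void main(String[] args) {
        ScanBytesGuard guard = new ScanBytesGuard(1_000, 2_500);
        guard.onBytesScanned("r1", 800);
        guard.onBytesScanned("r2", 900);
        System.out.println(guard.totalBytes()); // 1700
        try {
            guard.onBytesScanned("r3", 900); // each region is under 1000, but the total reaches 2600 > 2500
        } catch (ScanLimitExceededException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because both limits are expressed in bytes, they relate directly to memory use regardless of how wide or variable-length the rows are, which is the main argument of this proposal.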



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)