Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/07/01 07:59:47 UTC

[GitHub] [incubator-doris] marising opened a new pull request #3994: SQL result cache and partition cache

marising opened a new pull request #3994:
URL: https://github.com/apache/incubator-doris/pull/3994


   #2581
   
   ## Solutions
   
   This cache gives priority to ensuring data consistency. On this basis, it refines the cache granularity and improves the hit rate. It therefore has the following characteristics:
   
   * Users don't need to worry about data consistency: cache invalidation is controlled by version, so the cached data is consistent with the data queried from BE
   * No additional components or costs: the cached results are stored in BE memory, and users can adjust the cache memory size according to their needs
   * Two caching strategies are implemented, SQLCache and PartitionCache; the latter has finer granularity
   * The cache algorithm in BE is an improved LRU (a simplified sketch follows this list)
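   To make the eviction idea concrete, here is a minimal Java sketch of an LRU cache with an elasticity margin. It is only an illustration: the actual BE cache in this PR is C++ (be/src/runtime/cache/result_cache.h), and the batch-pruning policy shown here (evict in LRU order once usage exceeds max + elasticity, mirroring the cache_max_size and cache_elasticity_size configs) is an assumption about the intended behaviour, not the real implementation.
   ```
   import java.util.Iterator;
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Illustration only, not the BE implementation: an LRU cache that prunes in
   // batches once it grows past maxEntries + elasticity, then shrinks to maxEntries.
   public class LruResultCache<K, V> {
       private final int maxEntries;
       private final int elasticity;
       private final LinkedHashMap<K, V> map;

       public LruResultCache(int maxEntries, int elasticity) {
           this.maxEntries = maxEntries;
           this.elasticity = elasticity;
           // accessOrder = true keeps the least recently used entry at the head
           this.map = new LinkedHashMap<>(16, 0.75f, true);
       }

       public synchronized V get(K key) {
           return map.get(key);
       }

       public synchronized void put(K key, V value) {
           map.put(key, value);
           if (map.size() > maxEntries + elasticity) {
               // Evict from the least recently used end until back at maxEntries
               Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
               while (map.size() > maxEntries && it.hasNext()) {
                   it.next();
                   it.remove();
               }
           }
       }
   }
   ```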
   
   ## Two cache modes
   
   ### SQLCache
   SQLCache stores and fetches cached results according to the **SQL signature**, the **partition IDs** of the queried tables, and the **latest version** of the partitions.
   
   The combination of the three identifies one cached dataset. If any of them changes, for example the SQL text, the queried fields or conditions, or the partition version after a data update, the cache will not be hit.
   
   If multiple tables are joined, the partition ID and version number of the most recently updated table are used. If any of the tables is updated, the partition ID or version number changes and the cache will not be hit.
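   As a rough illustration of how such a key can be composed (the class and method names below are hypothetical, not the actual FE CacheAnalyzer API), this Java sketch folds the SQL text together with each table's latest partition ID and version into one MD5 digest, so any change to the SQL or any data update producing a new version yields a different key:
   ```
   import java.nio.charset.StandardCharsets;
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;
   import java.util.List;

   // Hypothetical sketch: digest of SQL text + latest partition id/version per table.
   public final class SqlCacheKey {
       public record TableVersion(long latestPartitionId, long latestVersion) {}

       public static byte[] compute(String sql, List<TableVersion> tables)
               throws NoSuchAlgorithmException {
           MessageDigest md5 = MessageDigest.getInstance("MD5");
           md5.update(sql.getBytes(StandardCharsets.UTF_8));
           for (TableVersion t : tables) {
               md5.update(Long.toString(t.latestPartitionId()).getBytes(StandardCharsets.UTF_8));
               md5.update(Long.toString(t.latestVersion()).getBytes(StandardCharsets.UTF_8));
           }
           return md5.digest(); // 128-bit signature identifying one cached dataset
       }
   }
   ```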
   
   SQLCache is better suited to T+1 update scenarios: data is updated in the early morning, the first query fetches its result from BE and puts it into the cache, and subsequent identical queries are served from the cache. It can also be used with real-time updates, but the hit rate may be low; see PartitionCache below.
   
   ### PartitionCache
   Consider querying the number of users per day over the last 7 days. If the table is partitioned by date, data is written only to the current day's partition, and the data in all other partitions is fixed. Under the same query SQL, the metrics of any partition that is not being updated are fixed. In the example below, the last 7 days of user counts are queried on 2020-03-09: the data from 2020-03-03 to 2020-03-07 comes from the cache, 2020-03-08 comes from the partition on the first query and from the cache afterwards, and 2020-03-09 comes from the partition because data is continuously written on that day.
   
   Therefore, when querying N days of data where only the most recent D days are still being updated, similar queries that differ only in the date range need to scan only D partitions; the rest comes from the cache, which effectively reduces cluster load and query time.
   
   ```
   MySQL [(none)]> SELECT eventdate,count(userid) FROM testdb.appevent WHERE eventdate>="2020-03-03" AND eventdate<="2020-03-09" GROUP BY eventdate ORDER BY eventdate;
   +------------+-----------------+
   | eventdate  | count(`userid`) |
   +------------+-----------------+
   | 2020-03-03 |              15 | //From cache
   | 2020-03-04 |              20 | ...
   | 2020-03-05 |              25 |
   | 2020-03-06 |              30 |
   | 2020-03-07 |              35 |
    | 2020-03-08 |              40 | //From the partition on the first query, then from cache
    | 2020-03-09 |              25 | //From the partition
   +------------+-----------------+
   7 rows in set (0.02 sec)
   ```
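   As shown above, only a contiguous tail of the date range has to be re-fetched from BE. Below is a hedged Java sketch of that split; the names are hypothetical and the real rewrite happens in the FE planner, but the idea is to cut the requested range at the first day that is not yet usable from the cache (still being written, or not cached yet):
   ```
   import java.time.LocalDate;

   // Hypothetical sketch of the PartitionCache range split, not the actual FE code.
   public final class PartitionSplit {
       public record Range(LocalDate from, LocalDate to) {}
       public record Split(Range fromCache, Range fromBackend) {}

       // firstUncachedDay: the first day whose partition is still being written or
       // has not been cached yet (2020-03-08 on the first query of 2020-03-09 above).
       public static Split split(Range requested, LocalDate firstUncachedDay) {
           if (!firstUncachedDay.isAfter(requested.from())) {
               return new Split(null, requested);      // nothing can come from the cache
           }
           if (firstUncachedDay.isAfter(requested.to())) {
               return new Split(requested, null);      // everything comes from the cache
           }
           return new Split(new Range(requested.from(), firstUncachedDay.minusDays(1)),
                            new Range(firstUncachedDay, requested.to()));
       }
   }
   ```
   With the example above, split([2020-03-03, 2020-03-09], 2020-03-08) returns [2020-03-03, 2020-03-07] to be served from the cache and [2020-03-08, 2020-03-09] to be queried from BE; because 2020-03-08 is no longer being written, its rows can then be stored back into the cache, so only 2020-03-09 has to be recomputed on later queries.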
   
   ## Reference
   For more information, please read [partition_cache.md](https://github.com/apache/incubator-doris/compare/master...marising:partition_cache_0.3?expand=1#diff-bfa91ffdf5960b0af00c5775fa9a2db2)
   
   



[GitHub] [incubator-doris] HangyuanLiu closed pull request #3994: SQL result cache and partition cache

Posted by GitBox <gi...@apache.org>.
HangyuanLiu closed pull request #3994:
URL: https://github.com/apache/incubator-doris/pull/3994


   



[GitHub] [incubator-doris] wutiangan commented on a change in pull request #3994: SQL result cache and partition cache

Posted by GitBox <gi...@apache.org>.
wutiangan commented on a change in pull request #3994:
URL: https://github.com/apache/incubator-doris/pull/3994#discussion_r448295650



##########
File path: docs/zh-CN/administrator-guide/partition_cache.md
##########
@@ -0,0 +1,205 @@
+# Partition Cache
+
+## Use Cases
+Most data analysis workloads are write-once, read-many: data is written once and then read frequently. For example, the dimensions and metrics of a report are computed once in the early morning, but the page is viewed hundreds or even thousands of times a day, so caching the result set is a very good fit. In data analysis and BI applications, the following business scenarios are common:
+* **High concurrency**: Doris handles high concurrency well, but a single server cannot sustain an overly high QPS
+* **Dashboards with complex charts**: complex dashboards or large-screen applications pull data from many tables, and each page issues dozens of queries; although each query takes only tens of milliseconds, the total query time adds up to several seconds
+* **Trend analysis**: queries over a given date range with metrics reported per day, such as the trend of daily user counts over the last 7 days; such queries scan a large amount of data over a wide range and often take tens of seconds
+* **Repeated user queries**: if the product has no anti-refresh mechanism, users who refresh the page by mistake or for other reasons submit a large number of duplicate SQL statements
+
+For the four scenarios above, the application-layer solution is to put the query results into Redis and refresh the cache periodically or manually, but this approach has the following problems:
+* **Data inconsistency**: data updates cannot be detected, so users often see stale data
+* **Low hit rate**: the whole query result is cached; if data is written in real time the cache is invalidated frequently, the hit rate is low, and the system load is heavy
+* **Extra cost**: introducing an external cache component adds system complexity and extra cost
+
+## Solution
+This partition cache strategy solves the problems above. It gives priority to data consistency and, on that basis, refines the cache granularity to improve the hit rate. It has the following characteristics:
+* Users do not need to worry about data consistency; cache invalidation is controlled by version, and the cached data is consistent with the data queried from BE
+* No extra components or cost; cached results are stored in BE memory, and users can adjust the cache memory size as needed
+* Two cache strategies are implemented, SQLCache and PartitionCache; the latter has finer granularity
+* Consistent hashing is used to handle BE nodes going online and offline; the cache algorithm in BE is an improved LRU
+
+## SQLCache
+SQLCache stores and fetches the cache keyed on the SQL signature, the partition IDs of the queried tables, and the latest version of the partitions. The combination of the three identifies one cached data set; if any of them changes, for example the SQL text, the queried fields or conditions, or the version after a data update, the cache will not be hit.
+
+If multiple tables are joined, the most recently updated partition ID and the latest version number are used; if any of the tables is updated, the partition ID or version number differs and the cache will not be hit.
+
+SQLCache is better suited to T+1 update scenarios: data is updated in the early morning, the first query fetches its result from BE and puts it into the cache, and subsequent identical queries are served from the cache. It can also be used with real-time updates, but the hit rate may be low; see PartitionCache below.
+
+## PartitionCache
+
+### Design Principle
+1. The SQL can be split and executed in parallel: Q = Q1 ∪ Q2 ... ∪ Qn, R = R1 ∪ R2 ... ∪ Rn, where Q is the query statement and R is the result set
+2. Partitions are split into read-only partitions and updatable partitions; read-only partitions are cached, updated partitions are not
+
+As above, when querying the daily user count for the last 7 days, if the table is partitioned by date and data is written only to the current day's partition, the data in all other partitions is fixed. Under the same query SQL, the metrics of any partition that is not being updated are fixed. As shown below, when the user counts of the previous 7 days are queried on 2020-03-09, the data from 2020-03-03 to 2020-03-07 comes from the cache, 2020-03-08 comes from the partition on the first query and from the cache afterwards, and 2020-03-09 comes from the partition because data is continuously written on that day.
+
+Therefore, to query N days of data where the most recent D days are being updated, similar queries that differ only in the date range need to query only D partitions; the rest comes from the cache, which effectively reduces cluster load and query time.
+
+```
+MySQL [(none)]> SELECT eventdate,count(userid) FROM testdb.appevent WHERE eventdate>="2020-03-03" AND eventdate<="2020-03-09" GROUP BY eventdate ORDER BY eventdate;
++------------+-----------------+
+| eventdate  | count(`userid`) |
++------------+-----------------+
+| 2020-03-03 |              15 |
+| 2020-03-04 |              20 |
+| 2020-03-05 |              25 |
+| 2020-03-06 |              30 |
+| 2020-03-07 |              35 |
+| 2020-03-08 |              40 | //From the partition on the first query, then from the cache
+| 2020-03-09 |              25 | //From the partition
++------------+-----------------+
+7 rows in set (0.02 sec)
+```
+
+In PartitionCache, the first-level cache key is the 128-bit MD5 signature of the SQL with the partition conditions removed. Below is the rewritten SQL to be signed:
+```
+SELECT eventdate,count(userid) FROM testdb.appevent GROUP BY eventdate ORDER BY eventdate;
+```
+The second-level cache key is the content of the partition field in the query result set, for example the content of the eventdate column in the result above. The auxiliary information of the second-level key is the partition's version number and version update time.
+
+The following shows the flow of the first execution of the SQL above on 2020-03-09:
+1. Fetch data from the cache
+```
++------------+-----------------+
+| 2020-03-03 |              15 |
+| 2020-03-04 |              20 |
+| 2020-03-05 |              25 |
+| 2020-03-06 |              30 |
+| 2020-03-07 |              35 |
++------------+-----------------+
+```
+2. The SQL and data fetched from BE
+```
+SELECT eventdate,count(userid) FROM testdb.appevent WHERE eventdate>="2020-03-08" AND eventdate<="2020-03-09" GROUP BY eventdate ORDER BY eventdate;
+
++------------+-----------------+
+| 2020-03-08 |              40 |
++------------+-----------------+
+| 2020-03-09 |              25 | 
++------------+-----------------+
+```
+3. The final data sent to the client
+```
++------------+-----------------+
+| eventdate  | count(`userid`) |
++------------+-----------------+
+| 2020-03-03 |              15 |
+| 2020-03-04 |              20 |
+| 2020-03-05 |              25 |
+| 2020-03-06 |              30 |
+| 2020-03-07 |              35 |
+| 2020-03-08 |              40 |
+| 2020-03-09 |              25 |
++------------+-----------------+
+```
+4. The data sent to the cache
+```
++------------+-----------------+
+| 2020-03-08 |              40 |
++------------+-----------------+
+```
+
+PartitionCache is suitable when the table is partitioned by date, some partitions are updated in near real time, and the query SQL is relatively fixed.
+
+The partition field can also be another field, but it must be guaranteed that only a small number of partitions are updated.
+
+### Limitations
+* Only OlapTable is supported; tables in other storage engines such as MySQL have no version information, so data updates cannot be detected
+* Only grouping by the partition field is supported; grouping by other fields is not, because data grouped by other fields may be updated, which would invalidate the whole cache
+* Only the first part, the last part, or the whole result set hitting the cache is supported; the result set cannot be split into several segments by the cached data
+
+## Usage
+### Enable SQLCache
+Add enable_sql_cache=true to fe.conf
+```
+vim fe/conf/fe.conf
+enable_sql_cache=true
+```
+Set the variable on the MySQL command line
+```
+MySQL [(none)]> set [global] enable_sql_cache=true;
+```
+Note: globa is the global variable; if omitted, it refers to the current session variable

Review comment:
       ```suggestion
   Note: global is the global variable; if omitted, it refers to the current session variable
   ```





[GitHub] [incubator-doris] blackfox1983 commented on a change in pull request #3994: SQL result cache and partition cache

Posted by GitBox <gi...@apache.org>.
blackfox1983 commented on a change in pull request #3994:
URL: https://github.com/apache/incubator-doris/pull/3994#discussion_r448285149



##########
File path: be/src/runtime/cache/result_cache.h
##########
@@ -0,0 +1,117 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_RUNTIME_RESULT_CACHE_H
+#define DORIS_BE_SRC_RUNTIME_RESULT_CACHE_H
+
+#include <boost/thread.hpp>
+#include <cassert>
+#include <cstdio>
+#include <cstdlib>
+#include <exception>
+#include <iostream>
+#include <list>
+#include <map>
+#include <mutex>
+#include <shared_mutex>
+#include <thread>
+
+#include "common/config.h"
+#include "runtime/cache/cache_utils.h"
+#include "runtime/cache/result_node.h"
+#include "runtime/mem_pool.h"
+#include "runtime/mem_tracker.h"
+#include "runtime/row_batch.h"
+#include "runtime/tuple_row.h"
+
+namespace doris {
+
+typedef std::unordered_map<UniqueId, ResultNode*> ResultNodeMap;
+
+// a doubly linked list class

Review comment:
       why not use std::list?

##########
File path: be/src/common/config.h
##########
@@ -536,6 +536,15 @@ namespace config {
     // Whether to continue to start be when load tablet from header failed.
     CONF_Bool(ignore_load_tablet_failure, "false");
 
+    // Set max cache's size of query results, the unit is M byte
+    CONF_Int32(cache_max_size, "256"); 

Review comment:
       Would renaming it to cache_max_size_mb be better?

##########
File path: fe/src/main/java/org/apache/doris/qe/cache/CacheAnalyzer.java
##########
@@ -0,0 +1,449 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.qe.cache;
+
+import org.apache.doris.analysis.AggregateInfo;
+import org.apache.doris.analysis.BinaryPredicate;
+import org.apache.doris.analysis.CastExpr;
+import org.apache.doris.analysis.CompoundPredicate;
+import org.apache.doris.analysis.Expr;
+import org.apache.doris.analysis.InlineViewRef;
+import org.apache.doris.analysis.QueryStmt;
+import org.apache.doris.analysis.SelectStmt;
+import org.apache.doris.analysis.SlotRef;
+import org.apache.doris.analysis.StatementBase;
+import org.apache.doris.analysis.TableRef;
+import org.apache.doris.catalog.OlapTable;
+import org.apache.doris.catalog.RangePartitionInfo;
+import org.apache.doris.catalog.PartitionType;
+import org.apache.doris.catalog.Partition;
+import org.apache.doris.catalog.Column;
+import org.apache.doris.common.util.DebugUtil;
+import org.apache.doris.metric.MetricRepo;
+import org.apache.doris.planner.OlapScanNode;
+import org.apache.doris.planner.Planner;
+import org.apache.doris.planner.ScanNode;
+import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.qe.RowBatch;
+import org.apache.doris.common.Config;
+import org.apache.doris.common.Status;
+
+import com.google.common.collect.Lists;
+import org.apache.doris.thrift.TUniqueId;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Analyze which caching mode a SQL is suitable for
+ * 1. T + 1 update is suitable for SQL mode
+ * 2. Partition by date, update the data of the day in near real time, which is suitable for Partition mode
+ */
+public class CacheAnalyzer {
+    private static final Logger LOG = LogManager.getLogger(CacheAnalyzer.class);
+
+    /**
+     * NoNeed : disable config or variable, not query, not scan table etc.
+     */
+    public enum CacheMode {
+        NoNeed,
+        None,
+        Sql,
+        Partition,
+        Aggregate
+    }
+
+    private ConnectContext context;
+    private boolean enableSqlCache = false;
+    private boolean enablePartitionCache = false;
+    private TUniqueId queryId;
+    private CacheMode cacheMode;
+    private CacheTable latestTable;
+    private StatementBase parsedStmt;
+    private SelectStmt selectStmt;
+    private List<ScanNode> scanNodes;
+    private OlapTable olapTable;
+    private RangePartitionInfo partitionInfo;
+    private Column partColumn;
+    private CompoundPredicate partitionPredicate;
+    private Cache cache;
+
+    public Cache getCache() {
+        return cache;
+    }
+
+    public CacheAnalyzer(ConnectContext context, StatementBase parsedStmt, Planner planner) {
+        this.context = context;
+        this.queryId = context.queryId();
+        this.parsedStmt = parsedStmt;
+        scanNodes = planner.getScanNodes();
+        latestTable = new CacheTable();
+        checkCacheConfig();
+    }
+
+    //for unit test
+    public CacheAnalyzer(ConnectContext context, StatementBase parsedStmt, List<ScanNode> scanNodes) {
+        this.context = context;
+        this.parsedStmt = parsedStmt;
+        this.scanNodes = scanNodes;
+        checkCacheConfig();
+    }
+
+    private void checkCacheConfig() {
+        if (Config.enable_sql_cache) {
+            if (context.getSessionVariable().isEnableSqlCache()) {
+                enableSqlCache = true;
+            }
+        }
+        if (Config.enable_partition_cache) {
+            if (context.getSessionVariable().isEnablePartitionCache()) {
+                enablePartitionCache = true;
+            }
+        }
+    }
+
+    public CacheMode getCacheMode() {
+        return cacheMode;
+    }
+
+    public class CacheTable implements Comparable<CacheTable> {
+        public OlapTable olapTable;
+        public long latestId;
+        public long latestVersion;
+        public long latestTime;
+
+        public CacheTable() {
+            olapTable = null;
+            latestId = 0;
+            latestVersion = 0;
+            latestTime = 0;
+        }
+
+        @Override
+        public int compareTo(CacheTable table) {
+            return (int) (table.latestTime - this.latestTime);
+        }
+
+        public void Debug() {
+            LOG.info("table {}, partition id {}, ver {}, time {}", olapTable.getName(), latestId, latestVersion, latestTime);
+        }
+    }
+
+    public boolean enableCache() {
+        return enableSqlCache || enablePartitionCache;
+    }
+
+    public boolean enableSqlCache() {
+        return enableSqlCache;
+    }
+
+    public boolean enablePartitionCache() {
+        return enablePartitionCache;
+    }
+
+    /**
+     * Check cache mode with SQL and table
+     * 1、Only Olap table
+     * 2、The update time of the table is before Config.last_version_interval_time
+     * 3、PartitionType is PartitionType.RANGE, and the partition key has only one column
+     * 4、Partition key must be included in the group by clause
+     * 5、Where clause must contain only one partition key predicate
+     * CacheMode.Sql
+     * xxx FROM user_profile, updated before Config.last_version_interval_time
+     * CacheMode.Partition, partition by event_date, only the partition of today will be updated.
+     * SELECT xxx FROM app_event WHERE event_date >= 20191201 AND event_date <= 20191207 GROUP BY event_date
+     * SELECT xxx FROM app_event INNER JOIN user_Profile ON app_event.user_id = user_profile.user_id xxx
+     * SELECT xxx FROM app_event INNER JOIN user_profile ON xxx INNER JOIN site_channel ON xxx
+     */
+    public void checkCacheMode(long now) {

Review comment:
       Can explain show whether a SQL hit the cache?
   Otherwise we have to check the log, which is inconvenient.





[GitHub] [incubator-doris] blackfox1983 commented on a change in pull request #3994: SQL result cache and partition cache

Posted by GitBox <gi...@apache.org>.
blackfox1983 commented on a change in pull request #3994:
URL: https://github.com/apache/incubator-doris/pull/3994#discussion_r448279546



##########
File path: be/src/common/config.h
##########
@@ -536,6 +536,15 @@ namespace config {
     // Whether to continue to start be when load tablet from header failed.
     CONF_Bool(ignore_load_tablet_failure, "false");
 
+    // Set max cache's size of query results, the unit is M byte
+    CONF_Int32(cache_max_size, "256"); 
+
+    //Cache memory is pruned when it reaches cache_max_size + cache_elasticity_size

Review comment:
       Does this mean the cache memory will be shrunk?
   I do not understand what this means.





[GitHub] [incubator-doris] marising commented on pull request #3994: SQL result cache and partition cache

Posted by GitBox <gi...@apache.org>.
marising commented on pull request #3994:
URL: https://github.com/apache/incubator-doris/pull/3994#issuecomment-656430449


   I split the PR and submitted the BE part first:
   #4005 

