Posted to commits@druid.apache.org by fj...@apache.org on 2019/03/12 00:57:06 UTC
[incubator-druid] branch master updated: Further improve caching documentation. (#7236)

This is an automated email from the ASF dual-hosted git repository.

fjy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git


The following commit(s) were added to refs/heads/master by this push:
     new 9178793  Further improve caching documentation. (#7236)
9178793 is described below

commit 9178793ab5e94da8241f40e275e842c0673fe21a
Author: Gian Merlino <gi...@gmail.com>
AuthorDate: Mon Mar 11 20:57:00 2019 -0400

    Further improve caching documentation. (#7236)
    
    Follow-up to #7223 that fixes a doc bug (a result-level cache property
    was misspelled), changes the recommended "small cluster" threshold from
    20 to 5 servers, and clarifies behavior of the various caching options.
---
 docs/content/querying/caching.md       | 44 ++++++++++++++++++++++++----------
 docs/content/querying/query-context.md |  4 ++--
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/docs/content/querying/caching.md b/docs/content/querying/caching.md
index a1da2cb..8d2cc88 100644
--- a/docs/content/querying/caching.md
+++ b/docs/content/querying/caching.md
@@ -24,23 +24,41 @@ title: "Query Caching"
 
 # Query Caching
 
-Druid supports query result caching through an LRU cache. Results are stored as a whole or either on a per segment basis along with the 
-parameters of a given query. Segment level caching allows Druid to return final results based partially on segment results in the cache 
-and partially on segment results from scanning historical/real-time segments. Result level caching enables Druid to cache the entire 
-result set, so that query results can be completely retrieved from the cache for identical queries.
+Druid supports query result caching at both the segment and whole-query result level. Cache data can be stored in the
+local JVM heap or in an external distributed key/value store. In all cases, the Druid cache is a query result cache.
+The only difference is whether the result is a _partial result_ for a particular segment, or the result for an entire
+query. In both cases, the cache is invalidated as soon as any underlying data changes; it will never return a stale
+result.
 
-Segment results can be stored in a local heap cache or in an external distributed key/value store. Segment query caches 
-can be enabled at either the Historical and Broker level (it is not recommended to enable caching on both).
+Segment-level caching allows the cache to be leveraged even when some of the underlying segments are mutable and
+undergoing real-time ingestion. In this case, Druid will potentially cache query results for immutable historical
+segments, while re-computing results for the real-time segments on each query. Whole-query result level caching is not
+useful in this scenario, since it would be continuously invalidated.
+
+Segment-level caching does require Druid to merge the per-segment results on each query, even when they are served
+from the cache. For this reason, whole-query result level caching can be more efficient if invalidation due to real-time
+ingestion is not an issue.
 
 ## Query caching on Brokers
 
-Enabling caching on the Broker can yield faster results than if query caches were enabled on Historicals for small clusters. This is 
-the recommended setup for smaller production clusters (< 20 servers). Take note that when `druid.broker.cache.populateCache` is set to
-`true`, results from Historicals are returned on a per segment basis, and Historicals will not be able to do any local result merging.
-Result level caching is enabled only on the Broker side.
+Brokers support both segment-level and whole-query result level caching. Segment-level caching is controlled by the query
+context parameters `useCache` and `populateCache`. Whole-query result level caching is controlled by the query context
+parameters `useResultLevelCache` and `populateResultLevelCache`. Both are also governed by the Broker's
+[runtime properties](../configuration/index.html) `druid.broker.cache.*`.
+
+For small clusters, enabling segment-level caching on the Broker can yield faster results than enabling it on
+Historicals. This is the recommended setup for smaller production clusters (fewer than 5 servers). Populating
+segment-level caches on the Broker is _not_ recommended for large production clusters: when the property
+`druid.broker.cache.populateCache` is set to `true` (and the query context parameter `populateCache` is _not_ set to
+`false`), Historicals return results on a per-segment basis and cannot do any local result merging. This impairs the
+ability of the Druid cluster to scale well.
 
 ## Query caching on Historicals
 
-Larger production clusters should enable caching only on the Historicals to avoid having to use Brokers to merge all query 
-results. Enabling caching on the Historicals instead of the Brokers enables the Historicals to do their own local result
-merging and puts less strain on the Brokers.
+Historicals support only segment-level caching. Segment-level caching is controlled by the query context
+parameters `useCache` and `populateCache` and [runtime properties](../configuration/index.html)
+`druid.historical.cache.*`.
+
+Larger production clusters should enable segment-level cache population on Historicals only (not on Brokers) to avoid
+having to use Brokers to merge all query results. Enabling cache population on the Historicals instead of the Brokers
+enables the Historicals to do their own local result merging and puts less strain on the Brokers.
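
The following toy model makes the trade-off between the two cache levels concrete. It is a minimal Python sketch, not Druid's implementation, and it rests on these assumptions:

* Each segment carries a hypothetical `version` that changes whenever its data changes.
* Per-segment results merge by summation, as a row count would.
* The helpers `query_segment_level` and `query_result_level` are invented for illustration.

The sketch shows two behaviors. Segment-level caching rescans only the real-time segment but still merges on every query. A whole-query cache misses as soon as any segment changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    segment_id: str
    version: str    # hypothetical: bumped whenever the segment's data changes
    realtime: bool  # True while the segment is still receiving ingested rows


# Computes one segment's partial result (e.g. a row count) by scanning it.
Compute = Callable[[Segment], int]


def query_segment_level(query: str, segments: List[Segment], compute: Compute,
                        cache: Dict[tuple, int]) -> Tuple[int, int]:
    """Returns (merged result, number of segments scanned)."""
    partials: List[int] = []
    scanned = 0
    for seg in segments:
        key = (query, seg.segment_id, seg.version)
        if not seg.realtime and key in cache:
            partials.append(cache[key])  # partial result served from the cache
            continue
        value = compute(seg)
        scanned += 1
        if not seg.realtime:
            cache[key] = value  # only immutable segments are cached
        partials.append(value)
    # Per-segment partials are merged on every query, cached or not.
    return sum(partials), scanned


def query_result_level(query: str, segments: List[Segment], compute: Compute,
                       cache: Dict[tuple, int]) -> Tuple[int, int]:
    """Returns (whole-query result, number of segments scanned)."""
    # The key covers every segment version, so any data change is a cache miss.
    key = (query, tuple((s.segment_id, s.version) for s in segments))
    if key in cache:
        return cache[key], 0
    result = sum(compute(seg) for seg in segments)
    cache[key] = result
    return result, len(segments)
```

Consider two historical segments and one real-time segment that receives new rows between queries. Under the segment-level model, the repeat query scans only the real-time segment. Under the whole-query model, it rescans all three, which matches the scenario described in the `caching.md` changes.
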
diff --git a/docs/content/querying/query-context.md b/docs/content/querying/query-context.md
index ef730f3..9c16d32 100644
--- a/docs/content/querying/query-context.md
+++ b/docs/content/querying/query-context.md
@@ -33,8 +33,8 @@ The query context is used for various query configuration parameters. The follow
 |queryId          | auto-generated                         | Unique identifier given to this query. If a query ID is set or known, this can be used to cancel the query |
 |useCache         | `true`                                 | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses druid.broker.cache.useCache or druid.historical.cache.useCache to determine whether or not to read from the query cache |
 |populateCache    | `true`                                 | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateCache or druid.historical.cache.populateCache to determine whether or not to save the results of this query to the query cache |
-|useResultLevelCache         | `true`                      | Flag indicating whether to leverage the result level cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Druid uses druid.broker.cache.useResultLevelCache to determine whether or not to read from the query cache |
-|populateResultLevelCache    | `true`                      | Flag indicating whether to save the results of the query to the result level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateCache to determine whether or not to save the results of this query to the query cache |
+|useResultLevelCache         | `true`                      | Flag indicating whether to leverage the result-level cache for this query. When set to false, it disables reading from the result-level query cache for this query. When set to true, Druid uses druid.broker.cache.useResultLevelCache to determine whether or not to read from the result-level query cache |
+|populateResultLevelCache    | `true`                      | Flag indicating whether to save the results of the query to the result-level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the result-level query cache. When set to true, Druid uses druid.broker.cache.populateResultLevelCache to determine whether or not to save the results of this query to the result-level query cache |
 |bySegment        | `false`                                | Return "by segment" results. Primarily used for debugging, setting it to `true` returns results associated with the data segment they came from |
 |finalize         | `true`                                 | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the `hyperUnique` aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to `false` |
 |chunkPeriod      | `P0D` (off)                            | At the Broker process level, long interval queries (of any type) may be broken into shorter interval queries to parallelize merging more than normal. Broken up queries will use a larger share of cluster resources, but, if you use groupBy "v1, it may be able to complete faster as a result. Use ISO 8601 periods. For example, if this property is set to `P1M` (one month), then a query covering a year would be broken into 12 smaller [...]
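
The four cache rows in the query-context table imply a simple rule on the Broker. A context flag set to `false` always disables the action. A flag set to `true`, which is the default, defers to the matching `druid.broker.cache.*` runtime property. The Python sketch below encodes that reading of the table. The function name `broker_cache_flag` is hypothetical. The caller must supply every runtime property explicitly, because this sketch does not assume Druid's default values for them.

```python
from typing import Mapping

# Query context key -> Broker runtime property it defers to, per the table above.
BROKER_PROPERTY_FOR = {
    "useCache": "druid.broker.cache.useCache",
    "populateCache": "druid.broker.cache.populateCache",
    "useResultLevelCache": "druid.broker.cache.useResultLevelCache",
    "populateResultLevelCache": "druid.broker.cache.populateResultLevelCache",
}


def broker_cache_flag(context: Mapping[str, bool], key: str,
                      runtime_properties: Mapping[str, bool]) -> bool:
    """Sketch: decide whether one Broker cache action applies to a query."""
    # All four context flags default to true when the query does not set them.
    if not context.get(key, True):
        return False  # false in the query context always disables the action
    # true (or unset) defers to the runtime property; a missing property raises
    # KeyError instead of guessing Druid's default.
    return runtime_properties[BROKER_PROPERTY_FOR[key]]
```

The table describes the same pattern for Historicals. There, `useCache` and `populateCache` defer to `druid.historical.cache.useCache` and `druid.historical.cache.populateCache`.
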

