Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/23 12:09:46 UTC

[GitHub] [druid] petermarshallio commented on a change in pull request #11584: Docs - query caching

petermarshallio commented on a change in pull request #11584:
URL: https://github.com/apache/druid/pull/11584#discussion_r755062147



##########
File path: docs/querying/caching.md
##########
@@ -32,30 +32,50 @@ If you're unfamiliar with Druid architecture, review the following topics before
 
 For instructions to configure query caching see [Using query caching](./using-caching.md).
 
+Cache monitoring, including the hit rate and number of evictions, is available in [Druid metrics](../operations/metrics.html#cache).
+
+Query-level caching is in addition to [data-level caching](../design/historical.md) on Historicals.
+
 ## Cache types
 
-Druid supports the following types of caches:
+Druid supports two types of query caching:
 
-- **Per-segment** caching which stores _partial results_ of a query for a specific segment. Per-segment caching is enabled on Historicals by default.
-- **Whole-query** caching which stores all results for a query.
+- [Per-segment caching](#per-segment-caching) stores _partial_ query results for a specific segment (enabled by default).
+- [Whole-query caching](#whole-query-caching) stores _final_ query results.
 
-To avoid returning stale results, Druid invalidates the cache the moment any underlying data changes for both types of cache.
+> **Druid invalidates _any_ cache the moment any underlying data changes**
+>
+> This ensures that Druid does not return stale results, which is especially important for `table` datasources that have highly variable underlying data segments, including real-time data segments.
 
-Druid can store cache data on the local JVM heap or in an external distributed key/value store. The default is a local cache based upon [Caffeine](https://github.com/ben-manes/caffeine). Maximum cache storage defaults to the minimum value of 1 GiB or the ten percent of the maximum runtime memory for the JVM with no cache expiration. See [Cache configuration](../configuration/index.md#cache-configuration) for information on how to configure cache storage.
+> **Druid can store cache data on the local JVM heap or in an external distributed key/value store (e.g. memcached)**
+>
+> The default is a local cache based upon [Caffeine](https://github.com/ben-manes/caffeine). The default maximum cache storage size is the lesser of 1 GiB or ten percent of the maximum runtime memory for the JVM, with no cache expiration. See [Cache configuration](../configuration/index.md#cache-configuration) for information on how to configure cache storage. When using Caffeine, the cache is inside the JVM heap and is directly measurable. Heap usage grows up to the maximum configured size, after which the least recently used segment results are evicted and replaced with newer results.
 
 ### Per-segment caching
 
-The primary form of caching in Druid is the **per-segment cache** which stores query results on a per-segment basis. It is enabled on Historical services by default.
+The primary form of caching in Druid is a *per-segment results cache*.  This stores partial query results on a per-segment basis and is enabled on Historical services by default.
+
+It allows Druid to maintain a low-eviction-rate cache for segments that do not change. This is especially important for segments that [Historical](../design/historical.html) processes pull into their local _segment cache_ from [deep storage](../dependencies/deep-storage.html) as instructed by the lead [Coordinator](../design/coordinator.html). Real-time segments, on the other hand, continue to have results computed at query time.
 
-When your queries include data from segments that are mutable and undergoing real-time ingestion, use a segment cache. In this case Druid caches query results for immutable historical segments when possible. It re-computes results for the real-time segments at query time.
+Per-segment cached results can also be merged into the results of later queries that have a similar basic shape (filters, aggregations, etc.) but cover a different period of time.
 
-For example, you have queries that frequently include incoming data from a Kafka or Kinesis stream alongside unchanging segments. Per-segment caching lets Druid cache results from older immutable segments and merge them with updated data. Whole-query caching would not be helpful in this scenario because the new data from real-time ingestion continually invalidates the cache.
+Per-segment caching is controlled by the parameters `useCache` and `populateCache`.

Review comment:
       I've moved this into the Using Caching page.
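
For reference, here is a minimal Python sketch of two points from the hunk above: the default cache size rule (the lesser of 1 GiB or ten percent of JVM max memory) and setting `useCache` / `populateCache` in a query's context. The helper names, the `wikipedia` datasource, and the interval are hypothetical and only for illustration. The integer rounding in the size rule is an assumption, not taken from the Druid source.

```python
import json

GIB = 1024 ** 3  # bytes in one gibibyte


def default_cache_size_bytes(max_jvm_memory_bytes):
    """Lesser of 1 GiB or ten percent of the JVM's maximum runtime memory.

    Assumption: integer division stands in for whatever rounding Druid applies.
    """
    return min(GIB, max_jvm_memory_bytes // 10)


def with_segment_cache(query, use_cache=True, populate_cache=True):
    """Return a copy of a native query with per-segment cache flags set.

    `useCache` and `populateCache` are the parameters named in the doc;
    this helper itself is hypothetical.
    """
    context = dict(query.get("context", {}))
    context["useCache"] = use_cache
    context["populateCache"] = populate_cache
    return {**query, "context": context}


if __name__ == "__main__":
    # Hypothetical timeseries query; datasource and interval are made up.
    query = {
        "queryType": "timeseries",
        "dataSource": "wikipedia",
        "granularity": "hour",
        "intervals": ["2021-11-01/2021-11-02"],
        "aggregations": [{"type": "count", "name": "rows"}],
    }
    # Read from the cache but do not write new entries into it.
    print(json.dumps(with_segment_cache(query, populate_cache=False), indent=2))
    print(default_cache_size_bytes(4 * GIB))
```

With a 4 GiB heap the size rule gives ten percent of the heap. With a 64 GiB heap it is capped at 1 GiB.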




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


