Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/10/10 04:29:12 UTC

[GitHub] [druid] abhishekagarwal87 opened a new pull request #10503: Additional documentation for query caching

abhishekagarwal87 opened a new pull request #10503:
URL: https://github.com/apache/druid/pull/10503


   Add documentation for the queries that do not support caching. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r507924157



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)

Review comment:
       Cool, thank you for adding the link.






[GitHub] [druid] jihoonson commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r506718820



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,19 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following:
+- Queries, that involve a `union` datasource, do not support result-level caching. Refer to the 
+[related github issue](https://github.com/apache/druid/issues/8713) for details. Top level union SQL queries can still 

Review comment:
       ```
       ../docs/querying/caching.md
          90 | [related github issue](https://github.com/apa 
   >> 1 spelling error found in 167 files
   ```
   
   The CI is failing because of this line. Please add a suppression in `website/.spelling`. BTW, I think it should be `GitHub`. 

##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)

Review comment:
       > I was deliberate in avoiding the datasource term since SQL users don't define a `datasource` as such. For them, it's just a union operator.
   
   Even if they don't define a datasource themselves, their query will be translated into a native query, and that native query determines whether it will be cached. I think it is better to be precise so that users don't get confused.
   
   > Though I think Top Level Union queries may still be cached since they are not translated into a Union datasource.
   
   Good point, though I'm not sure what you mean by "Top Level Union queries". In SQL, the union operator can be translated by either `DruidUnionDataSourceRule` or `DruidUnionRule`. The former is converted to a `union` datasource, while the latter is executed sequentially by the SQL layer. AFAICT, the former can be used when it's a `UNION ALL` of flat scan subqueries. The latter can be used otherwise (still only for `UNION ALL`). So the result-level cache cannot be used for the former, but can be for the latter. Maybe it could say, "Queries, that have a `union` datasource, do not support result-level caching. For SQL, a union SQL query can be translated to a native query with a `union` datasource when it is a `UNION ALL` of flat scan subqueries. These queries cannot be cached at the result-level."
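
To make the two cases in the comment above concrete, here is a minimal Python sketch. It is an illustration only: the table names `tbl1` and `tbl2`, the column `col1`, and the interval are hypothetical, and the native query shown is an assumed approximation of what the SQL layer might plan, not output captured from Druid.

```python
import json

# Case 1 (hypothetical tables): UNION ALL of flat scan subqueries inside FROM.
# Per the comment above, this shape can be planned as a native `union`
# datasource, so result-level caching would not apply.
TABLE_LEVEL_UNION_SQL = """
SELECT col1, COUNT(*) AS cnt
FROM (
  SELECT col1 FROM tbl1
  UNION ALL
  SELECT col1 FROM tbl2
)
GROUP BY col1
"""

# Case 2 (hypothetical tables): UNION ALL at the outermost level.
# Per the comment above, the SQL layer runs the parts sequentially, so no
# `union` datasource is involved.
TOP_LEVEL_UNION_SQL = """
SELECT COUNT(*) FROM tbl1
UNION ALL
SELECT COUNT(*) FROM tbl2
"""

# Assumed approximation of a native query for case 1. The `union` datasource
# takes a non-empty list of table names under "dataSources".
native_query = {
    "queryType": "groupBy",
    "dataSource": {"type": "union", "dataSources": ["tbl1", "tbl2"]},
    "intervals": ["2020-10-01/2020-10-02"],  # hypothetical interval
    "granularity": "all",
    "dimensions": ["col1"],
    "aggregations": [{"type": "count", "name": "cnt"}],
}

print(json.dumps(native_query, indent=2))
```

The point of the sketch is that the caching behavior follows the native query's datasource type, not the presence of the `UNION ALL` keyword in the SQL text.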






[GitHub] [druid] jihoonson commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r504299242



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)

Review comment:
       Suggest union operation -> `union` datasource.

##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)
+- queries, that involve an Inline data source or a Lookup data source, do not support any caching. 

Review comment:
       Suggest `inline` datasource and `lookup` datasource. "Datasource" is the correct terminology to refer to a datasource.

##########
File path: docs/querying/datasource.md
##########
@@ -134,8 +134,6 @@ another will be treated as if they contained all null values in the tables where
 The list of "dataSources" must be nonempty. If you want to query an empty dataset, use an [`inline` datasource](#inline)
 instead.
 
-Union datasources are not available in Druid SQL.

Review comment:
       Nice catch. Please add a SQL query example as in https://github.com/apache/druid/pull/10503/files#diff-dae1ec2a726a9b89dfdf6eb74b7d7c4d9397e02eb95cf4ec5ef3ee440fdefd4dR75-R79.

##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following

Review comment:
       Did you intend this line and the below sentences to be separate? If so, please add a colon or a period at the end of this sentence. Also, the below sentences should start with a capital letter.

##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries

Review comment:
       Some missing items:
   
   - `DataSourceMetadataQuery` is never cached.
   - No queries are cached on Brokers if `bySegment` is set in the query context. Historicals don't seem to check `bySegment`, which I think is a bug. We should never cache when it's set.
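
For readers unfamiliar with the query context, the following is a minimal Python sketch of a native query that sets `bySegment`. The datasource name `wikipedia` and the interval are hypothetical; the sketch only builds the JSON payload and does not contact a Druid cluster.

```python
import json

# Hypothetical native timeseries query. The "context" map carries per-query
# flags; `bySegment` is the flag discussed in the comment above.
query = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",                # hypothetical datasource name
    "intervals": ["2020-10-01/2020-10-02"],   # hypothetical interval
    "granularity": "all",
    "aggregations": [{"type": "count", "name": "rows"}],
    "context": {"bySegment": True},
}

# Serialize to the JSON body that would be posted to a Broker.
payload = json.dumps(query, indent=2)
print(payload)
```

According to the comment above, a Broker would skip caching for this query because `bySegment` is set in its context.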

##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)
+- queries, that involve an Inline data source or a Lookup data source, do not support any caching. 
+- queries, with a sub-query in them, do not support any caching though the output of sub-queries itself may be cached.

Review comment:
       It would be worth linking https://druid.apache.org/docs/0.19.0/querying/query-execution.html#query to aid understanding, since this requires knowing how the subquery system works in Druid.






[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r504429222



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)
+- queries, that involve an Inline data source or a Lookup data source, do not support any caching. 
+- queries, with a sub-query in them, do not support any caching though the output of sub-queries itself may be cached.

Review comment:
       Ack






[GitHub] [druid] dylwylie commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
dylwylie commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r506959628



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,19 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following:
+- Queries, that involve a `union` datasource, do not support result-level caching. Refer to the 
+[related github issue](https://github.com/apache/druid/issues/8713) for details. Top level union SQL queries can still 
+be cached at result-level since they are not translated into a `union` datasource. Refer to [Union SQL](sql.md#UNION ALL) 
+for more details on how union SQL queries work.
+- Queries, that involve an `Inline` datasource or a `Lookup` datasource, do not support any caching. 
+- Queries, with a sub-query in them, do not support any caching though the output of sub-queries itself may be cached. 
+Refer to the [Query execution](query-execution.md#query) page for more details on how sub-queries are executed.
+- Join queries do not support any caching on the broker [More details](https://github.com/apache/druid/issues/10444).
+- GroupBy v2 queries do not support any caching on broker [More details](https://github.com/apache/druid/issues/3820).

Review comment:
       The linked issue only refers to segment-level caching.
   
   I could be wrong but I believe GroupBy V2 does support results-level caching?






[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r504435143



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)

Review comment:
       I was deliberate in avoiding the datasource term since SQL users don't define a `datasource` as such. For them, it's just a union operator.






[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r504443102



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)

Review comment:
       Though I think Top Level Union queries may still be cached since they are not translated into a Union datasource. @gianm is that correct? 






[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r506981193



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,12 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following
+- queries, that have a union operation, do not support result-level caching - [More details](https://github.com/apache/druid/issues/8713)

Review comment:
       The `Top Level` and `Table level` terms were added recently to the documentation, which I have linked here as well. 
   https://druid.apache.org/docs/latest/querying/sql.html#union-all






[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10503:
URL: https://github.com/apache/druid/pull/10503#discussion_r506980953



##########
File path: docs/querying/caching.md
##########
@@ -82,3 +82,19 @@ Note that the task executor processes only support caches that keep their data l
 This restriction exists because the cache stores results at the level of intermediate partial segments generated by the
 ingestion tasks. These intermediate partial segments will not necessarily be identical across task replicas, so
 remote cache types such as `memcached` will be ignored by task executor processes.
+
+## Unsupported queries
+
+Query caching is not available for following:
+- Queries, that involve a `union` datasource, do not support result-level caching. Refer to the 
+[related github issue](https://github.com/apache/druid/issues/8713) for details. Top level union SQL queries can still 
+be cached at result-level since they are not translated into a `union` datasource. Refer to [Union SQL](sql.md#UNION ALL) 
+for more details on how union SQL queries work.
+- Queries, that involve an `Inline` datasource or a `Lookup` datasource, do not support any caching. 
+- Queries, with a sub-query in them, do not support any caching though the output of sub-queries itself may be cached. 
+Refer to the [Query execution](query-execution.md#query) page for more details on how sub-queries are executed.
+- Join queries do not support any caching on the broker [More details](https://github.com/apache/druid/issues/10444).
+- GroupBy v2 queries do not support any caching on broker [More details](https://github.com/apache/druid/issues/3820).

Review comment:
       It can support result-level caching, but right now all caching is disabled for GroupBy V2 on the broker. Result-level caching was introduced after broker caching was disabled for GroupBy V2, so result-level caching remained disabled as well, though there was no reason for it as such. 
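
The draft list under review in this thread can be summarized as a small lookup. This is a minimal Python sketch that restates the draft documentation text quoted above; the feature keys and the `cache_limits` helper are hypothetical names invented for illustration, and none of this reflects Druid's actual implementation.

```python
# Draft limitations as worded in the diff under review (not Druid code).
# The keys are hypothetical labels chosen for this sketch.
DRAFT_CACHE_LIMITS = {
    "union_datasource": "no result-level caching",
    "inline_datasource": "no caching at all",
    "lookup_datasource": "no caching at all",
    "subquery": "no caching at all (sub-query output may still be cached)",
    "join": "no caching on the Broker",
    "groupby_v2": "no caching on the Broker",
}


def cache_limits(features):
    """Return the draft limitations that apply to a query with these features."""
    unknown = set(features) - set(DRAFT_CACHE_LIMITS)
    if unknown:
        raise ValueError(f"unknown query features: {sorted(unknown)}")
    return {name: DRAFT_CACHE_LIMITS[name] for name in sorted(features)}


if __name__ == "__main__":
    # A hypothetical GroupBy v2 query over a union datasource hits two entries.
    for name, limit in cache_limits({"groupby_v2", "union_datasource"}).items():
        print(f"{name}: {limit}")
```

As the comment above notes, the GroupBy V2 entry describes the current state on the Broker rather than an inherent limitation of result-level caching.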






[GitHub] [druid] jihoonson merged pull request #10503: Additional documentation for query caching

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #10503:
URL: https://github.com/apache/druid/pull/10503


   

