Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/08 10:16:10 UTC

[GitHub] [druid] clintropolis opened a new pull request #9647: document useFilterCNF query context parameter

clintropolis opened a new pull request #9647: document useFilterCNF query context parameter
URL: https://github.com/apache/druid/pull/9647
 
 
   Does the thing the title says: adds `useFilterCNF` to the query context docs. It might be worth adding a preamble of sorts to some of these settings to indicate that they are for advanced tuning and control over Druid behavior, but I'm not sure what that should look like or what would be intuitive. Instead, I focused on conveying that query issuers should experiment before using this setting in a production environment, to confirm that it helps for their filter shape.
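
For illustration, a minimal sketch of a native query that turns the flag on through the query context. The datasource, interval, dimension names, and values are hypothetical; only the `context` entry `useFilterCNF` comes from this PR.

```python
import json

# Hypothetical datasource, interval, and dimensions. The filter has the shape
# (country AND device) OR channel, i.e. a top-level OR rather than a top-level AND.
query = {
    "queryType": "timeseries",
    "dataSource": "example_datasource",
    "intervals": ["2020-04-01/2020-04-08"],
    "granularity": "all",
    "filter": {
        "type": "or",
        "fields": [
            {
                "type": "and",
                "fields": [
                    {"type": "selector", "dimension": "country", "value": "US"},
                    {"type": "selector", "dimension": "device", "value": "mobile"},
                ],
            },
            {"type": "selector", "dimension": "channel", "value": "#en"},
        ],
    },
    "aggregations": [{"type": "count", "name": "rows"}],
    # The parameter documented by this PR; it defaults to false.
    "context": {"useFilterCNF": True},
}

# Serialize the query body as it would be sent to Druid.
payload = json.dumps(query, indent=2)
print(payload)
```

Running the same query with and without the flag and comparing timings is the kind of experiment the description recommends before relying on it in production.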

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9647: document useFilterCNF query context parameter

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9647: document useFilterCNF query context parameter
URL: https://github.com/apache/druid/pull/9647#discussion_r405972067
 
 

 ##########
 File path: docs/querying/query-context.md
 ##########
 @@ -45,6 +45,7 @@ The query context is used for various query configuration parameters. The follow
 |parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
+|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.|
 
 Review comment:
   > But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter.
   
   Suggest providing a few examples to clarify:
   - An OR filter `A || B`, where `A` can be resolved using bitmap indexes but `B` cannot, prevents the whole OR filter from being considered for pre-filtering.
   - If it were `A && B` instead, `A` would be considered for pre-filtering but `B` would not.
   - If it were `A && (C || D)`, where `C` and `D` can be resolved using bitmap indexes, then the whole filter can be considered for pre-filtering.
   - If it were `A && (B || C)`, only `A` would be considered for pre-filtering.
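
The four cases above can be sketched with a small toy model. This is an assumption-laden illustration, not Druid's implementation: the `Leaf`, `And`, `Or`, `supports_bitmap`, and `split` names are hypothetical, and the only rule encoded is the one described in the comment (the whole top-level filter, or each clause of a top-level AND, is pre-filtered only if everything inside it can use bitmap indexes).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    name: str
    has_bitmap: bool  # assumption: whether this filter can be resolved from bitmap indexes


@dataclass(frozen=True)
class And:
    clauses: Tuple["Filter", ...]


@dataclass(frozen=True)
class Or:
    clauses: Tuple["Filter", ...]


Filter = Union[Leaf, And, Or]


def supports_bitmap(f: Filter) -> bool:
    """A compound filter is bitmap-resolvable only if all of its children are."""
    if isinstance(f, Leaf):
        return f.has_bitmap
    return all(supports_bitmap(c) for c in f.clauses)


def show(f: Filter) -> str:
    """Render a filter in the `A && (B || C)` notation used in the comment."""
    if isinstance(f, Leaf):
        return f.name
    op = " && " if isinstance(f, And) else " || "
    return "(" + op.join(show(c) for c in f.clauses) + ")"


def split(f: Filter) -> Tuple[List[str], List[str]]:
    """Split into (pre-filter, post-filter) parts, looking only at the top level."""
    top = f.clauses if isinstance(f, And) else (f,)
    pre = [show(c) for c in top if supports_bitmap(c)]
    post = [show(c) for c in top if not supports_bitmap(c)]
    return pre, post


# A, C, D can use bitmap indexes; B cannot.
A, C, D = Leaf("A", True), Leaf("C", True), Leaf("D", True)
B = Leaf("B", False)

print(split(Or((A, B))))            # whole OR falls to post-filtering
print(split(And((A, B))))           # A is pre-filtered, B is not
print(split(And((A, Or((C, D))))))  # everything is pre-filtered
print(split(And((A, Or((B, C))))))  # only A is pre-filtered
```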



[GitHub] [druid] jon-wei commented on a change in pull request #9647: document useFilterCNF query context parameter

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9647: document useFilterCNF query context parameter
URL: https://github.com/apache/druid/pull/9647#discussion_r405969200
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/QueryContexts.java
 ##########
 @@ -67,7 +68,8 @@
   public static final boolean DEFAULT_ENABLE_JOIN_FILTER_PUSH_DOWN = true;
   public static final boolean DEFAULT_ENABLE_JOIN_FILTER_REWRITE = true;
   public static final boolean DEFAULT_ENABLE_JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS = false;
-  public static final long DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY = 10000;
 
 Review comment:
   👍 



[GitHub] [druid] jon-wei commented on a change in pull request #9647: document useFilterCNF query context parameter

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9647: document useFilterCNF query context parameter
URL: https://github.com/apache/druid/pull/9647#discussion_r406423219
 
 

 ##########
 File path: docs/querying/query-context.md
 ##########
 @@ -45,6 +45,7 @@ The query context is used for various query configuration parameters. The follow
 |parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
+|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.|
 
 Review comment:
   > Should this be part of this PR, or done as a follow-up? It sort of blows up the scope a bit of what I was looking to do as part of this PR, but it also seems useful so I'm fine either way.
   
   I think the filter tuning guide could be done in a follow-up; it sounds like something much larger than this PR. This PR LGTM.
   
   (There's a merge conflict in the spelling exclusions now)
   
   



[GitHub] [druid] clintropolis merged pull request #9647: document useFilterCNF query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #9647: document useFilterCNF query context parameter
URL: https://github.com/apache/druid/pull/9647
 
 
   



[GitHub] [druid] clintropolis commented on a change in pull request #9647: document useFilterCNF query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9647: document useFilterCNF query context parameter
URL: https://github.com/apache/druid/pull/9647#discussion_r405999848
 
 

 ##########
 File path: docs/querying/query-context.md
 ##########
 @@ -45,6 +45,7 @@ The query context is used for various query configuration parameters. The follow
 |parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.html#broker) for more details.|
 |parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
+|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large amount of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill-effects.|
 
 Review comment:
   Hmm, I think how filters work with query processing and how mechanically filters are split into pre and post filters should be documented _somewhere_, but I don't think this setting is quite the correct avenue. Additionally, since which filters can and can't use bitmaps isn't exactly documented anywhere, I'm not sure how much the examples would help. 
   
   If we added this general description of query processing and how filters are involved, it could link to this setting, and also to documentation we would add for the `filterTuning` added in #8209, as ways the user can influence how filter processing behaves. Maybe `segments.md` would be an appropriate place, since it mentions bitmaps and their role in filtering, or `segment-optimization.md`, since it involves how to tune segment sizes? Or perhaps we need an `advanced-tuning.md` for this and other settings that users shouldn't really touch unless they are prepared to roll up their sleeves and experimentally verify them against their workload?
   
   Should this be part of this PR, or done as a follow-up? It sort of blows up the scope a bit of what I was looking to do as part of this PR, but it also seems useful so I'm fine either way.
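
As background for any such write-up, here is a minimal sketch of what converting a filter to CNF involves and why it can be expensive. It is a textbook distribution of OR over AND, not Druid's conversion code; the tuple encoding and the `to_cnf` and `wide_or` helpers are hypothetical, and NOT filters are left out for brevity.

```python
from itertools import product

# Filters are nested tuples: ("and", f1, f2, ...), ("or", f1, f2, ...),
# or a plain string naming a leaf filter.


def to_cnf(f):
    """Return the CNF of `f` as a list of clauses.

    Each clause is a frozenset of leaf names that are OR'ed together;
    the clauses in the list are AND'ed together.
    """
    if isinstance(f, str):
        return [frozenset([f])]
    op, *children = f
    parts = [to_cnf(c) for c in children]
    if op == "and":
        # AND of CNFs: concatenate the clause lists.
        return [clause for part in parts for clause in part]
    if op == "or":
        # OR of CNFs: distribute by picking one clause from every child
        # and merging the picks into a single OR clause.
        return [frozenset().union(*combo) for combo in product(*parts)]
    raise ValueError(f"unsupported operator: {op!r}")


def wide_or(n):
    """Build (a0 && b0) || (a1 && b1) || ... with n AND pairs."""
    return ("or", *[("and", f"a{i}", f"b{i}") for i in range(n)])


# (A && B) || C becomes (A || C) && (B || C): a top-level AND of two clauses.
small = to_cnf(("or", ("and", "A", "B"), "C"))
print(len(small))  # 2

# The clause count doubles with every extra AND pair under the OR.
print(len(to_cnf(wide_or(10))))  # 1024
```

The first case shows the potential upside: a top-level OR turns into a top-level AND whose clauses can each be considered separately. The second shows the downside: the converted filter can be far larger than the original, which is one reason to verify the setting experimentally.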
