Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/01/24 20:38:56 UTC

[GitHub] [druid] somu-imply opened a new pull request #12195: Moving in filter check to broker

somu-imply opened a new pull request #12195:
URL: https://github.com/apache/druid/pull/12195


   An existing issue arises when users execute a query with a large number of numeric IN filters. Internally, each value is converted into its own bound filter, which can stall Historicals. To preempt and catch this, we cap the number of numeric IN filters per query at 10,000. The limit can be configured by the user through the query context as `"maxNumericInFilters": value`.
   
   The current behavior is:
   ```
   SELECT *
   FROM wikipedia
   WHERE metroCode IN (609,517,817,101,102,103)
   ```
   This query runs because the default limit is 10,000.
   
   Now, if we change the query context to
   ```
   {
     "maxNumericInFilters": 4
   }
   ```
   the same query now fails as follows:
   ```
   SELECT *
   FROM wikipedia
   WHERE metroCode IN (609,517,817,101,102,103)
   
   Error: Unknown exception
   Filters for column [metroCode] exceeds configured filter limit of [4]. Cast [6] values to String
   org.apache.druid.sql.calcite.rel.CannotBuildQueryException
   ```
   If the query is rewritten with string values, it is planned using an InFilter rather than BoundFilters, no error is thrown, and the query succeeds. An example is
   ```
   SELECT *
   FROM wikipedia
   WHERE metroCode IN ('609','517','817','101','102','103')
   ```
   The value supplied through the query context cannot be set below 1; attempting to do so throws an error.
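   
   For reference, a minimal sketch of how the per-query override could be supplied through the SQL API (this assumes the standard `POST /druid/v2/sql` JSON payload shape; the limit of 100 is only an illustrative value):
   ```
   {
     "query": "SELECT * FROM wikipedia WHERE metroCode IN (609,517,817,101,102,103)",
     "context": {
       "maxNumericInFilters": 100
     }
   }
   ```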
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   




[GitHub] [druid] LakshSingla commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
LakshSingla commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791413890



##########
File path: docs/querying/query-context.md
##########
@@ -62,7 +62,7 @@ Unless otherwise noted, the following parameters apply to all query types.
 |secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
 |enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins, where left child is a simple scan with a filter,  by default, druid will run the scan as a query and the join the results to the right child on broker. Setting this flag to true overrides that behavior and druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join. since queries can internally translated into a join by the SQL planner.|
 |debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following addition logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
-
+|maxNumericInFilters|`10000`|If set to a value between 1 and 10,000, Druid will allow numeric values specified for IN part of the query not to exceed this user defined values and queries with more than 10,000 numeric IN filters will not be run by Druid unless they are cast to String.

Review comment:
       nit: The wording seems a bit off. It implies (to me) that when the value is set between 1 and 10,000, the behavior is as specified in the doc, but it doesn't mention what happens if the value is set above 10,000. Something like the below seems a bit clearer and more concise. Also, in general, I am wary of using the default value in the description of the parameter.
   ```
   |maxNumericInFilters|`10000`| The Maximum number of IN clauses for numeric values that are allowed in a query. To run queries with a greater number of IN clauses, consider casting it to string.
   ```
   Please correct me if I am understanding the flag incorrectly. Also, WDYT? 



[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r800291882



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +127,32 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+
+    // special corner case handling for numeric IN filters
+    // in case of query containing IN (v1, v2, v3,...) where Vi is numeric
+    // a BoundFilter is created internally for each of the values
+    // whereas when Vi s are String the Filters are converted as BoundFilter to SelectorFilter to InFilter
+    // which takes lesser processing for bitmaps
+    // So in a case where user executes a query with multiple numeric INs, flame graph shows BoundFilter.getBitmapResult
+    // and BoundFilter.match predicate eating up processing time which stalls a historical for a query with large number
+    // of numeric INs (> 10K). In such cases user should change the query to specify the IN clauses as String
+    // Instead of IN(v1,v2,v3) user should specify IN('v1','v2','v3')
+    if (numFilters != PlannerConfig.NUM_FILTER_NOT_USED) {
+      if (query.getFilter() instanceof OrDimFilter) {

Review comment:
       With ANDs, ORs, or ORs of ANDs, traversing the filter tree should not be expensive; it would be O(#filters). Even with filter counts going up to 10K or 100K (which are also rare), this can be done. In case of combinations such as 
    ```
    dim1=1 AND dim2=2 AND dim3=3 AND dim4 IN (1,2,3)
    ```
    with `maxNumericInFilter=4`, it would be difficult to point out which dimension is at fault. A generic error message could be thrown in that case, saying something like: the number of elements in the WHERE clause exceeds the system-specified limit of K.
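    
    For illustration, a minimal sketch of such an O(#filters) walk (a hypothetical helper, not this PR's actual code; it assumes Druid's `DimFilter`, `AndDimFilter`, `OrDimFilter`, and `BoundDimFilter` classes and their `getFields()` accessors):
    ```
    import java.util.List;
    import org.apache.druid.query.filter.AndDimFilter;
    import org.apache.druid.query.filter.BoundDimFilter;
    import org.apache.druid.query.filter.DimFilter;
    import org.apache.druid.query.filter.OrDimFilter;

    public class BoundFilterCounter
    {
      // Counts BoundDimFilter leaves reachable through AND/OR nodes.
      // Other filter types (NOT, selector, IN, etc.) are ignored in this sketch.
      public static int countBoundFilters(DimFilter filter)
      {
        if (filter == null) {
          return 0;
        }
        if (filter instanceof BoundDimFilter) {
          return 1;
        }
        final List<DimFilter> children;
        if (filter instanceof AndDimFilter) {
          children = ((AndDimFilter) filter).getFields();
        } else if (filter instanceof OrDimFilter) {
          children = ((OrDimFilter) filter).getFields();
        } else {
          return 0;
        }
        int count = 0;
        for (DimFilter child : children) {
          count += countBoundFilters(child);
        }
        return count;
      }
    }
    ```
    A generic limit check could then compare this count against `maxNumericInFilters` without needing to know at which AND/OR level the numeric values appear, which also sidesteps naming a single offending dimension in the error message.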






[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r800290582



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerConfig.java
##########
@@ -190,6 +210,46 @@ public PlannerConfig withOverrides(final Map<String, Object> context)
     return newConfig;
   }
 
+  private int validateMaxNumericInFilters(int queryContextMaxNumericInFilters, int systemConfigMaxNumericInFilters)
+  {
+    // if maxNumericInFIlters through context == 0 catch exception
+    // else if query context exceeds system set value throw error
+    if (queryContextMaxNumericInFilters == 0) {
+      throw new UOE("[%s] must be greater than 0", CTX_MAX_NUMERIC_IN_FILTERS);
+    } else if (queryContextMaxNumericInFilters > systemConfigMaxNumericInFilters
+               && systemConfigMaxNumericInFilters != NUM_FILTER_NOT_USED) {
+      throw new UOE(
+          "Expected parameter[%s] cannot exceed system set value of [%d]",
+          CTX_MAX_NUMERIC_IN_FILTERS,
+          systemConfigMaxNumericInFilters
+      );
+    }
+    // if system set value is not present, thereby inferring default of -1
+    if (systemConfigMaxNumericInFilters == NUM_FILTER_NOT_USED) {
+      return systemConfigMaxNumericInFilters;

Review comment:
       If the system value is set to -1 (the default, i.e. the feature is disabled), we do not consider the value entered through the query context, since the feature is disabled by default. Hence we return `systemConfigMaxNumericInFilters`.






[GitHub] [druid] jihoonson commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791179602



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerConfig.java
##########
@@ -190,6 +201,24 @@ public PlannerConfig withOverrides(final Map<String, Object> context)
     return newConfig;
   }
 
+  private static int getContextInt(
+      final Map<String, Object> context,
+      final String parameter,
+      final int defaultValue
+  )
+  {
+    final Object value = context.get(parameter);
+    if (value == null) {
+      return defaultValue;
+    } else if (value instanceof String) {
+      return Integer.parseInt((String) value);
+    } else if (value instanceof Integer) {
+      return (Integer) value;
+    } else {
+      throw new IAE("Expected parameter[%s] to be integer", parameter);
+    }

Review comment:
       nit: you can use `Numbers.parseInt()` instead.
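    
    For illustration, the simplified helper that the nit points at might look something like this (a sketch only; it assumes `org.apache.druid.java.util.common.Numbers.parseInt(Object)` accepts `String` and `Number` inputs and throws for anything else, which should be verified against the actual utility):
    ```
    import java.util.Map;
    import org.apache.druid.java.util.common.Numbers;

    // Replaces the instanceof chain above; Numbers.parseInt is assumed to parse
    // Strings and narrow Numbers, throwing for unsupported types.
    private static int getContextInt(
        final Map<String, Object> context,
        final String parameter,
        final int defaultValue
    )
    {
      final Object value = context.get(parameter);
      return value == null ? defaultValue : Numbers.parseInt(value);
    }
    ```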

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters

Review comment:
       Can you add some comments on why this edge case is worth an extra check here and why the scope of the check is specialized to a particular filter shape?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");

Review comment:
       @somu-imply `CannotBuildQueryException` doesn't seem right, because it's a configuration error, not a query planning error.
   
   > Is it helpful to add a hint using `PlannerContext#setPlanningError` here? I don't see any usages of that in the `NativeQueryMaker` so I might be wrong.
   
   ```
     /**
      * Sets the planning error in the context that will be shown to the user if the SQL query cannot be translated
      * to a native query. This error is often a hint and thus should be phrased as such. Also, the final plan can
      * be very different from SQL that user has written. So again, the error should be phrased to indicate this gap
      * clearly.
      */
     public void setPlanningError(String formatText, Object... arguments)
   ```
   
    @LakshSingla Here is the javadoc of `setPlanningError`. This method can be used when an exception thrown during query planning would help users fix their queries themselves and avoid the planning error. Since `queryMaker` is called after query planning is done, it seems fine not to use it here.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters
+    if (query.getFilter() instanceof OrDimFilter) {
+      OrDimFilter orDimFilter = (OrDimFilter) query.getFilter();
+      if (orDimFilter.getFields().size() > numFilters) {
+        String dimension = ((BoundDimFilter) (orDimFilter.getFields().get(0))).getDimension();
+        throw new CannotBuildQueryException(StringUtils.format(
+            "Filters for column [%s] exceeds configured filter limit of [%s]. Cast [%s] values to String",

Review comment:
       This error message seems unintuitive, as it suggests casting integers to strings. It could be useful to add more details on why users should consider the casting.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6903,27 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception

Review comment:
       Yes, we should add a test that covers this case. @somu-imply can you please add one?






[GitHub] [druid] LakshSingla commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
LakshSingla commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791173507



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {

Review comment:
       Can this validation be moved somehow outside the `NativeQueryMaker`, since it is not specific to runQuery but to when the `PlannerConfig` gets generated? There are two cases to it: 
   1. When overriding incorrectly in the specific query (i.e. the `PlannerConfig#withOverrides` method)
   2. When passing an incorrect value as `druid.sql.planner.maxNumericInFilter` -> not sure how to achieve this, since I don't think we can validate a @JsonProperty.
   Let me know wdyt about moving this check elsewhere? 

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6903,27 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception

Review comment:
       W.r.t. tests, the following case can also be helpful: when the number of IN filters in the query is greater than MAX_NUMERIC_IN_FILTERS.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");

Review comment:
       Is it helpful to add a hint using `PlannerContext#setPlanningError` here? I don't see any usages of that in the `NativeQueryMaker` so I might be wrong. 

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");

Review comment:
       Thanks. Yeah, I realized as well that setting the error afterward would be in vain since it doesn't get surfaced as a hint to the user. 

##########
File path: docs/configuration/index.md
##########
@@ -1846,6 +1846,7 @@ The Druid SQL server is configured through the following properties on the Broke
 |`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000|
 |`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false|
 |`druid.sql.planner.useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite. It can be overridden per query with `useNativeQueryExplain` context key.|false|
+|`druid.sql.planner.maxNumericInFilters`|If set to a value between 1 and 10,000, Druid will allow numeric values specified for IN part of the query not to exceed this user defined values and queries with more than 10,000 numeric IN filters will not be run by Druid unless they are cast to String.|`10000`|

Review comment:
       It would be good if we specify that it can be overridden per query in the context using the `maxNumericInFilters` key. 
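    
    For example, the interaction could be documented along these lines (the value 100 is only illustrative):
    ```
    # Broker runtime.properties: cluster-wide ceiling for numeric IN filters
    druid.sql.planner.maxNumericInFilters=100
    ```
    with a note that individual queries may lower, but not exceed, this value via the `maxNumericInFilters` query context key.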

##########
File path: docs/querying/query-context.md
##########
@@ -62,7 +62,7 @@ Unless otherwise noted, the following parameters apply to all query types.
 |secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
 |enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins, where left child is a simple scan with a filter,  by default, druid will run the scan as a query and the join the results to the right child on broker. Setting this flag to true overrides that behavior and druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join. since queries can internally translated into a join by the SQL planner.|
 |debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following addition logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
-
+|maxNumericInFilters|`10000`|If set to a value between 1 and 10,000, Druid will allow numeric values specified for IN part of the query not to exceed this user defined values and queries with more than 10,000 numeric IN filters will not be run by Druid unless they are cast to String.

Review comment:
       nit: The wording seems a bit off. It implies (to me) that when the value is set between 1 and 10,000, the behavior is as specified in the doc, but it doesn't mention what happens if the value is set above 10,000. Something like the below seems a bit clearer and more concise. Also, in general, I am wary of using the default value in the description of the doc.
   ```
   |maxNumericInFilters|`10000`| The Maximum number of IN clauses for numeric values that are allowed in a query. To run queries with a greater number of IN clauses, consider casting it to string.
   ```
   Please correct me if I am understanding the flag incorrectly. Also, WDYT? 

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java
##########
@@ -300,7 +300,7 @@ public AuthenticationResult createEscalatedAuthenticationResult()
           new TimestampSpec(TIMESTAMP_COLUMN, "iso", null),
           new DimensionsSpec(
               ImmutableList.<DimensionSchema>builder()
-                  .addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3", "dim4", "dim5")))
+                  .addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3", "dim4", "dim5", "dim6")))

Review comment:
       Please confirm that other tests are not failing after changing the row shape. 



[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r800292742



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6904,67 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("[maxNumericInFilters] must be greater than 0");
+
+    testQuery(
+        PLANNER_CONFIG_DEFAULT,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 0),
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE dim6 IN (\n"
+        + "1,2,3\n"
+        + ")\n",
+        CalciteTests.REGULAR_USER_AUTH_RESULT,
+        ImmutableList.of(),
+        ImmutableList.of()
+    );
+  }
+
+  @Test
+  public void testHighestMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("Expected parameter[maxNumericInFilters] cannot exceed system set value of [100]");
+
+    testQuery(
+        PLANNER_CONFIG_MAX_NUMERIC_IN_FILTER,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 20000),
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE dim6 IN (\n"
+        + "1,2,3\n"
+        + ")\n",
+        CalciteTests.REGULAR_USER_AUTH_RESULT,
+        ImmutableList.of(),
+        ImmutableList.of()
+    );
+  }
+
+  @Test
+  public void testQueryWithMoreThanMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("The number of values in the IN clause for [dim6] in query exceeds configured maxNumericFilter limit of [2] for INs. Cast [3] values of IN clause to String");
+
+    testQuery(
+        PLANNER_CONFIG_MAX_NUMERIC_IN_FILTER,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 2),
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE dim6 IN (\n"

Review comment:
       The NOT case does not go through a BoundFilter; it is translated through NotDimFilter -> SelectorDimFilter, and the reported issue that was causing queries to stall was the use of BoundFilters. I'll add the additional tests.






[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r804926704



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6904,67 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("[maxNumericInFilters] must be greater than 0");
+
+    testQuery(
+        PLANNER_CONFIG_DEFAULT,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 0),

Review comment:
       For the case of a non-zero query context with the default config, the feature is turned off by default. At the moment we are going with this special case of ORs. We'll revisit the bitmap iterator approach and will then revisit the test cases for it.








[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791263729



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6903,27 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception

Review comment:
       Added

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters
+    if (query.getFilter() instanceof OrDimFilter) {
+      OrDimFilter orDimFilter = (OrDimFilter) query.getFilter();
+      if (orDimFilter.getFields().size() > numFilters) {
+        String dimension = ((BoundDimFilter) (orDimFilter.getFields().get(0))).getDimension();
+        throw new CannotBuildQueryException(StringUtils.format(
+            "Filters for column [%s] exceeds configured filter limit of [%s]. Cast [%s] values to String",

Review comment:
       I have updated the messages to make them more meaningful

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters

Review comment:
       Added comments










[GitHub] [druid] suneet-s commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r800227896



##########
File path: docs/querying/query-context.md
##########
@@ -62,7 +62,7 @@ Unless otherwise noted, the following parameters apply to all query types.
 |secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
 |enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins, where left child is a simple scan with a filter,  by default, druid will run the scan as a query and the join the results to the right child on broker. Setting this flag to true overrides that behavior and druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join. since queries can internally translated into a join by the SQL planner.|
 |debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following addition logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
-
+|maxNumericInFilters|`-1`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates to an [IN filter](../querying/filters.md#in-filter). By default, Druid does not restrict the amount of numbers in an IN filter, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100.  Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the IN filter instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`. This value cannot exceed the set system configuration `druid.sql.planner.maxNumericInFilters`. This value is ignored if `druid.sql.planner.maxNumericInFilters` is not set explicitly.|

Review comment:
       Docs should include a note on how this field interacts with the system property `maxNumericInFilters`

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerConfig.java
##########
@@ -190,6 +210,46 @@ public PlannerConfig withOverrides(final Map<String, Object> context)
     return newConfig;
   }
 
+  private int validateMaxNumericInFilters(int queryContextMaxNumericInFilters, int systemConfigMaxNumericInFilters)
+  {
+    // if maxNumericInFIlters through context == 0 catch exception
+    // else if query context exceeds system set value throw error
+    if (queryContextMaxNumericInFilters == 0) {
+      throw new UOE("[%s] must be greater than 0", CTX_MAX_NUMERIC_IN_FILTERS);
+    } else if (queryContextMaxNumericInFilters > systemConfigMaxNumericInFilters
+               && systemConfigMaxNumericInFilters != NUM_FILTER_NOT_USED) {
+      throw new UOE(
+          "Expected parameter[%s] cannot exceed system set value of [%d]",
+          CTX_MAX_NUMERIC_IN_FILTERS,
+          systemConfigMaxNumericInFilters
+      );
+    }
+    // if system set value is not present, thereby inferring default of -1
+    if (systemConfigMaxNumericInFilters == NUM_FILTER_NOT_USED) {
+      return systemConfigMaxNumericInFilters;

Review comment:
       I think you want this to return the queryContextMaxNumericInFilters in this case. The reason I think this is the desired behavior is that it allows a sys admin to test what a reasonable default for the cluster would be by using the query context. Once the sys admin sets maxNumericInFilters to some value, having the system config act as an upper limit makes sense. This should be documented in `configuration/index.md` and `querying/query-context.md`
   ```suggestion
         return queryContextMaxNumericInFilters;
   ```

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6904,67 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("[maxNumericInFilters] must be greater than 0");
+
+    testQuery(
+        PLANNER_CONFIG_DEFAULT,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 0),

Review comment:
       Can we add a test for a non-zero query context with the default planner config, to address my comment in PlannerConfig?

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6904,67 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("[maxNumericInFilters] must be greater than 0");
+
+    testQuery(
+        PLANNER_CONFIG_DEFAULT,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 0),
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE dim6 IN (\n"
+        + "1,2,3\n"
+        + ")\n",
+        CalciteTests.REGULAR_USER_AUTH_RESULT,
+        ImmutableList.of(),
+        ImmutableList.of()
+    );
+  }
+
+  @Test
+  public void testHighestMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("Expected parameter[maxNumericInFilters] cannot exceed system set value of [100]");
+
+    testQuery(
+        PLANNER_CONFIG_MAX_NUMERIC_IN_FILTER,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 20000),
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE dim6 IN (\n"
+        + "1,2,3\n"
+        + ")\n",
+        CalciteTests.REGULAR_USER_AUTH_RESULT,
+        ImmutableList.of(),
+        ImmutableList.of()
+    );
+  }
+
+  @Test
+  public void testQueryWithMoreThanMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("The number of values in the IN clause for [dim6] in query exceeds configured maxNumericFilter limit of [2] for INs. Cast [3] values of IN clause to String");
+
+    testQuery(
+        PLANNER_CONFIG_MAX_NUMERIC_IN_FILTER,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 2),
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE dim6 IN (\n"

Review comment:
       I think we'll want some additional tests here for more complex filters. Some examples:
   * ORs across multiple dimensions, e.g. dim1 = 1 OR dim2 = 2 OR dim3 = 3
   * an OR filter several levels deep, e.g. dim1='one' AND dim2 IN (1,2,3,4,5)
   * a NOT IN filter, e.g. `dim6 NOT IN (...)`

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +127,32 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+
+    // special corner case handling for numeric IN filters
+    // in case of query containing IN (v1, v2, v3,...) where Vi is numeric
+    // a BoundFilter is created internally for each of the values
+    // whereas when Vi s are String the Filters are converted as BoundFilter to SelectorFilter to InFilter
+    // which takes lesser processing for bitmaps
+    // So in a case where user executes a query with multiple numeric INs, flame graph shows BoundFilter.getBitmapResult
+    // and BoundFilter.match predicate eating up processing time which stalls a historical for a query with large number
+    // of numeric INs (> 10K). In such cases user should change the query to specify the IN clauses as String
+    // Instead of IN(v1,v2,v3) user should specify IN('v1','v2','v3')
+    if (numFilters != PlannerConfig.NUM_FILTER_NOT_USED) {
+      if (query.getFilter() instanceof OrDimFilter) {

Review comment:
       This looks like it would only support filtering on a top-level OR filter. However, if a filter on a query is nested, then this guardrail would not be effective. For example, what if the filter was 
   
   ```
   dim1='one' AND dim2 IN (1,2,3,4,5)
   ```
   
   In this case, it looks like the query would still run.
   
   Do you think walking the entire filter tree would be expensive at this point in the query plan? The other thing I'd like to know more about is how `DimFilter#optimize` plays into this guardrail - is the query filter that is being used here before or after the optimize function is called? 






[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r800291993



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6904,67 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception
+  {
+    expectedException.expect(UOE.class);
+    expectedException.expectMessage("[maxNumericInFilters] must be greater than 0");
+
+    testQuery(
+        PLANNER_CONFIG_DEFAULT,
+        ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 0),

Review comment:
       Sure, I'll add combinations of ANDs and ORs




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791179602



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerConfig.java
##########
@@ -190,6 +201,24 @@ public PlannerConfig withOverrides(final Map<String, Object> context)
     return newConfig;
   }
 
+  private static int getContextInt(
+      final Map<String, Object> context,
+      final String parameter,
+      final int defaultValue
+  )
+  {
+    final Object value = context.get(parameter);
+    if (value == null) {
+      return defaultValue;
+    } else if (value instanceof String) {
+      return Integer.parseInt((String) value);
+    } else if (value instanceof Integer) {
+      return (Integer) value;
+    } else {
+      throw new IAE("Expected parameter[%s] to be integer", parameter);
+    }

Review comment:
       nit: you can use `Numbers.parseInt()` instead.
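   A minimal sketch of how the helper might collapse with that suggestion, assuming `Numbers.parseInt(Object)` accepts both String and Integer inputs (as the nit implies):
   ```
   private static int getContextInt(
       final Map<String, Object> context,
       final String parameter,
       final int defaultValue
   )
   {
     final Object value = context.get(parameter);
     // An absent parameter falls back to the default; otherwise Numbers.parseInt
     // handles the String/Integer cases and the error for anything else.
     return value == null ? defaultValue : Numbers.parseInt(value);
   }
   ```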

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters

Review comment:
       Can you add some comments on why this edge case is worth an extra check here and why the scope of the check is specialized to a particular filter shape?

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");

Review comment:
       @somu-imply `CannotBuildQueryException` doesn't seem right because it's a configuration error, not a query planning error.
   
   > Is it helpful to add a hint using `PlannerContext#setPlanningError` here? I don't see any usages of that in the `NativeQueryMaker` so I might be wrong.
   
   ```
     /**
      * Sets the planning error in the context that will be shown to the user if the SQL query cannot be translated
      * to a native query. This error is often a hint and thus should be phrased as such. Also, the final plan can
      * be very different from SQL that user has written. So again, the error should be phrased to indicate this gap
      * clearly.
      */
     public void setPlanningError(String formatText, Object... arguments)
   ```
   
   @LakshSingla Here is the javadoc of `setPlanningError`. This method can be used when an exception thrown during query planning carries information that helps users fix their queries themselves and avoid the planning error. Since `queryMaker` is called after query planning is done, it seems fine to not use it.
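   For reference, a hypothetical call site during planning (not in `NativeQueryMaker`) might look like the following; the message text and the `filterShapeDescription` variable are made up for illustration:
   ```
   // Hypothetical: inside a conversion rule, before the plan is finalized.
   plannerContext.setPlanningError(
       "SQL requires a filter shape [%s] that cannot be translated to a native query",
       filterShapeDescription
   );
   ```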

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters
+    if (query.getFilter() instanceof OrDimFilter) {
+      OrDimFilter orDimFilter = (OrDimFilter) query.getFilter();
+      if (orDimFilter.getFields().size() > numFilters) {
+        String dimension = ((BoundDimFilter) (orDimFilter.getFields().get(0))).getDimension();
+        throw new CannotBuildQueryException(StringUtils.format(
+            "Filters for column [%s] exceeds configured filter limit of [%s]. Cast [%s] values to String",

Review comment:
       This error message does not seem intuitive, as it suggests casting integers to strings. It would be useful to add more detail on why users should consider the cast.

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6903,27 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception

Review comment:
       Yes, we should add a test that covers this case. @somu-imply can you please add one?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r799032168



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {

Review comment:
       @LakshSingla the check has been moved to `PlannerConfig`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on pull request #12195:
URL: https://github.com/apache/druid/pull/12195#issuecomment-1020690373


   @jihoonson I have kept it at 10K, as 10K values in an IN clause against the wikipedia dataset finish in about 15 sec on my system (https://implydata.atlassian.net/wiki/spaces/~976946634/pages/2395111503/Query+Stability). I have added the test cases and made the error message more meaningful. @LakshSingla and @jihoonson thanks for your reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson merged pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #12195:
URL: https://github.com/apache/druid/pull/12195


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791954139



##########
File path: docs/querying/query-context.md
##########
@@ -62,7 +62,7 @@ Unless otherwise noted, the following parameters apply to all query types.
 |secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
 |enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins, where left child is a simple scan with a filter,  by default, druid will run the scan as a query and the join the results to the right child on broker. Setting this flag to true overrides that behavior and druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join. since queries can internally translated into a join by the SQL planner.|
 |debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following addition logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
-
+|maxNumericInFilters|`10000`|If set to a value between 1 and 10,000, Druid will allow numeric values specified for IN part of the query not to exceed this user defined values and queries with more than 10,000 numeric IN filters will not be run by Druid unless they are cast to String.

Review comment:
       If the value is set above 10K, Druid will throw an error saying that you cannot go over 10K numeric IN filters.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply edited a comment on pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply edited a comment on pull request #12195:
URL: https://github.com/apache/druid/pull/12195#issuecomment-1020690373


   @jihoonson I have kept it at 10K, as 10K values in an IN clause against the wikipedia dataset finish in about 15 sec on my system. I have added the test cases and made the error message more meaningful. @LakshSingla and @jihoonson thanks for your reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r798973399



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerConfig.java
##########
@@ -190,6 +210,47 @@ public PlannerConfig withOverrides(final Map<String, Object> context)
     return newConfig;
   }
 
+  private int validateMaxNumericInFilters(int queryContextMaxNumericInFilters, int systemConfigMaxNumericInFilters)
+  {
+    // if maxNumericInFIlters through context == 0 catch exception
+    if (queryContextMaxNumericInFilters == 0) {
+      throw new UOE("[%s] must be greater than 0", CTX_MAX_NUMERIC_IN_FILTERS);
+    }
+    // if query context exceeds system set value throw error
+    else if (queryContextMaxNumericInFilters > systemConfigMaxNumericInFilters
+             && systemConfigMaxNumericInFilters != NUM_FILTER_NOT_USED) {
+      throw new UOE(String.format(
+          "Expected parameter[%s] cannot exceed system set value of [%d]",
+          CTX_MAX_NUMERIC_IN_FILTERS,
+          systemConfigMaxNumericInFilters
+      ));
+    }
+    // if system set value is not present, thereby inferring default of -1
+    if (systemConfigMaxNumericInFilters == NUM_FILTER_NOT_USED) {
+      return systemConfigMaxNumericInFilters;
+    }

Review comment:
       Can you please add this behavior in the doc too? The doc should explain that the query context is ignored when the system property is disabled.

##########
File path: docs/configuration/index.md
##########
@@ -1846,6 +1846,7 @@ The Druid SQL server is configured through the following properties on the Broke
 |`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000|
 |`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false|
 |`druid.sql.planner.useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite. It can be overridden per query with `useNativeQueryExplain` context key.|false|
+|`druid.sql.planner.maxNumericInFilters`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates to an [IN filter](../querying/filters.md#in-filter). By default, Druid does not restrict the amount of numbers in an IN filter, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100.  Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the IN filter instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`.|`-1`|

Review comment:
       ```suggestion
   |`druid.sql.planner.maxNumericInFilters`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates to an [IN filter](../querying/filters.md#in-filter). By default, Druid does not restrict the amount of numbers in an IN filter, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100.  Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the IN filter instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`.|`-1` (disabled)|
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r799032516



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerConfig.java
##########
@@ -190,6 +210,47 @@ public PlannerConfig withOverrides(final Map<String, Object> context)
     return newConfig;
   }
 
+  private int validateMaxNumericInFilters(int queryContextMaxNumericInFilters, int systemConfigMaxNumericInFilters)
+  {
+    // if maxNumericInFIlters through context == 0 catch exception
+    if (queryContextMaxNumericInFilters == 0) {
+      throw new UOE("[%s] must be greater than 0", CTX_MAX_NUMERIC_IN_FILTERS);
+    }
+    // if query context exceeds system set value throw error
+    else if (queryContextMaxNumericInFilters > systemConfigMaxNumericInFilters
+             && systemConfigMaxNumericInFilters != NUM_FILTER_NOT_USED) {
+      throw new UOE(String.format(
+          "Expected parameter[%s] cannot exceed system set value of [%d]",
+          CTX_MAX_NUMERIC_IN_FILTERS,
+          systemConfigMaxNumericInFilters
+      ));
+    }
+    // if system set value is not present, thereby inferring default of -1
+    if (systemConfigMaxNumericInFilters == NUM_FILTER_NOT_USED) {
+      return systemConfigMaxNumericInFilters;
+    }

Review comment:
       Added




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on pull request #12195:
URL: https://github.com/apache/druid/pull/12195#issuecomment-1035581987


   As discussed, the band-aid only addresses a specific use case: a query that compares many numeric values against string columns with OR conditions, which blows up the use of BoundFilters. We will look into the generic case of bitmap iterator usage in a separate PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791264927



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters
+    if (query.getFilter() instanceof OrDimFilter) {
+      OrDimFilter orDimFilter = (OrDimFilter) query.getFilter();
+      if (orDimFilter.getFields().size() > numFilters) {
+        String dimension = ((BoundDimFilter) (orDimFilter.getFields().get(0))).getDimension();
+        throw new CannotBuildQueryException(StringUtils.format(
+            "Filters for column [%s] exceeds configured filter limit of [%s]. Cast [%s] values to String",

Review comment:
      I have updated the error messages to make them more meaningful.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r792053035



##########
File path: docs/configuration/index.md
##########
@@ -1846,6 +1846,7 @@ The Druid SQL server is configured through the following properties on the Broke
 |`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000|
 |`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false|
 |`druid.sql.planner.useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite. It can be overridden per query with `useNativeQueryExplain` context key.|false|
+|`druid.sql.planner.maxNumericInFilters`|If set to a value between 1 and 10,000, Druid will allow numeric values specified for IN part of the query not to exceed this user defined values and queries with more than 10,000 numeric IN filters will not be run by Druid unless they are cast to String.|`10000`|

Review comment:
       Done
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r804925212



##########
File path: docs/querying/query-context.md
##########
@@ -62,7 +62,7 @@ Unless otherwise noted, the following parameters apply to all query types.
 |secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
 |enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins, where left child is a simple scan with a filter,  by default, druid will run the scan as a query and the join the results to the right child on broker. Setting this flag to true overrides that behavior and druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join. since queries can internally translated into a join by the SQL planner.|
 |debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following addition logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
-
+|maxNumericInFilters|`-1`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates to an [IN filter](../querying/filters.md#in-filter). By default, Druid does not restrict the amount of numbers in an IN filter, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100.  Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the IN filter instead of numbers. For example, `WHERE someString IN (‘123’, ‘456’)`. This value cannot exceed the set system configuration `druid.sql.planner.maxNumericInFilters`. This value is ignored if `druid.sql.planner.maxNumericInFilters` is not set explicitly.|

Review comment:
       As discussed, the docs have been updated to indicate that this supports only ORs of IN filters.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791264978



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +125,24 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+    if (numFilters < 1) {
+      throw new CannotBuildQueryException("maxNumericInFilters must be greater than 0");
+    }
+    //special corner case handling for numeric IN filters

Review comment:
       Added comments




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791263729



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -6903,6 +6903,27 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testZeroMaxNumericInFilter() throws Exception

Review comment:
       Added




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r795382674



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java
##########
@@ -300,7 +300,7 @@ public AuthenticationResult createEscalatedAuthenticationResult()
           new TimestampSpec(TIMESTAMP_COLUMN, "iso", null),
           new DimensionsSpec(
               ImmutableList.<DimensionSchema>builder()
-                  .addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3", "dim4", "dim5")))
+                  .addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3", "dim4", "dim5", "dim6")))

Review comment:
       Confirmed. No other tests are failing. Travis passes too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r791954642



##########
File path: docs/configuration/index.md
##########
@@ -1846,6 +1846,7 @@ The Druid SQL server is configured through the following properties on the Broke
 |`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000|
 |`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false|
 |`druid.sql.planner.useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite. It can be overridden per query with `useNativeQueryExplain` context key.|false|
+|`druid.sql.planner.maxNumericInFilters`|If set to a value between 1 and 10,000, Druid will allow numeric values specified for IN part of the query not to exceed this user defined values and queries with more than 10,000 numeric IN filters will not be run by Druid unless they are cast to String.|`10000`|

Review comment:
       Good point. I'll add that it can be overridden through the query context




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r800291882



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +127,32 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+
+    // special corner case handling for numeric IN filters
+    // in case of query containing IN (v1, v2, v3,...) where Vi is numeric
+    // a BoundFilter is created internally for each of the values
+    // whereas when Vi s are String the Filters are converted as BoundFilter to SelectorFilter to InFilter
+    // which takes lesser processing for bitmaps
+    // So in a case where user executes a query with multiple numeric INs, flame graph shows BoundFilter.getBitmapResult
+    // and BoundFilter.match predicate eating up processing time which stalls a historical for a query with large number
+    // of numeric INs (> 10K). In such cases user should change the query to specify the IN clauses as String
+    // Instead of IN(v1,v2,v3) user should specify IN('v1','v2','v3')
+    if (numFilters != PlannerConfig.NUM_FILTER_NOT_USED) {
+      if (query.getFilter() instanceof OrDimFilter) {

Review comment:
      With ANDs, ORs, or ORs of ANDs, traversing the filter tree should not be expensive; it would be O(#filters), which is fine even when the number of filters goes up to 10K or 100K. However, with combinations such as
   ```
   dim1=1 AND dim2=2 AND dim3=3 AND dim4 IN (1,2,3)
   ```
   and `maxNumericInFilter=4`, it would be difficult to point out which dimension is at fault. A generic error message can be thrown in that case, saying something like: the number of elements in the WHERE clause exceeds the system-specified limit of K.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r800292972



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +127,32 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+
+    // special corner case handling for numeric IN filters
+    // in case of query containing IN (v1, v2, v3,...) where Vi is numeric
+    // a BoundFilter is created internally for each of the values
+    // whereas when Vi s are String the Filters are converted as BoundFilter to SelectorFilter to InFilter
+    // which takes lesser processing for bitmaps
+    // So in a case where user executes a query with multiple numeric INs, flame graph shows BoundFilter.getBitmapResult
+    // and BoundFilter.match predicate eating up processing time which stalls a historical for a query with large number
+    // of numeric INs (> 10K). In such cases user should change the query to specify the IN clauses as String
+    // Instead of IN(v1,v2,v3) user should specify IN('v1','v2','v3')
+    if (numFilters != PlannerConfig.NUM_FILTER_NOT_USED) {
+      if (query.getFilter() instanceof OrDimFilter) {

Review comment:
      I am thinking along the lines of a very simple function that traverses the entire filter tree; it would look something like
   ```
     // Counts the leaf BoundDimFilters under any mix of ANDs and ORs,
     // i.e. O(#filters) over the tree. NOT filters are not handled here.
     private int countFilters(DimFilter filter)
     {
       int count = 0;
       if (filter instanceof BoundDimFilter) {
         count += 1;
       } else if (filter instanceof OrDimFilter) {
         for (DimFilter e : ((OrDimFilter) filter).getFields()) {
           count += countFilters(e);
         }
       } else if (filter instanceof AndDimFilter) {
         for (DimFilter e : ((AndDimFilter) filter).getFields()) {
           count += countFilters(e);
         }
       }
       return count;
     }
   ```
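   If adopted, the existing guard could then compare the count over the whole tree rather than only a top-level OR. A hedged sketch of how it might plug in, reusing the names visible in the diff above (`numFilters`, `PlannerConfig.NUM_FILTER_NOT_USED`, `UOE`); the merged code may differ:
   ```
   if (numFilters != PlannerConfig.NUM_FILTER_NOT_USED
       && query.getFilter() != null
       && countFilters(query.getFilter()) > numFilters) {
     // Generic message, since the offending dimension may not be identifiable
     // once several INs are combined with ANDs/ORs.
     throw new UOE(
         "The number of values in the WHERE clause exceeds the configured maxNumericInFilters limit of [%d]",
         numFilters
     );
   }
   ```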




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a change in pull request #12195: Moving in filter check to broker

Posted by GitBox <gi...@apache.org>.
somu-imply commented on a change in pull request #12195:
URL: https://github.com/apache/druid/pull/12195#discussion_r804925453



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/run/NativeQueryMaker.java
##########
@@ -122,6 +127,32 @@ public boolean feature(QueryFeature feature)
         );
       }
     }
+    int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
+
+    // special corner case handling for numeric IN filters
+    // in case of query containing IN (v1, v2, v3,...) where Vi is numeric
+    // a BoundFilter is created internally for each of the values
+    // whereas when Vi s are String the Filters are converted as BoundFilter to SelectorFilter to InFilter
+    // which takes lesser processing for bitmaps
+    // So in a case where user executes a query with multiple numeric INs, flame graph shows BoundFilter.getBitmapResult
+    // and BoundFilter.match predicate eating up processing time which stalls a historical for a query with large number
+    // of numeric INs (> 10K). In such cases user should change the query to specify the IN clauses as String
+    // Instead of IN(v1,v2,v3) user should specify IN('v1','v2','v3')
+    if (numFilters != PlannerConfig.NUM_FILTER_NOT_USED) {
+      if (query.getFilter() instanceof OrDimFilter) {

Review comment:
       Docs updated to make the behavior clear




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org