You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/24 08:38:53 UTC

[GitHub] [druid] clintropolis opened a new pull request #10432: add vectorizeVirtualColumns query context parameter

clintropolis opened a new pull request #10432:
URL: https://github.com/apache/druid/pull/10432


   ### Description
   Follow-up to #10401, this PR adds an escape hatch in case anything goes wrong in the form of a new query context parameter, `vectorizeVirtualColumns`, which follows the same pattern as `vectorize` with options `false`, `true`, `force`. 
   
   This context parameter is layered on top of `vectorize`, which also must be set to `true` for a query to use vectorized engines, but _can_ be set to `false` to disable vectorization selectively for only queries with virtual columns by default with
   
   ```
   druid.query.default.context.vectorizeVirtualColumns=false
   ```
   
   in `runtime.properties` in the event of any bugs resulting from the vectorized expression virtual columns added in #10401.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r494536534



##########
File path: docs/querying/query-context.md
##########
@@ -92,7 +92,7 @@ include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "
 - All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
  "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
  "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input). 
-- No virtual columns.
+- All virtual columns must offer vectorized implementations.

Review comment:
       There's only one built-in virtual column: "expression". So, we should mention when that specific type of virtual column can be vectorized. (Does it depend on the type of expression? Types of input? Functions used? etc)

##########
File path: docs/querying/query-context.md
##########
@@ -106,3 +106,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
 |--------|-------|------------|
 |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.|
 |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.|

Review comment:
       IMO, it'd be best to set it to `false` now, then change it to `true` for the next release. It'll lower the risk around upgrading, and we want upgrades to be really low risk so people feel comfortable just dropping in the newest version.
   
   The specific risks here are:
   
   1) Bugs in vectorized expressions.
   2) Situations where vectorized expressions perform worse than non-vectorized ones (perhaps some optimizations are implemented for the non-vectorized case, but not the vectorized case).
   3) Situations where there is a bug in the vectorized query engines _in general_, but a specific user isn't hitting it, because they're using expressions and so the query wouldn't vectorize anyway.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java
##########
@@ -76,10 +76,18 @@ public static boolean canVectorize(
     Function<String, ColumnCapabilities> capabilitiesFunction = name ->
         query.getVirtualColumns().getColumnCapabilitiesWithFallback(adapter, name);
 
+    final boolean adapterCanVectorize = adapter.canVectorize(filter, query.getVirtualColumns(), false);
+    final boolean virtualColumnsCanVectorize;
+    if (query.getVirtualColumns().getVirtualColumns().length > 0) {

Review comment:
       There's some duplication here between the query engines — how about creating a helper that takes the query and the virtual columns and creates a `Vectorize` instance that synthesizes both?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r494752606



##########
File path: docs/querying/query-context.md
##########
@@ -92,7 +92,7 @@ include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "
 - All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
  "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
  "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input). 
-- No virtual columns.
+- All virtual columns must offer vectorized implementations.

Review comment:
       Ah, I was planning on filling out a list of supported expressions but figured the list was sort of volatile since I have a few PRs and branches in flight and was originally going to do this later. I was planning to wait for #10429 to go in before I fixup this PR (to make sure changing the default to false doesn't cause any ill effects to tests) so I can go ahead and fill out this list to include the stuff from #10401 and #10429 in this PR at least.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #10432:
URL: https://github.com/apache/druid/pull/10432


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r496201157



##########
File path: docs/misc/math-expr.md
##########
@@ -214,3 +214,16 @@ For the IPv4 address functions, the `address` argument can either be an IPv4 dot
 | ipv4_match(address, subnet) | Returns 1 if the `address` belongs to the `subnet` literal, else 0. If `address` is not a valid IPv4 address, then 0 is returned. This function is more efficient if `address` is a long instead of a string.|
 | ipv4_parse(address) | Parses `address` into an IPv4 address stored as a long. If `address` is a long that is a valid IPv4 address, then it is passed through. Returns null if `address` cannot be represented as an IPv4 address. |
 | ipv4_stringify(address) | Converts `address` into an IPv4 address dotted-decimal string. If `address` is a string that is a valid IPv4 address, then it is passed through. Returns null if `address` cannot be represented as an IPv4 address.|
+
+
+## Vectorization Support
+A number of expressions support ['vectorized' query engines](../querying/query-context.md#vectorization-parameters)
+
+supported features:
+* constants and identifiers are supported for any column type
+* `cast` is supported for numeric and string types
+* math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types
+* comparison operators: `=`, `!=`, >`, `>=`, `<`, `<=` are supported for numeric types

Review comment:
       ```suggestion
   * comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for numeric types
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r494751677



##########
File path: docs/querying/query-context.md
##########
@@ -106,3 +106,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
 |--------|-------|------------|
 |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.|
 |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.|

Review comment:
       Given that #10429 has uncovered a bug with vectorized group by engine when grouping on numeric columns with null values in sql compatible null handling mode that isn't related to expressions/virtual columns, I guess this seems prudent and I've come around to being in the 'false by default' camp.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r496201316



##########
File path: docs/misc/math-expr.md
##########
@@ -214,3 +214,16 @@ For the IPv4 address functions, the `address` argument can either be an IPv4 dot
 | ipv4_match(address, subnet) | Returns 1 if the `address` belongs to the `subnet` literal, else 0. If `address` is not a valid IPv4 address, then 0 is returned. This function is more efficient if `address` is a long instead of a string.|
 | ipv4_parse(address) | Parses `address` into an IPv4 address stored as a long. If `address` is a long that is a valid IPv4 address, then it is passed through. Returns null if `address` cannot be represented as an IPv4 address. |
 | ipv4_stringify(address) | Converts `address` into an IPv4 address dotted-decimal string. If `address` is a string that is a valid IPv4 address, then it is passed through. Returns null if `address` cannot be represented as an IPv4 address.|
+
+
+## Vectorization Support
+A number of expressions support ['vectorized' query engines](../querying/query-context.md#vectorization-parameters)
+
+supported features:
+* constants and identifiers are supported for any column type
+* `cast` is supported for numeric and string types
+* math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types
+* comparison operators: `=`, `!=`, >`, `>=`, `<`, `<=` are supported for numeric types

Review comment:
       I think that should fix the spellcheck failure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r494751677



##########
File path: docs/querying/query-context.md
##########
@@ -106,3 +106,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
 |--------|-------|------------|
 |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.|
 |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.|

Review comment:
       Given that #10429 has uncovered a bug with vectorized group by engine when grouping on numeric columns with null values in sql compatible null handling mode that isn't related to expressions/virtual columns, I guess this seems prudent and I've come around to being in the 'false by default' camp.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java
##########
@@ -76,10 +76,18 @@ public static boolean canVectorize(
     Function<String, ColumnCapabilities> capabilitiesFunction = name ->
         query.getVirtualColumns().getColumnCapabilitiesWithFallback(adapter, name);
 
+    final boolean adapterCanVectorize = adapter.canVectorize(filter, query.getVirtualColumns(), false);
+    final boolean virtualColumnsCanVectorize;
+    if (query.getVirtualColumns().getVirtualColumns().length > 0) {

Review comment:
       I'll see if I can clean this up a bit

##########
File path: docs/querying/query-context.md
##########
@@ -92,7 +92,7 @@ include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "
 - All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
  "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
  "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input). 
-- No virtual columns.
+- All virtual columns must offer vectorized implementations.

Review comment:
       Ah, I was planning on filling out a list of supported expressions but figured the list was sort of volatile since I have a few PRs and branches in flight and was originally going to do this later. I was planning to wait for #10429 to go in before I fixup this PR (to make sure changing the default to false doesn't cause any ill effects to tests) so I can go ahead and fill out this list to include the stuff from #10401 and #10429 in this PR at least.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r495735021



##########
File path: docs/querying/query-context.md
##########
@@ -106,3 +106,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
 |--------|-------|------------|
 |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.|
 |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.|

Review comment:
       updated to default to false. I modified the tests to also set this parameter to `force` whenever `vectorize` is set to `force` so that we can keep the test coverage




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r495734682



##########
File path: docs/querying/query-context.md
##########
@@ -92,7 +92,7 @@ include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "
 - All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
  "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
  "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input). 
-- No virtual columns.
+- All virtual columns must offer vectorized implementations.

Review comment:
       added docs explaining what is currently supported




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r494536534



##########
File path: docs/querying/query-context.md
##########
@@ -92,7 +92,7 @@ include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "
 - All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin",
  "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny",
  "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input). 
-- No virtual columns.
+- All virtual columns must offer vectorized implementations.

Review comment:
       There's only one built-in virtual column: "expression". So, we should mention when that specific type of virtual column can be vectorized. (Does it depend on the type of expression? Types of input? Functions used? etc)

##########
File path: docs/querying/query-context.md
##########
@@ -106,3 +106,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i
 |--------|-------|------------|
 |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.|
 |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.|
+|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.|

Review comment:
       IMO, it'd be best to set it to `false` now, then change it to `true` for the next release. It'll lower the risk around upgrading, and we want upgrades to be really low risk so people feel comfortable just dropping in the newest version.
   
   The specific risks here are:
   
   1) Bugs in vectorized expressions.
   2) Situations where vectorized expressions perform worse than non-vectorized ones (perhaps some optimizations are implemented for the non-vectorized case, but not the vectorized case).
   3) Situations where there is a bug in the vectorized query engines _in general_, but a specific user isn't hitting it, because they're using expressions and so the query wouldn't vectorize anyway.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java
##########
@@ -76,10 +76,18 @@ public static boolean canVectorize(
     Function<String, ColumnCapabilities> capabilitiesFunction = name ->
         query.getVirtualColumns().getColumnCapabilitiesWithFallback(adapter, name);
 
+    final boolean adapterCanVectorize = adapter.canVectorize(filter, query.getVirtualColumns(), false);
+    final boolean virtualColumnsCanVectorize;
+    if (query.getVirtualColumns().getVirtualColumns().length > 0) {

Review comment:
       There's some duplication here between the query engines — how about creating a helper that takes the query and the virtual columns and creates a `Vectorize` instance that synthesizes both?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r494751773



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java
##########
@@ -76,10 +76,18 @@ public static boolean canVectorize(
     Function<String, ColumnCapabilities> capabilitiesFunction = name ->
         query.getVirtualColumns().getColumnCapabilitiesWithFallback(adapter, name);
 
+    final boolean adapterCanVectorize = adapter.canVectorize(filter, query.getVirtualColumns(), false);
+    final boolean virtualColumnsCanVectorize;
+    if (query.getVirtualColumns().getVirtualColumns().length > 0) {

Review comment:
       I'll see if I can clean this up a bit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10432: add vectorizeVirtualColumns query context parameter

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10432:
URL: https://github.com/apache/druid/pull/10432#discussion_r495735672



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/VectorGroupByEngine.java
##########
@@ -76,10 +76,18 @@ public static boolean canVectorize(
     Function<String, ColumnCapabilities> capabilitiesFunction = name ->
         query.getVirtualColumns().getColumnCapabilitiesWithFallback(adapter, name);
 
+    final boolean adapterCanVectorize = adapter.canVectorize(filter, query.getVirtualColumns(), false);
+    final boolean virtualColumnsCanVectorize;
+    if (query.getVirtualColumns().getVirtualColumns().length > 0) {

Review comment:
       moved into a static method `VirtualColumns.shouldVectorize` which I'm not sure is the best place or not, but at least consolidates the logic




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org