You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/09 20:10:18 UTC

[GitHub] [incubator-druid] clintropolis opened a new pull request #8047: optimize single input column multi-value expressions

clintropolis opened a new pull request #8047: optimize single input column multi-value expressions
URL: https://github.com/apache/incubator-druid/pull/8047
 
 
   ### Description
   This PR follows up on the changes in #8014, which actually opened the door to using the same optimized selector to evaluate single input column expressions for multi-value string columns, using native Druid behavior to do the `map` of the expression across the values. This works provided a few conditions are met: 
   * the expression _does not_ operate on the multi-value input as an array
   * the expression _does not_ produce an array
   * the single multi-value column is used _at most once_ in the expression
   
   To achieve this, functions and apply functions now also report if they have array inputs or outputs which is collected on `Expr.BindingDetails`, which is already used to analyze the expression in `ExpressionSelectors` in order to determine the correct type of selector to create.
   
   This change also allows multi-value expressions that meet the above conditions to work with group by v1, since the cardinality can be known, however wider multi-value expression support for group by v1 is not available (or really my priority).
   
   This PR has also uncovered some inconsistencies in the null value handling for multi-value expressions, [specifically the part that coerces `null`, `[]`, and `[null]` all into `[null]`](https://github.com/apache/incubator-druid/blob/master/processing/src/main/java/org/apache/druid/segment/virtual/ExpressionSelectors.java#L446), which creates some minor differences between using the `SingleStringInputDimensionSelector` to "map" an expression across the row values compared to using a `map` expression. Ideally I will modify the coercion to not convert `[]` to `[null]`, which seems to be the problematic issue, but it creates mass of failing tests in the `ExpressionFilterTest` due to differences in behavior in the underlying selectors, so I intend to investigate this deeper in follow-up work The difference in behavior is apparent in the modified calcite test, where `[]` is grouped as null with default null handling behavior instead of `""`.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added unit tests or modified existing tests to cover new code paths.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org