You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/01/08 08:50:46 UTC

[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553814648



##########
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##########
@@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities(
         originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType());
 
     if (type == ValueType.STRING) {
-      if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-        return strategyFactory.makeMultiValueDimensionProcessor(
-            effectiveCapabilites,
-            selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-        );
-      } else {
-        return strategyFactory.makeSingleValueDimensionProcessor(
-            effectiveCapabilites,
-            selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-        );
+      if (!forceSingleValue) {
+        // if column is not dictionary encoded (and not non-existent or complex), use object selector
+        if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+            effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment:
       by regular string selector do you mean dimension selector? The answer is yes we _could_, but if the dictionary values are not unique, i came to the conclusion that they aren't especially useful since in all the cases I can think of you can't use them as is and you're going to have to lookup the string values anyway (e.g. filter matching and vector group by).
   
   I was trying to provide an avenue to avoid the extra steps with expression virtual columns, and allow an object selector to be created directly since the string is the useable part in this case, and the dictionary ids just feel like an obstruction. But maybe this is overly broad and there are cases where we would want it to act like a dictionary encoded column for some reason that I have missed?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org