Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/01/24 20:17:53 UTC

[GitHub] [druid] clintropolis opened a new pull request #12194: fix StringAnyAggregatorFactory to use single value selector for non-existent columns

clintropolis opened a new pull request #12194:
URL: https://github.com/apache/druid/pull/12194


   ### Description
   Fixes an issue where `StringAnyAggregatorFactory` incorrectly used a multi-value dimension selector when `ColumnCapabilities` were unavailable for a column. It treated such a column as "unknown". That is correct for the non-vectorized engines but incorrect for the vectorized engines. In the vectorized engines, missing capabilities mean that the column does not exist and is therefore entirely null, and null columns are always represented by a single-value dimension selector.
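
A minimal Java sketch of the selector decision described above, under a simplified model. `Capable`, `Capabilities`, `useMultiValueSelector` and the `vectorized` flag are hypothetical stand-ins, not Druid's actual classes or signatures. The sketch only shows how the same "no capabilities" input maps to different answers depending on the engine family.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the selector choice; not Druid's real API.
 */
final class SelectorChoiceSketch
{
  /** Three-valued answer to "does this column have multiple values?" (illustrative). */
  enum Capable
  {
    TRUE, FALSE, UNKNOWN;

    /** TRUE or UNKNOWN: multiple values cannot be ruled out. */
    boolean isMaybeTrue()
    {
      return this != FALSE;
    }
  }

  /** Hypothetical stand-in for column capabilities, reduced to the multi-value flag. */
  static final class Capabilities
  {
    final Capable hasMultipleValues;

    Capabilities(Capable hasMultipleValues)
    {
      this.hasMultipleValues = Objects.requireNonNull(hasMultipleValues);
    }
  }

  /**
   * Returns true if a multi-value selector should be used.
   *
   * @param capabilities null when no capabilities are available for the column
   * @param vectorized   whether the vectorized engine is asking (hypothetical flag)
   */
  static boolean useMultiValueSelector(Capabilities capabilities, boolean vectorized)
  {
    if (capabilities == null) {
      // Non-vectorized: missing capabilities mean "unknown", so be ready for multiple values.
      // Vectorized: missing capabilities mean the column does not exist, so every row is
      // null and a single-value selector is correct.
      return !vectorized;
    }
    return capabilities.hasMultipleValues.isMaybeTrue();
  }

  public static void main(String[] args)
  {
    System.out.println(useMultiValueSelector(null, true));                               // false
    System.out.println(useMultiValueSelector(null, false));                              // true
    System.out.println(useMultiValueSelector(new Capabilities(Capable.FALSE), true));    // false
    System.out.println(useMultiValueSelector(new Capabilities(Capable.UNKNOWN), true));  // true
  }
}
```

The explicit `vectorized` parameter is only there to make the asymmetry visible in one place. In Druid the vectorized path goes through a separate `factorizeVector` method, so each path can apply its own interpretation of missing capabilities.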
   
   This PR fixes the issue and adds Javadoc and comments to help avoid making this mistake again. Longer term, it might be wise to consider moving the aggregator factories' `factorize` methods to use the `ColumnProcessors` factories, so that we can reuse the same code for deciding which type of aggregator to make for a given input, but I did not do this in this PR.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #12194: fix StringAnyAggregatorFactory to use single value selector for non-existent columns

jihoonson commented on a change in pull request #12194:
URL: https://github.com/apache/druid/pull/12194#discussion_r791162190



##########
File path: processing/src/main/java/org/apache/druid/segment/ColumnProcessors.java
##########
@@ -342,7 +342,7 @@ private static ColumnCapabilities computeDimensionSpecCapabilities(
           );
         }
 
-        if (mayBeMultiValue(capabilities)) {
+        if (capabilities.hasMultipleValues().isMaybeTrue()) {

Review comment:
       nit: perhaps it would be useful to add a util method like `shouldProcessMultiValue()` that could be shared by this code and `StringAnyAggregatorFactory`.
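
The new condition in the diff relies on a three-valued capability flag. A hedged sketch, assuming `isMaybeTrue()` means "TRUE or UNKNOWN". The enum below is a hypothetical model, not Druid's own capability class. With that assumption, the check takes the multi-value path whenever multiple values cannot be ruled out:

```java
/**
 * Hypothetical model of a three-valued capability flag, mirroring the
 * hasMultipleValues().isMaybeTrue() call in the diff. The semantics
 * (UNKNOWN counts as "maybe true") are assumed for illustration.
 */
enum Capable
{
  TRUE, FALSE, UNKNOWN;

  boolean isTrue()
  {
    return this == TRUE;
  }

  boolean isUnknown()
  {
    return this == UNKNOWN;
  }

  /** True unless the flag is definitely FALSE. */
  boolean isMaybeTrue()
  {
    return isTrue() || isUnknown();
  }
}

final class CapableDemo
{
  public static void main(String[] args)
  {
    for (Capable c : Capable.values()) {
      // The multi-value code path is chosen whenever isMaybeTrue() is true.
      System.out.println(c + " -> isMaybeTrue=" + c.isMaybeTrue());
    }
  }
}
```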


[GitHub] [druid] jihoonson commented on a change in pull request #12194: fix StringAnyAggregatorFactory to use single value selector for non-existent columns

jihoonson commented on a change in pull request #12194:
URL: https://github.com/apache/druid/pull/12194#discussion_r791166239



##########
File path: processing/src/main/java/org/apache/druid/segment/ColumnProcessors.java
##########
@@ -342,7 +342,7 @@ private static ColumnCapabilities computeDimensionSpecCapabilities(
           );
         }
 
-        if (mayBeMultiValue(capabilities)) {
+        if (capabilities.hasMultipleValues().isMaybeTrue()) {

Review comment:
       NVM. I misread the change.


[GitHub] [druid] clintropolis commented on a change in pull request #12194: fix StringAnyAggregatorFactory to use single value selector for non-existent columns

clintropolis commented on a change in pull request #12194:
URL: https://github.com/apache/druid/pull/12194#discussion_r791167301



##########
File path: processing/src/main/java/org/apache/druid/segment/ColumnProcessors.java
##########
@@ -342,7 +342,7 @@ private static ColumnCapabilities computeDimensionSpecCapabilities(
           );
         }
 
-        if (mayBeMultiValue(capabilities)) {
+        if (capabilities.hasMultipleValues().isMaybeTrue()) {

Review comment:
       There isn't really anything to share here. `StringAnyAggregatorFactory` needs a null check (which differs from this shared method, since treating missing capabilities as unknown only applies to the non-vectorized engines), while this part of the code has already checked for null. So I think the shared method would only really be used by `StringAnyAggregatorFactory`.


[GitHub] [druid] clintropolis merged pull request #12194: fix StringAnyAggregatorFactory to use single value selector for non-existent columns

clintropolis merged pull request #12194:
URL: https://github.com/apache/druid/pull/12194
