You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "somu-imply (via GitHub)" <gi...@apache.org> on 2023/08/29 20:34:19 UTC

[PR] Fix for schema mismatch to go down using the non vectorize path till … (druid)

somu-imply opened a new pull request, #14924:
URL: https://github.com/apache/druid/pull/14924

   …we update the vectorized aggs properly
   
   Polaris defined a fixed schema where a column say `instant` is long, but from the segment metadata instant was a string. If we run the query
   ```
   {
     "queryType":"segmentMetadata",
     "dataSource":"test_agent_usage_snapshots",
     "intervals":["2023-01-01/2024-01-01"]
   }
   ```
   We could see the schema to be STRING for segments.
   ```
   {
                   "typeSignature": "STRING",
                   "type": "STRING",
                   "hasMultipleValues": false,
                   "hasNulls": false,
                   "size": 0,
                   "cardinality": 1,
                   "minValue": "0",
                   "maxValue": "0",
                   "errorMessage": null
               }
    ```
    
    This causes queries with aggregation such as
    ```
    "aggregations": [
       {
         "type": "floatLast",
         "name": "a0",
         "fieldName": "<string_col>",
         "timeColumn": "__time"
       }
     ],
    ```  
   to fail when used in vectorized mode. This is because vectorized aggs need these cast supports. Till that is done, this PR fixes the issue by going down the non-vectorized route      
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix for schema mismatch to go down using the non vectorize path till … (druid)

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14924:
URL: https://github.com/apache/druid/pull/14924#discussion_r1323478479


##########
processing/src/main/java/org/apache/druid/query/aggregation/any/NumericNilVectorAggregator.java:
##########
@@ -69,6 +69,11 @@ public static NumericNilVectorAggregator longNilVectorAggregator()
   @Nullable
   private final Object returnValue;
 
+  public static NumericNilVectorAggregator of(Object returnValue)

Review Comment:
   side note: I wonder if we should rename `NumericNilVectorAggregator` if we are going to be using it for non-numbers, maybe just `NilVectorAggregator`. I guess this doesn't need to be done in this PR, but it does seem kind of funny exposing it for other purposes



##########
processing/src/main/java/org/apache/druid/query/aggregation/first/DoubleFirstAggregatorFactory.java:
##########
@@ -149,7 +150,7 @@ public VectorAggregator factorizeVector(
           timeColumn);
       return new DoubleFirstVectorAggregator(timeSelector, valueSelector);
     }
-    return NumericNilVectorAggregator.doubleNilVectorAggregator();
+    return NumericNilVectorAggregator.of(new SerializablePair<>(0L, NullHandling.defaultDoubleValue()));

Review Comment:
   this could be saved as a static variable instead of making a new one each time since its not going to change over the lifetime of the service, same for all the other impls



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix for schema mismatch to go down using the non vectorize path till … (druid)

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #14924:
URL: https://github.com/apache/druid/pull/14924#issuecomment-1716514800

   Addressed the review comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix for schema mismatch to go down using the non vectorize path till … (druid)

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #14924:
URL: https://github.com/apache/druid/pull/14924#discussion_r1316153587


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/DoubleLastAggregatorFactory.java:
##########
@@ -116,7 +116,8 @@ public Aggregator factorize(ColumnSelectorFactory metricFactory)
   @Override
   public boolean canVectorize(ColumnInspector columnInspector)
   {
-    return true;
+    final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return capabilities != null && capabilities.isNumeric();

Review Comment:
   I did the same initially, but the get of NumericNilVectorAggregator would only return a null and not a 0 as the non-vectorized parts do.  One solution would be to update the NumercNilVectorAggregator to return a serializablePair with the rhs as 0 on case of a null, but I am unsure if that will break anything else. I'll try to go down that path too



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix for schema mismatch to go down using the non vectorize path till … (druid)

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #14924:
URL: https://github.com/apache/druid/pull/14924#discussion_r1309698237


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/LongLastAggregatorFactory.java:
##########
@@ -128,7 +128,8 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
   @Override
   public boolean canVectorize(ColumnInspector columnInspector)
   {
-    return true;
+    final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return capabilities != null && capabilities.isNumeric();

Review Comment:
   same comment about returning true



##########
processing/src/main/java/org/apache/druid/query/aggregation/last/FloatLastAggregatorFactory.java:
##########
@@ -128,7 +128,8 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
   @Override
   public boolean canVectorize(ColumnInspector columnInspector)
   {
-    return true;
+    final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return capabilities != null && capabilities.isNumeric();

Review Comment:
   same comment about returning true



##########
processing/src/main/java/org/apache/druid/query/aggregation/last/DoubleLastAggregatorFactory.java:
##########
@@ -116,7 +116,8 @@ public Aggregator factorize(ColumnSelectorFactory metricFactory)
   @Override
   public boolean canVectorize(ColumnInspector columnInspector)
   {
-    return true;
+    final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return capabilities != null && capabilities.isNumeric();

Review Comment:
   i know i advised differently offline earlier, but i now think this can still return true, since the value selector creation is moved inside of the check in factorizeVector. looking closer at factorize and factorizeBuffer those will sort of end up using a 'nil' selector because the non-vectorize column value selector on these numeric aggs will call `getDouble` or `getLong` on the value selectors which will be null/zero for string inputs.



##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastAggregatorFactory.java:
##########
@@ -149,7 +150,8 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
   @Override
   public boolean canVectorize(ColumnInspector columnInspector)
   {
-    return true;
+    final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return capabilities != null && capabilities.getType().equals(ValueType.STRING);

Review Comment:
   this one probably is necessary to keep though, the string vector aggregator calls `DimensionHandlerUtils.convertObjectToString` which can handle any type of value, however it is using a `VectorObjectSelector` which isn't guaranteed to be implemented for numbers, which typically need to use `VectorValueSelector`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix for schema mismatch to go down using the non vectorize path till … (druid)

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #14924:
URL: https://github.com/apache/druid/pull/14924#issuecomment-1711973233

   Added the casting from valueSelector to ObjectSelector for non-string columns when used with stringLast/First in vectorized mode. Currently this fails on coverage only, adding tests soon


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix for schema mismatch to go down using the non vectorize path till … (druid)

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #14924:
URL: https://github.com/apache/druid/pull/14924#discussion_r1316153587


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/DoubleLastAggregatorFactory.java:
##########
@@ -116,7 +116,8 @@ public Aggregator factorize(ColumnSelectorFactory metricFactory)
   @Override
   public boolean canVectorize(ColumnInspector columnInspector)
   {
-    return true;
+    final ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return capabilities != null && capabilities.isNumeric();

Review Comment:
   I did the same initially, but the get of NumericNilVectorAggregator does not return a serializablePair which would throw a NPE. One solution would be to update the NumercNilVectorAggregator, but I am unsure if that will break anything else. I'll try to go down that path too



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix for schema mismatch to go down using the non vectorize path till we update the vectorized aggs properly (druid)

Posted by "soumyava (via GitHub)" <gi...@apache.org>.
soumyava merged PR #14924:
URL: https://github.com/apache/druid/pull/14924


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org