Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/03/25 11:09:42 UTC

[GitHub] [pinot] richardstartin opened a new issue #8409: Aggregation functions fail on String columns with unfriendly error messages

richardstartin opened a new issue #8409:
URL: https://github.com/apache/pinot/issues/8409


   For example, on `airlineStats`, when a dictionary can be used, the operator just assumes the metadata is numeric, which causes a hard-to-diagnose `NumberFormatException`:
   
   ```sql
   select max(Carrier) from airlineStats
   ```
   
   ```
   [
     {
       "errorCode": 200,
       "message": "QueryExecutionError:\njava.lang.NumberFormatException: For input string: \"WN\"\n\tat java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)\n\tat java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)\n\tat java.base/java.lang.Double.parseDouble(Double.java:543)\n\tat org.apache.pinot.core.operator.query.DictionaryBasedAggregationOperator.toDouble(DictionaryBasedAggregationOperator.java:129)"
     },
     {
       "errorCode": 200,
       "message": "QueryExecutionError:\njava.lang.NumberFormatException: For input string: \"AA\"\n\tat java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)\n\tat java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)\n\tat java.base/java.lang.Double.parseDouble(Double.java:543)\n\tat org.apache.pinot.core.operator.query.DictionaryBasedAggregationOperator.toDouble(DictionaryBasedAggregationOperator.java:129)"
     }
   ]
   ```
   
   It's actually worse when the dictionary can't be used because it does some work before failing:
   
   ```sql
   select max(Carrier) from airlineStats where AirTime > 10
   ```
   
   
   ```
   [
     {
       "errorCode": 200,
       "message": "QueryExecutionError:\njava.lang.IllegalStateException: Cannot compute max for non-numeric type: STRING\n\tat org.apache.pinot.core.query.aggregation.function.MaxAggregationFunction.aggregate(MaxAggregationFunction.java:96)\n\tat org.apache.pinot.core.query.aggregation.DefaultAggregationExecutor.aggregate(DefaultAggregationExecutor.java:47)\n\tat org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:70)\n\tat org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:38)"
     },
     {
       "errorCode": 200,
       "message": "QueryExecutionError:\njava.lang.NumberFormatException: For input string: \"MQ\"\n\tat java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)\n\tat java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)\n\tat java.base/java.lang.Double.parseDouble(Double.java:543)\n\tat org.apache.pinot.core.operator.query.DictionaryBasedAggregationOperator.toDouble(DictionaryBasedAggregationOperator.java:129)"
     }
   ]
   ```
   
   Postgres can compute the max over a string column, so aggregation functions should not assume the result is a double. As long as they do make this assumption, though, type checking should be done early.
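
As an illustration of "type checking should be done early", here is a minimal sketch of a check that runs before any segment is touched. Everything in it is hypothetical: `EarlyTypeCheck`, `ColumnType`, and `validateNumericArgument` are illustrative names, not Pinot classes or methods, and the column type is assumed to be known from the table schema.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; none of these names are Pinot APIs. */
final class EarlyTypeCheck {
  /** Simplified stand-in for a schema data type. */
  enum ColumnType { INT, LONG, FLOAT, DOUBLE, STRING, BYTES }

  private static final Set<ColumnType> NUMERIC =
      EnumSet.of(ColumnType.INT, ColumnType.LONG, ColumnType.FLOAT, ColumnType.DOUBLE);

  /** Rejects a non-numeric argument up front with a message that names the column and its type. */
  static void validateNumericArgument(String function, String column, ColumnType type) {
    if (!NUMERIC.contains(type)) {
      throw new IllegalArgumentException(String.format(
          "%s(%s) is not supported: column '%s' has type %s, expected a numeric type",
          function, column, column, type));
    }
  }
}
```

With a check like this, `max(Carrier)` would fail once, with a single readable message, instead of once per segment with a stack trace.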


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #8409: Aggregation functions fail on String columns with unfriendly error messages

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8409:
URL: https://github.com/apache/pinot/issues/8409#issuecomment-1079439498


   Ideally, the check should happen on the broker side, since the broker has access to the table schema. We could then add a string-specific aggregation (e.g. `StringMax`) to avoid the overhead of overloading `Max` to handle non-numeric types. The challenge is that, for transform functions, we currently cannot derive the result data type from the input data types.
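
A rough sketch of the broker-side idea, under stated assumptions: the schema is modeled as a plain map from column name to type, and `BrokerSideMaxResolver`, `ColumnType`, and `resolveMax` are hypothetical names rather than existing Pinot code. `StringMax` is the example name from the comment above, not an existing function. The sketch only covers plain column arguments; it does not address the transform-function case, where the result type cannot currently be derived.

```java
import java.util.Map;

/** Hypothetical sketch; none of these names are Pinot APIs. */
final class BrokerSideMaxResolver {
  /** Simplified stand-in for a schema data type. */
  enum ColumnType { INT, LONG, FLOAT, DOUBLE, STRING }

  private final Map<String, ColumnType> schema;

  BrokerSideMaxResolver(Map<String, ColumnType> schema) {
    this.schema = schema;
  }

  /** Picks the aggregation to run for max(column) based on the column's declared type. */
  String resolveMax(String column) {
    ColumnType type = schema.get(column);
    if (type == null) {
      throw new IllegalArgumentException("Unknown column: " + column);
    }
    // String columns go to the dedicated function; numeric columns keep the existing one.
    return type == ColumnType.STRING ? "StringMax" : "Max";
  }
}
```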
   
   Another approach is to change the `Max` function to return an `Object` instead of a `double`. This might add the overhead of storing boxed values, but the result could then be of different types (e.g. `Integer` for int columns instead of double). Currently the final result type of an aggregation function must be fixed, so in order to make this change, we need to loosen this restriction.
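
To make the second approach concrete, here is a minimal sketch of a max whose result type follows the input type. `ObjectMax` is a hypothetical name and this is not how Pinot's `MaxAggregationFunction` is written; it only illustrates the trade-off described above, where values are boxed but the same code serves `Integer`, `Double`, and `String` columns. It assumes the input contains no null values.

```java
/** Hypothetical sketch; not Pinot code. */
final class ObjectMax {
  /**
   * Returns the largest value by natural ordering, or null for empty input.
   * The result keeps the element type, so Integer input yields an Integer
   * and String input yields a String (compared lexicographically).
   */
  static <T extends Comparable<? super T>> T max(Iterable<T> values) {
    T best = null;
    for (T value : values) {
      if (best == null || value.compareTo(best) > 0) {
        best = value;
      }
    }
    return best;
  }
}
```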

