Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/10/11 23:46:22 UTC

[GitHub] [pinot] amrishlal opened a new issue #7560: COUNT DISTINCT on multiple columns produces wrong result.

amrishlal opened a new issue #7560:
URL: https://github.com/apache/pinot/issues/7560


   The query `SELECT count(DISTINCT name, score) FROM scores` produces wrong results. This happens because all arguments except the first one are ignored while constructing the DISTINCTCOUNT function in `AggregationFunctionFactory.java`.
   
   One way to fix this is to use the CONCAT function to concatenate all arguments into a single argument before creating the DISTINCTCOUNT function. Does this sound good? If so, I will go ahead and put this in a PR. Any other suggestions?
   
   ```
   diff --git a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
   index 328560739d..b9e78fe098 100644
   --- a/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
   +++ b/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
   @@ -159,7 +159,13 @@ public class AggregationFunctionFactory {
              case MINMAXRANGE:
                return new MinMaxRangeAggregationFunction(firstArgument);
              case DISTINCTCOUNT:
   -            return new DistinctCountAggregationFunction(firstArgument);
   +            if (arguments.size() == 1) {
   +              return new DistinctCountAggregationFunction(firstArgument);
   +            }
   +
   +            arguments.add(ExpressionContext.forLiteral(""));
   +            return new DistinctCountAggregationFunction(ExpressionContext
   +                .forFunction(new FunctionContext(FunctionContext.Type.TRANSFORM, "CONCAT", arguments)));
              case DISTINCTCOUNTBITMAP:
                return new DistinctCountBitmapAggregationFunction(firstArgument);
              case SEGMENTPARTITIONEDDISTINCTCOUNT:
   ```
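
One thing to watch with a concatenation-based approach: if the values are joined with an empty separator (which is how the trailing `""` literal in the diff above appears to be used, though that is an assumption about `CONCAT`'s argument order), different column tuples can collapse into the same string. The following is a minimal standalone sketch in plain Java, not Pinot code; the class and method names are hypothetical and only illustrate the collision.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; it does not use any Pinot API.
class ConcatCollisionDemo {
  // Counts distinct (name, score) pairs by joining both values into one string key.
  static int distinctByConcat(List<Object[]> rows, String separator) {
    Set<String> seen = new HashSet<>();
    for (Object[] row : rows) {
      seen.add(String.valueOf(row[0]) + separator + row[1]);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    // Two different (name, score) pairs: ("a1", 2) and ("a", 12).
    List<Object[]> rows = List.of(new Object[] {"a1", 2}, new Object[] {"a", 12});

    // With an empty separator both pairs become "a12", so they are counted once.
    System.out.println(distinctByConcat(rows, ""));  // prints 1

    // With a separator that never occurs in the data, the keys stay distinct.
    System.out.println(distinctByConcat(rows, "|")); // prints 2
  }
}
```

A non-empty separator only helps if no column value can contain it, so a concatenation-based fix would still need to choose the separator carefully.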


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mayankshriv commented on issue #7560: COUNT DISTINCT on multiple columns produces wrong result.

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on issue #7560:
URL: https://github.com/apache/pinot/issues/7560#issuecomment-949960320


   +1 to @Jackie-Jiang's suggestion. For the Theta-Sketch implementation we enhanced the query execution engine to support aggregation functions with multiple arguments, and this one falls under that category.




[GitHub] [pinot] Jackie-Jiang commented on issue #7560: COUNT DISTINCT on multiple columns produces wrong result.

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7560:
URL: https://github.com/apache/pinot/issues/7560#issuecomment-941534897


   The clean fix would be to support multiple arguments for the `DistinctCount` function family, or to model this query as a `COUNT` over a distinct query.
   
   Using `CONCAT` can be a temporary workaround, but it is suboptimal because:
   - It can only handle the two-argument case
   - It is not efficient for non-string columns
   - It is not efficient because it is a scalar function
   
   We might be able to provide a transform function that combines multiple values more efficiently.
   
   Adding @xiangfu0 to the discussion
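
To make the multi-argument idea concrete, here is a minimal sketch in plain Java of counting distinct tuples with a composite key instead of a concatenated string. It is not Pinot's implementation, and `MultiColumnDistinctCount` and `distinctCount` are hypothetical names; it only shows that a tuple key handles any number of columns and avoids converting non-string values to strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; it does not use any Pinot API.
class MultiColumnDistinctCount {
  // Each row is an array of column values; `columns` selects the indices that form the key.
  static int distinctCount(List<Object[]> rows, int... columns) {
    Set<List<Object>> seen = new HashSet<>();
    for (Object[] row : rows) {
      Object[] key = new Object[columns.length];
      for (int i = 0; i < columns.length; i++) {
        key[i] = row[columns[i]];
      }
      // List equality and hashing are element-wise, so the list acts as a tuple key.
      seen.add(Arrays.asList(key));
    }
    return seen.size();
  }

  public static void main(String[] args) {
    List<Object[]> rows = List.of(
        new Object[] {"alice", 90, "math"},
        new Object[] {"alice", 90, "art"},
        new Object[] {"bob", 90, "math"});

    System.out.println(distinctCount(rows, 0));       // prints 2
    System.out.println(distinctCount(rows, 0, 1));    // prints 2
    System.out.println(distinctCount(rows, 0, 1, 2)); // prints 3
  }
}
```

The trade-off is that each key allocates a small array and list, which a production aggregation function would likely want to avoid, for example by working on dictionary ids or hashes.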



