Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/18 18:34:22 UTC

[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Jackie-Jiang commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-675644651


   @mr-agrwal It might not be efficient to support star-tree on `SegmentPartitionedDistinctCount` because:
   - In order to generate the star-tree, we need to generate the intermediate aggregated value for each dimension combination. For `SegmentPartitionedDistinctCount`, that is a `Set` which contains all the distinct values and has unbounded size.
   - Storing all these sets could cause memory issues during segment creation, and the segment size could be huge.
   - At query time, deserializing these sets could be slow, and we won't get much performance gain because we still need to process all the distinct values.
   
   We usually add star-tree support for functions that have bounded-size intermediate aggregated values (e.g. Double, HyperLogLog, TDigest). For the distinct count family, we have star-tree support for `DistinctCountHLL` and `DistinctCountBitmap`.
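
To illustrate the size argument above, here is a minimal sketch in Python (not Pinot code). It compares an exact `set`, whose size grows with the number of distinct values, against a toy HyperLogLog whose state stays at a fixed number of registers and can be merged register by register. `TinyHLL`, its parameter `p`, and the SHA-1 based hash are assumptions made up for this illustration; they are not Pinot's implementation or its serialization format.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog with 2**p one-byte registers (illustration only)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)  # size is fixed at construction

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # take a 64-bit hash
        bits = 64 - self.p
        idx = h >> bits                        # first p bits pick a register
        rest = h & ((1 << bits) - 1)           # remaining bits
        rank = bits - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Merging two sketches is an element-wise max; the size does not change.
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # standard constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    for n in (1_000, 100_000):
        exact, sketch = set(), TinyHLL()
        for v in range(n):
            exact.add(v)
            sketch.add(v)
        # The exact set holds n entries; the sketch always holds 1024 registers.
        print(n, len(exact), len(sketch.registers), round(sketch.estimate()))
```

A pre-aggregated star-tree record would have to store one such intermediate value per dimension combination, so a state that grows with cardinality (the exact set) multiplies segment size, while a fixed-size state does not.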




