Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/09/04 06:50:41 UTC

[GitHub] [incubator-pinot] mr-agrwal opened a new issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

mr-agrwal opened a new issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg edited a comment on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
kishoreg edited a comment on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-686951445


   @Jackie-Jiang
   
   > which contains all the distinct values and has unbounded size

   This cannot be more than the threshold, right?
   
   In the intermediate star-tree nodes, we should be able to store the final results.
   
   
   
   




[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-675644651


   @mr-agrwal It might not be efficient to support star-tree on `SegmentPartitionedDistinctCount` because:
   - To generate the star-tree, we need to generate the intermediate aggregated value for each dimension combination. For `SegmentPartitionedDistinctCount`, that is a `Set`, which contains all the distinct values and has unbounded size.
   - Storing all these sets could cause memory issues during segment creation, and the segment size could be huge.
   - At query time, deserializing these sets could be slow, and we won't get much performance gain, as we still need to process all these distinct values.
   
   We usually add star-tree support for functions that have limited-size intermediate aggregated values (e.g. Double, HyperLogLog, TDigest, etc.). For the distinct count family, we have star-tree support on `DistinctCountHLL` and `DistinctCountBitmap`.
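
To make the size argument concrete, here is a minimal sketch of pre-aggregation over a single dimension. It is not Pinot code: the rows, the `STAR` marker, and both helper functions are hypothetical. It only contrasts a set-valued intermediate with a single-number intermediate.

```python
from collections import defaultdict

# Hypothetical rows of (colB, colA); not taken from any real table.
rows = [(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "d")]

STAR = "*"  # stands in for the star node that aggregates across all colB values


def pre_aggregate_sets(rows):
    """Intermediate value per node is the set of distinct colA values."""
    nodes = defaultdict(set)
    for col_b, col_a in rows:
        nodes[col_b].add(col_a)
        nodes[STAR].add(col_a)
    return dict(nodes)


def pre_aggregate_counts(rows):
    """Intermediate value per node is a single number (as for COUNT)."""
    nodes = defaultdict(int)
    for col_b, _ in rows:
        nodes[col_b] += 1
        nodes[STAR] += 1
    return dict(nodes)


set_nodes = pre_aggregate_sets(rows)
count_nodes = pre_aggregate_counts(rows)

# Set-valued nodes store every distinct value they cover, and the star
# node repeats all of them; numeric nodes store one value each.
stored_values_for_sets = sum(len(s) for s in set_nodes.values())
stored_values_for_counts = len(count_nodes)
print(stored_values_for_sets, stored_values_for_counts)  # 9 4
```

With only four distinct values the set-valued tree already stores nine entries against four for the numeric one, and the gap grows with the cardinality of `colA`.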




[GitHub] [incubator-pinot] mr-agrwal commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
mr-agrwal commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-688370476


   @Jackie-Jiang Our specific use case is 
   `SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(colA) FROM table WHERE colB IN (1, 2)`
   where we are creating a star tree with `t=1` on `colB`. In that case, we should be able to use the star tree.




[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-689154234


   @mr-agrwal In order to aggregate the values, we have to store the serialized `Set` in the star-tree (same as what we would need to store for `DistinctCount`). The size of this `Set` is unbounded, as it stores all the distinct values under a tree node. I don't think it will work properly, for 2 reasons:
   1. The star-tree size could be huge if there are many distinct values under a tree node, which can lead to memory issues.
   2. Reading and deserializing the set could be very expensive (even more expensive than scanning the raw values and creating a new set).
   
   It might work for low-cardinality columns (e.g. colA has <1000 distinct values), but that is not very common IMO.
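
As a rough illustration of how the serialized size scales, the sketch below assumes a hypothetical layout (a 4-byte count followed by each value as an 8-byte long). The actual serialization format in Pinot may differ; only the linear growth is the point.

```python
import struct


def serialize_long_set(values):
    """Hypothetical layout: 4-byte count, then each value as an 8-byte long."""
    ordered = sorted(values)
    return struct.pack(f">i{len(ordered)}q", len(ordered), *ordered)


def deserialize_long_set(payload):
    """Inverse of serialize_long_set; every stored value must be read back."""
    (count,) = struct.unpack_from(">i", payload, 0)
    return set(struct.unpack_from(f">{count}q", payload, 4))


low_cardinality = serialize_long_set(range(1_000))
high_cardinality = serialize_long_set(range(100_000))

# Size per tree node grows linearly with the number of distinct values.
print(len(low_cardinality), len(high_cardinality))  # 8004 800004
```

Under this assumed layout a node covering 1,000 distinct values takes about 8 KB, which is manageable, while each node covering 100,000 values takes about 800 KB and has to be fully deserialized at query time.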




[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-687324697


   > In the intermediate star-tree nodes, we should be able to store the final results.
   
   You mean the long values? There is no way to aggregate them inside the segment.
   E.g. there is no way to solve this query: `SELECT SEGMENT_PARTITIONED_DISTINCT_COUNT(colA) FROM table WHERE colB IN (1, 2)`
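
A small worked example of why final counts cannot be combined within a segment, using hypothetical rows of `(colB, colA)`. The helper `distinct_count` is illustrative only and is not a Pinot API.

```python
# Hypothetical rows of (colB, colA) inside a single segment.
rows = [(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "d")]


def distinct_count(rows, col_b_values):
    """Exact distinct count of colA over rows whose colB is in col_b_values."""
    return len({col_a for col_b, col_a in rows if col_b in col_b_values})


count_b1 = distinct_count(rows, {1})  # 2 ("a", "b")
count_b2 = distinct_count(rows, {2})  # 2 ("b", "c")
exact = distinct_count(rows, {1, 2})  # 3 ("a", "b", "c")

# Adding the stored final counts double-counts "b", which occurs under both
# colB = 1 and colB = 2.
summed = count_b1 + count_b2  # 4

# Merging the underlying sets gives the right answer, which is why the set
# itself would have to be stored as the intermediate value.
sets_by_col_b = {
    b: {col_a for col_b, col_a in rows if col_b == b} for b in (1, 2)
}
merged = len(sets_by_col_b[1] | sets_by_col_b[2])  # 3

print(summed, exact, merged)  # 4 3 3
```

Summing is only safe when no value is shared between the groups being combined, and nothing guarantees that for different `colB` values within one segment.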




[GitHub] [incubator-pinot] kishoreg commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-686953093


   I think it's worth giving it a shot and comparing the results.




[GitHub] [incubator-pinot] kishoreg commented on issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893#issuecomment-686951445


   @Jackie-Jiang is it possible to 




[GitHub] [incubator-pinot] mr-agrwal closed issue #5893: Support for segmentPartitionedDistinctCount in Star Tree Index Pre aggregation Functions.

Posted by GitBox <gi...@apache.org>.
mr-agrwal closed issue #5893:
URL: https://github.com/apache/incubator-pinot/issues/5893


   

