Posted to commits@pinot.apache.org by "jasperjiaguo (via GitHub)" <gi...@apache.org> on 2023/03/29 03:34:54 UTC

[GitHub] [pinot] jasperjiaguo opened a new issue, #10499: Partitioned Distinct/DistinctCount

jasperjiaguo opened a new issue, #10499:
URL: https://github.com/apache/pinot/issues/10499

   For high-cardinality columns, the local/intermediate/global merge phases of distinct/distinctcount can be memory- and CPU-heavy, because the merger has to serialize/deserialize and union multiple large sets from the responses. If the distinct(count) column is partitioned so that each response covers a disjoint set of values, the merger can instead simply concatenate the intermediate results (for distinct) or add them (for distinctcount), since no value can appear in more than one response. This can significantly reduce set serialization/deserialization, transmission, merge time, and memory footprint. The optimization can also apply at different levels of processing, depending on the partition granularity.
   
   [Screenshot attached to the issue: https://user-images.githubusercontent.com/10736840/228420057-f4957793-1820-4a6b-9974-45ec0fc80190.png]
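A minimal Python sketch of the merge shortcut described in the issue, assuming each intermediate result is a plain set of values and that the caller already knows the partials come from disjoint partitions. The function names are hypothetical and are not Pinot APIs. The last assertion shows why disjointness matters: adding counts from overlapping partials overcounts.

```python
from itertools import chain
from typing import Iterable, List, Set


def merge_distinct_general(partials: Iterable[Set[int]]) -> Set[int]:
    """Default merge (hypothetical helper): union the partial sets, which
    removes values that appear in more than one response."""
    merged: Set[int] = set()
    for part in partials:
        merged |= part
    return merged


def merge_distinct_partitioned(partials: Iterable[Set[int]]) -> List[int]:
    """Partitioned distinct (hypothetical helper): partials are assumed
    disjoint, so plain concatenation is already duplicate-free."""
    return list(chain.from_iterable(partials))


def merge_distinct_count_partitioned(partial_counts: Iterable[int]) -> int:
    """Partitioned distinctcount (hypothetical helper): each response only
    ships its local count, and disjoint counts simply add up."""
    return sum(partial_counts)


# Hypothetical data: values 0..9 partitioned by v % 3, one partial per partition.
values = range(10)
partials = [{v for v in values if v % 3 == p} for p in range(3)]
assert merge_distinct_general(partials) == set(values)
assert sorted(merge_distinct_partitioned(partials)) == list(values)
assert merge_distinct_count_partitioned(len(p) for p in partials) == 10

# Overlapping partials break the shortcut: summing counts gives 4, true answer is 3.
overlapping = [{1, 2}, {2, 3}]
assert merge_distinct_count_partitioned(len(p) for p in overlapping) == 4
assert len(merge_distinct_general(overlapping)) == 3
```

The partitioned path never builds a combined set, which is where the savings in serialization and merge memory come from.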
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #10499: Partitioned Distinct/DistinctCount

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10499:
URL: https://github.com/apache/pinot/issues/10499#issuecomment-1489450187

   For context, #9304 can be leveraged to achieve this, as long as the broker knows the segments are partitioned.
   
   cc @61yao 


[GitHub] [pinot] walterddr commented on issue #10499: Partitioned Distinct/DistinctCount

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on issue #10499:
URL: https://github.com/apache/pinot/issues/10499#issuecomment-1492023817

   As a longer-term solution, we also plan to change the dispatch mechanism in the routing manager to include a partition-to-segment-list mapping, so that this decision can be made on the server. See #9611 (partition dispatch).
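A rough Python sketch of why a partition-to-segment-list mapping helps, assuming each segment carries a single partition id for the distinct column. If every segment of a partition is dispatched to the same server, the per-server results cover disjoint partitions and can be concatenated or added. The round-robin-over-partitions strategy and all function names are hypothetical illustrations, not the design in #9611 or Pinot's routing manager.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_partition(segment_partition: Dict[str, int]) -> Dict[int, List[str]]:
    """Build a partition -> segment-list mapping (hypothetical helper)."""
    mapping: Dict[int, List[str]] = defaultdict(list)
    for segment, partition in sorted(segment_partition.items()):
        mapping[partition].append(segment)
    return dict(mapping)


def assign_partitions(partition_segments: Dict[int, List[str]],
                      servers: List[str]) -> Dict[str, List[str]]:
    """Hypothetical strategy: round-robin whole partitions over servers, so
    all segments of one partition land on the same server."""
    assignment: Dict[str, List[str]] = {s: [] for s in servers}
    for i, partition in enumerate(sorted(partition_segments)):
        assignment[servers[i % len(servers)]].extend(partition_segments[partition])
    return assignment


def servers_are_disjoint(assignment: Dict[str, List[str]],
                         segment_partition: Dict[str, int]) -> bool:
    """True if no partition is split across servers, i.e. per-server
    distinctcount results could simply be added (hypothetical check)."""
    owner: Dict[int, str] = {}
    for server, segments in assignment.items():
        for segment in segments:
            partition = segment_partition[segment]
            if owner.setdefault(partition, server) != server:
                return False
    return True


# Hypothetical segments tagged with partition ids.
segment_partition = {"seg0": 0, "seg1": 1, "seg2": 0, "seg3": 2}
by_partition = group_by_partition(segment_partition)
assert by_partition == {0: ["seg0", "seg2"], 1: ["seg1"], 2: ["seg3"]}

assignment = assign_partitions(by_partition, ["server_a", "server_b"])
assert assignment == {"server_a": ["seg0", "seg2", "seg3"], "server_b": ["seg1"]}
assert servers_are_disjoint(assignment, segment_partition)

# A partition-unaware split can put partition 0 on two servers.
split = {"server_a": ["seg0"], "server_b": ["seg2", "seg1", "seg3"]}
assert not servers_are_disjoint(split, segment_partition)
```

Grouping by partition before dispatch is what lets the disjointness check pass by construction, rather than being something the broker has to verify after the fact.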

