Posted to commits@pinot.apache.org by "jasperjiaguo (via GitHub)" <gi...@apache.org> on 2023/03/29 03:38:30 UTC

[GitHub] [pinot] jasperjiaguo opened a new issue, #10500: Offheap distinct(count)

jasperjiaguo opened a new issue, #10500:
URL: https://github.com/apache/pinot/issues/10500

   - The current distinct(count) functions create on-heap, in-memory sets.
   -- Increases the chance of OOM
   -- Adds GC pressure
   -- Cannot handle high cardinality
   -- Makes it hard to use disk for spilling
   - An off-heap (direct buffer) hash-table-based solution can help here.
   - It can be extended to support spilling over to disk.
   - The off-heap hash table can potentially be extended to group-by queries.
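
To make the direct-buffer idea concrete, here is a minimal sketch of an off-heap set of long values, using open addressing with linear probing. It is only an illustration under stated assumptions: the class name `OffHeapLongSet`, its methods, the fixed capacity, and the 0.5 load factor are all hypothetical and are not taken from Pinot or from the design discussed in this issue. Resizing, spilling to disk, and explicit release of the buffers are left out.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch, not Pinot code: a fixed-capacity set of long values
 * kept in direct (off-heap) buffers, using open addressing with linear probing.
 */
final class OffHeapLongSet {
  private final ByteBuffer values;   // 8 bytes per slot, allocated off-heap
  private final ByteBuffer occupied; // 1 byte per slot: 0 = empty, 1 = used
  private final int mask;            // slotCount - 1 (slotCount is a power of two)
  private final int maxSize;         // caps the load factor at 0.5
  private int size;

  OffHeapLongSet(int expectedValues) {
    if (expectedValues < 1 || expectedValues > (1 << 26)) {
      throw new IllegalArgumentException("expectedValues out of range: " + expectedValues);
    }
    // Smallest power of two that is >= 2 * expectedValues.
    int slotCount = Integer.highestOneBit(expectedValues * 2 - 1) << 1;
    this.mask = slotCount - 1;
    this.maxSize = slotCount / 2;
    // allocateDirect returns zero-filled memory, so every slot starts empty.
    this.values = ByteBuffer.allocateDirect(slotCount * Long.BYTES);
    this.occupied = ByteBuffer.allocateDirect(slotCount);
  }

  /** Adds the value; returns true if it was not already present. */
  boolean add(long value) {
    int slot = firstSlot(value);
    while (occupied.get(slot) != 0) {
      if (values.getLong(slot * Long.BYTES) == value) {
        return false; // duplicate, nothing to store
      }
      slot = (slot + 1) & mask; // linear probing
    }
    if (size >= maxSize) {
      // A real implementation would resize or spill to disk here.
      throw new IllegalStateException("set is full");
    }
    values.putLong(slot * Long.BYTES, value);
    occupied.put(slot, (byte) 1);
    size++;
    return true;
  }

  boolean contains(long value) {
    int slot = firstSlot(value);
    while (occupied.get(slot) != 0) {
      if (values.getLong(slot * Long.BYTES) == value) {
        return true;
      }
      slot = (slot + 1) & mask;
    }
    return false;
  }

  /** Number of distinct values added so far. */
  int size() {
    return size;
  }

  private int firstSlot(long value) {
    long h = value * 0x9E3779B97F4A7C15L; // multiplicative mixing of the bits
    return (int) (h ^ (h >>> 32)) & mask;
  }
}
```

With a structure like this, a distinct count is just `size()` after every value has been passed to `add`. The values live outside the Java heap, so the garbage collector does not have to trace them. The separate `occupied` buffer is one possible way to avoid reserving a sentinel value for empty slots.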
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on issue #10500: Offheap execution for DISTINCT, DISTINCT_COUNT, DISTINCT_SUM and DISTINCT_AVG

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on issue #10500:
URL: https://github.com/apache/pinot/issues/10500#issuecomment-1707497786

   Quick update: we have POC / prototype code working. We tried it on a production cluster where a very memory-intensive DISTINCT_COUNT query was previously hitting OOM or being killed for excessive heap usage. With the off-heap implementation, the query completes.
   
   @vvivekiyer will share our design, performance evaluation, and the other numbers we have gathered.
   
   
    




[GitHub] [pinot] siddharthteotia commented on issue #10500: Offheap distinct(count)

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on issue #10500:
URL: https://github.com/apache/pinot/issues/10500#issuecomment-1666286609

   @vvivekiyer has started work on this.
   
   - Currently evaluating a couple of available off-heap implementations
   - A design is also in progress for an in-house off-heap hash set that can later be extended to off-heap group-by with the accumulator buffer
   - A POC implementation is in progress
   
   cc @Jackie-Jiang 




[GitHub] [pinot] vvivekiyer commented on issue #10500: Offheap distinct(count)

Posted by "vvivekiyer (via GitHub)" <gi...@apache.org>.
vvivekiyer commented on issue #10500:
URL: https://github.com/apache/pinot/issues/10500#issuecomment-1664797671

   @siddharthteotia can you please assign this to me?

