
Posted to commits@pinot.apache.org by "wirybeaver (via GitHub)" <gi...@apache.org> on 2023/12/01 07:37:37 UTC

[I] Support parallel combine and disk spill for groupBy execution [pinot]

wirybeaver opened a new issue, #12080:
URL: https://github.com/apache/pinot/issues/12080

While reading the source code of Pinot's GroupByExecutor, I found that it lacks the following features of Druid's GroupByV2Engine:
1. Spill to disk for the merging buffer. [Druid SpillingGrouper](https://github.com/apache/druid/blob/9f3b26676d30f90599a7d55e43549617e0cee082/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/SpillingGrouper.java)
2. Parallel combine when merging sorted aggregation results. Druid builds a tree of combining threads locally on each historical node. [Druid ParallelCombiner](https://github.com/apache/druid/blob/9f3b26676d30f90599a7d55e43549617e0cee082/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/ParallelCombiner.java#L64)
//               o                 <- non-leaf node
//         /   /   \   \           <- ICD = 4
//      o     o     o     o        <- non-leaf nodes
//     / \   / \   / \   / \       <- LCD = 2
//    o   o o   o o   o o   o      <- leaf nodes
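
To make the tree concrete, here is a minimal sketch of a two-degree combining tree. It is not Druid's code: the class `CombiningTreeSketch` and its methods are hypothetical, each leaf is assumed to be an already sorted partial aggregate (key to sum), and every node merges its children into a `TreeMap` instead of streaming over sorted iterators as a real engine would. The leaf level merges `leafDegree` inputs per task, and every level above it merges `intermediateDegree` inputs per task, mirroring LCD and ICD in the diagram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a combining tree; not taken from Druid or Pinot.
final class CombiningTreeSketch {
  // One tree node: merge the sorted partial aggregates of its children.
  static TreeMap<String, Long> mergeGroup(List<TreeMap<String, Long>> children) {
    TreeMap<String, Long> out = new TreeMap<>();
    for (TreeMap<String, Long> child : children) {
      for (Map.Entry<String, Long> e : child.entrySet()) {
        out.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return out;
  }

  // One tree level: split the inputs into groups of `degree` and merge the groups in parallel.
  static List<TreeMap<String, Long>> combineLevel(
      List<TreeMap<String, Long>> inputs, int degree, ExecutorService pool) {
    List<Future<TreeMap<String, Long>>> futures = new ArrayList<>();
    for (int i = 0; i < inputs.size(); i += degree) {
      List<TreeMap<String, Long>> group = inputs.subList(i, Math.min(i + degree, inputs.size()));
      futures.add(pool.submit(() -> mergeGroup(group)));
    }
    List<TreeMap<String, Long>> out = new ArrayList<>();
    try {
      for (Future<TreeMap<String, Long>> f : futures) {
        out.add(f.get());
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return out;
  }

  // Leaf level uses leafDegree (LCD); all levels above use intermediateDegree (ICD).
  static TreeMap<String, Long> combine(
      List<TreeMap<String, Long>> leaves, int leafDegree, int intermediateDegree) {
    if (leafDegree < 2 || intermediateDegree < 2) {
      throw new IllegalArgumentException("combine degrees must be at least 2");
    }
    ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an arbitrary choice
    try {
      List<TreeMap<String, Long>> level = combineLevel(leaves, leafDegree, pool);
      while (level.size() > 1) {
        level = combineLevel(level, intermediateDegree, pool);
      }
      return level.isEmpty() ? new TreeMap<>() : level.get(0);
    } finally {
      pool.shutdown();
    }
  }
}
```

With 8 leaves, `combine(leaves, 2, 4)` runs four leaf-level merges followed by a single 4-way merge at the root, which is the shape drawn above.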

Reference: [Druid GroupBy Tuning Guide](https://druid.apache.org/docs/latest/querying/groupbyquery/)

As the tuning guide mentions, Druid seems to always sort the aggregated result by default when limit push-down is not enabled. I strongly suspect that integrating a disk-spill feature would allow Pinot to process larger data sets and resolve the non-deterministic results of groupBy without orderBy, see https://github.com/apache/pinot/issues/11706. In addition, the non-leaf stage in the multi-stage engine (V2) could also adopt these two features for partitioned aggregation.

I am raising this issue to solicit opinions. If there is sufficient support, I will write a design doc for leaf-stage group-by execution.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.

wirybeaver commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-1874365405

   I have read Druid's related source code and will share details in the coming days.



Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.

wirybeaver commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-2089118697

   https://www.databend.com/blog/2024-04-12-towards-efficient-distributed-group-aggregation.md/
   
   Databend's execution is incredibly fast. They follow DuckDB's approach.



Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.

wirybeaver commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-1854764439

   Disk spill: currently a limit on the number of groups is used to avoid OOM, so the returned result may not be accurate. Enabling disk spill would give deterministic results. Apache Druid, Apache Doris, and Velox already support disk-based group-by.
   Parallel combine: speeds up the combine process.
   @chenboat 
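
As a rough illustration of the disk-spill idea (a sketch under stated assumptions, not Pinot's or Druid's implementation): the hypothetical class `SpillingGroupBySketch` below computes a SUM per key, keeps at most `maxGroupsInMemory` groups in memory, writes the sorted groups to a temporary file when a new key would exceed that threshold, and finally does a k-way merge over all sorted runs. The threshold name, the tab-separated file format, and the restriction that keys contain no tabs or newlines are all assumptions made for the example. Cleanup on failure is omitted for brevity.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical sketch: SUM per key with sorted spill files; keys must not contain tabs or newlines.
final class SpillingGroupBySketch {
  private final int maxGroupsInMemory; // assumed spill threshold
  private final TreeMap<String, Long> inMemory = new TreeMap<>();
  private final List<Path> spillFiles = new ArrayList<>();

  SpillingGroupBySketch(int maxGroupsInMemory) { this.maxGroupsInMemory = maxGroupsInMemory; }

  // Aggregate one row; if a new key would exceed the threshold, spill first instead of dropping it.
  void add(String key, long value) {
    if (!inMemory.containsKey(key) && inMemory.size() >= maxGroupsInMemory) spill();
    inMemory.merge(key, value, Long::sum);
  }

  // Write the in-memory groups to a temp file in key order, then clear them.
  private void spill() {
    try {
      Path file = Files.createTempFile("groupby-spill", ".tsv");
      spillFiles.add(file);
      try (BufferedWriter writer = Files.newBufferedWriter(file)) {
        for (Map.Entry<String, Long> entry : inMemory.entrySet()) {
          writer.write(entry.getKey() + "\t" + entry.getValue());
          writer.newLine();
        }
      }
      inMemory.clear();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // K-way merge over all sorted runs; emits every key exactly once, in key order.
  void finish(BiConsumer<String, Long> sink) {
    if (!inMemory.isEmpty()) spill(); // simplification: treat the remaining groups as one more run
    List<BufferedReader> readers = new ArrayList<>();
    try {
      PriorityQueue<Cursor> heap = new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.key));
      for (Path file : spillFiles) {
        BufferedReader reader = Files.newBufferedReader(file);
        readers.add(reader);
        Cursor cursor = new Cursor(reader);
        if (cursor.advance()) heap.add(cursor);
      }
      String currentKey = null;
      long sum = 0;
      while (!heap.isEmpty()) {
        Cursor cursor = heap.poll();
        if (currentKey != null && !currentKey.equals(cursor.key)) {
          sink.accept(currentKey, sum); // the previous key is complete
          sum = 0;
        }
        currentKey = cursor.key;
        sum += cursor.value;
        if (cursor.advance()) heap.add(cursor);
      }
      if (currentKey != null) sink.accept(currentKey, sum);
      for (BufferedReader reader : readers) reader.close();
      for (Path file : spillFiles) Files.deleteIfExists(file);
      spillFiles.clear();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // One sorted run being read back from disk.
  private static final class Cursor {
    final BufferedReader reader;
    String key;
    long value;
    Cursor(BufferedReader reader) { this.reader = reader; }
    boolean advance() throws IOException {
      String line = reader.readLine();
      if (line == null) return false;
      int tab = line.lastIndexOf('\t');
      key = line.substring(0, tab);
      value = Long.parseLong(line.substring(tab + 1));
      return true;
    }
  }
}
```

Because no group is ever discarded, the output does not depend on which rows arrive first, which is the determinism argument made above.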
   
   



Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "chenboat (via GitHub)" <gi...@apache.org>.

chenboat commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-1843844937

   What are the main benefits of this feature for group-by? In what cases will it improve on the status quo? Can you be more specific?



Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.

kishoreg commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-1854771451

   This is a good feature to have. I would love to review the design proposal. It would be good to write a stand-alone algorithm so that we can leverage it in multiple places/operators.



Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.

wirybeaver commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-2041750481

   Personally, I prefer DuckDB's approach. Each worker has its own disk-based linear-probing hash table and does pre-aggregation without interacting with other workers. The partial results are then combined.



Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.

wirybeaver commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-2041746962

I didn't have the bandwidth to fully work through the Druid groupBy algorithm, but here is the headline version. Let's say there are 8 workers, and the segments are distributed across them. Each worker has a local hash table that uses linear probing to resolve hash collisions, and each worker is in charge of storing the aggregated result for one partition. If a worker finds that an aggregated key doesn't belong to its local partition, it inserts the aggregated row into the owning worker's hash table. If no local hash table triggers the disk-spill condition, there is no need to merge partitions. However, if any table triggers the disk-spill condition, the workers stop redistributing aggregated rows and each worker only pre-aggregates internally. In the combine phase, the results are merged on the fly, combining each hash table's in-memory and on-disk results.
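
For readers unfamiliar with linear probing, a minimal sketch of an aggregating open-addressing table follows. It is an illustration only and does not reproduce Druid's buffer layout: the class `LinearProbingAggTable`, the `long` key type, the SUM aggregate, and the bit-mixing function are assumptions. A full table rejects new keys by returning `false`, which is the point where an engine could decide to spill.

```java
// Hypothetical sketch of a linear-probing aggregation table (long key -> running sum).
final class LinearProbingAggTable {
  private final long[] keys;
  private final long[] sums;
  private final boolean[] used;
  private int size;

  LinearProbingAggTable(int capacity) {
    if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
      throw new IllegalArgumentException("capacity must be a power of two");
    }
    keys = new long[capacity];
    sums = new long[capacity];
    used = new boolean[capacity];
  }

  // Adds value to the key's sum. Returns false if the key is new and the table is full.
  boolean add(long key, long value) {
    int mask = keys.length - 1;
    int slot = (int) (mix(key) & mask);
    for (int probes = 0; probes < keys.length; probes++) {
      if (!used[slot]) { // empty slot: the key is new, claim the slot
        used[slot] = true;
        keys[slot] = key;
        sums[slot] = value;
        size++;
        return true;
      }
      if (keys[slot] == key) { // existing group: aggregate in place
        sums[slot] += value;
        return true;
      }
      slot = (slot + 1) & mask; // collision: try the next slot, wrapping around
    }
    return false;
  }

  // Returns the current sum for the key, or null if the key is absent.
  Long get(long key) {
    int mask = keys.length - 1;
    int slot = (int) (mix(key) & mask);
    for (int probes = 0; probes < keys.length; probes++) {
      if (!used[slot]) return null;
      if (keys[slot] == key) return sums[slot];
      slot = (slot + 1) & mask;
    }
    return null;
  }

  int size() { return size; }

  // Simple bit mixer so that nearby keys do not land in adjacent slots.
  private static long mix(long h) {
    h ^= h >>> 33;
    h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    return h;
  }
}
```

Probing touches consecutive slots of flat arrays, which is why this layout maps well onto contiguous buffers that can be written out as a block.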

DuckDB has a similar idea, with some nuances (https://duckdb.org/2024/03/29/external-aggregation.html). Each thread does pre-aggregation without redistributing results at all, and the results are then combined. They also use linear probing with a salt; linear probing is more disk friendly. To support resizing efficiently, they implement a two-part aggregate hash table: https://duckdb.org/2022/03/07/aggregate-hashtable
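
The two-phase shape described here (thread-local pre-aggregation, then combine) can be sketched as follows. This is a simplified illustration and not DuckDB's implementation: the class `ThreadLocalPreAggSketch` is hypothetical, it uses `HashMap` rather than a salted linear-probing table, it computes COUNT per key, and the combine phase rescans every partial result per partition instead of partitioning the data during pre-aggregation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of thread-local pre-aggregation followed by a partition-wise combine.
final class ThreadLocalPreAggSketch {
  // Phase 1: a worker counts its own rows in a private map, so no locking is needed.
  static Map<String, Long> preAggregate(List<String> rows) {
    Map<String, Long> local = new HashMap<>();
    for (String key : rows) {
      local.merge(key, 1L, Long::sum);
    }
    return local;
  }

  // Phase 2: one task per partition merges only the keys that hash to that partition.
  static Map<String, Long> combinePartition(
      List<Map<String, Long>> partials, int partition, int numPartitions) {
    Map<String, Long> out = new HashMap<>();
    for (Map<String, Long> partial : partials) {
      for (Map.Entry<String, Long> e : partial.entrySet()) {
        if (Math.floorMod(e.getKey().hashCode(), numPartitions) == partition) {
          out.merge(e.getKey(), e.getValue(), Long::sum);
        }
      }
    }
    return out;
  }

  static Map<String, Long> countByKey(List<List<String>> rowsPerWorker, int numPartitions) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, rowsPerWorker.size()));
    try {
      List<Future<Map<String, Long>>> phase1 = new ArrayList<>();
      for (List<String> rows : rowsPerWorker) {
        phase1.add(pool.submit(() -> preAggregate(rows)));
      }
      List<Map<String, Long>> partials = new ArrayList<>();
      for (Future<Map<String, Long>> f : phase1) {
        partials.add(f.get());
      }
      List<Future<Map<String, Long>>> phase2 = new ArrayList<>();
      for (int p = 0; p < numPartitions; p++) {
        int partition = p;
        phase2.add(pool.submit(() -> combinePartition(partials, partition, numPartitions)));
      }
      Map<String, Long> result = new HashMap<>();
      for (Future<Map<String, Long>> f : phase2) {
        result.putAll(f.get()); // partitions hold disjoint keys, so putAll cannot overwrite
      }
      return result;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

The design point is that workers never write into a shared table: contention only appears at the hand-off between the two phases, and the partitions in phase 2 are independent of each other.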

Pinot's groupBy currently doesn't support external aggregation, and all threads write into a single in-memory concurrent hash map.

BusTub implements a disk-based hash table and uses latch crabbing to increase concurrency: https://15445.courses.cs.cmu.edu/spring2024/project2/

@chenboat To answer your question: the goal is low memory usage for group-by. More generally, each operator could support a disk-based mode, e.g. external sorting.


Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.

wirybeaver commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-2092026759

   Discussed with Jackie, and he suggested separating parallel aggregation from the disk-spill feature.



Re: [I] [feature] Support parallel combine and disk spill for groupBy execution [pinot]

Posted by "wirybeaver (via GitHub)" <gi...@apache.org>.

wirybeaver commented on issue #12080:
URL: https://github.com/apache/pinot/issues/12080#issuecomment-2103191442

   The doc is in progress: https://docs.google.com/document/d/1GViPFE-wEpz2LIH79a1epr0u1AGcZ4QZzcWIWW77AlM/edit?usp=sharing

