Posted to github@arrow.apache.org by "Dandandan (via GitHub)" <gi...@apache.org> on 2023/07/13 06:53:23 UTC
[GitHub] [arrow-datafusion] Dandandan commented on issue #6937: Improve Memory usage with large numbers of groups
Dandandan commented on issue #6937:
URL: https://github.com/apache/arrow-datafusion/issues/6937#issuecomment-1633666862
I did some tests based on a simple heuristic (e.g. number of columns in input / group by) in https://github.com/apache/arrow-datafusion/pull/6938, but saw both performance improvements (likely for the high-cardinality queries) and degradations.
Also, for distributed systems like Ballista, the partial/final approach probably works better in most cases (even for higher-cardinality ones), so I think we would have to make this behaviour configurable.
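A minimal sketch of what "heuristic plus configuration" could look like. Every name here (`AggStrategy`, `AggStrategyConfig`, `choose_strategy`, the `distributed` flag and the 0.5 threshold) is hypothetical and not DataFusion's or Ballista's API. The ratio of group-by columns to input columns is assumed to be a crude proxy for cardinality, and distributed execution is assumed to default to partial/final as suggested above:

```rust
// Hypothetical sketch: none of these names come from DataFusion or Ballista.

/// Two ways to plan a grouped aggregation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AggStrategy {
    /// Aggregate within each input partition, repartition the partial
    /// results on the group keys, then merge them in a final aggregate.
    PartialFinal,
    /// Repartition the raw input on the group keys and aggregate once.
    SinglePass,
}

/// Hypothetical setting that makes the choice configurable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AggStrategyConfig {
    /// Let the heuristic decide.
    Auto,
    /// Always use the given strategy, ignoring the heuristic.
    Force(AggStrategy),
}

/// Assumed (arbitrary) threshold on the group-by / input column ratio.
const SINGLE_PASS_RATIO: f64 = 0.5;

fn choose_strategy(
    config: AggStrategyConfig,
    input_columns: usize,
    group_by_columns: usize,
    distributed: bool,
) -> AggStrategy {
    match config {
        AggStrategyConfig::Force(strategy) => strategy,
        AggStrategyConfig::Auto => {
            // Assumption: in distributed engines, partial aggregates pay off by
            // reducing data before it is shuffled, so keep partial/final there.
            if distributed || input_columns == 0 {
                return AggStrategy::PartialFinal;
            }
            // Crude cardinality proxy: the larger the share of input columns
            // used as group keys, the less a partial aggregate likely shrinks.
            let ratio = group_by_columns as f64 / input_columns as f64;
            if ratio >= SINGLE_PASS_RATIO {
                AggStrategy::SinglePass
            } else {
                AggStrategy::PartialFinal
            }
        }
    }
}

fn main() {
    let auto = AggStrategyConfig::Auto;
    println!("{:?}", choose_strategy(auto, 4, 3, false)); // SinglePass
    println!("{:?}", choose_strategy(auto, 10, 1, false)); // PartialFinal
    println!("{:?}", choose_strategy(auto, 4, 3, true)); // PartialFinal
    let forced = AggStrategyConfig::Force(AggStrategy::SinglePass);
    println!("{:?}", choose_strategy(forced, 10, 1, true)); // SinglePass
}
```

The `Force` variant is the escape hatch: since the heuristic both helped and hurt in the tests mentioned above, a user (or a distributed scheduler) can pin the strategy regardless of what `Auto` would pick.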