Posted to github@arrow.apache.org by "Dandandan (via GitHub)" <gi...@apache.org> on 2023/07/13 06:53:23 UTC

[GitHub] [arrow-datafusion] Dandandan commented on issue #6937: Improve Memory usage with large numbers of groups

Dandandan commented on issue #6937:
URL: https://github.com/apache/arrow-datafusion/issues/6937#issuecomment-1633666862

   I ran some tests using only a simple heuristic (e.g. the number of columns in the input / group by) in https://github.com/apache/arrow-datafusion/pull/6938, but saw both performance improvements (likely for the high-cardinality queries) and regressions.
   
   Also, for distributed systems like Ballista, the partial/final approach probably works better in most cases (even for higher-cardinality queries), so I think we would have to make this behaviour configurable.
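   
   A minimal sketch of what such a configurable switch could look like, assuming a user-facing setting with a fixed partial/final mode, a fixed single-phase mode, and an "auto" mode that falls back to a column-count heuristic. The names `AggregationMode` and `choose_mode`, the ratio rule and the threshold are all hypothetical; this is not DataFusion's API and not the logic in PR #6938.
   
   ```rust
   /// Hypothetical setting controlling how a GROUP BY is planned.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum AggregationMode {
       /// Two-phase plan: partial aggregation per partition, then a final
       /// aggregation (likely the better default for distributed engines
       /// such as Ballista).
       PartialFinal,
       /// Skip the partial phase and aggregate in a single pass.
       SinglePhase,
       /// Decide per query using a heuristic.
       Auto,
   }
   
   /// Resolve the configured mode to a concrete plan choice.
   ///
   /// Hypothetical heuristic: when the group-by keys make up a large share
   /// of the input columns, assume high cardinality (the partial phase then
   /// reduces little) and pick single-phase aggregation.
   pub fn choose_mode(
       setting: AggregationMode,
       input_columns: usize,
       group_by_columns: usize,
       threshold: f64, // hypothetical tuning knob, e.g. 0.5
   ) -> AggregationMode {
       match setting {
           AggregationMode::Auto => {
               // No input columns: keep the safe two-phase default.
               if input_columns == 0 {
                   return AggregationMode::PartialFinal;
               }
               let ratio = group_by_columns as f64 / input_columns as f64;
               if ratio >= threshold {
                   AggregationMode::SinglePhase
               } else {
                   AggregationMode::PartialFinal
               }
           }
           // An explicitly configured mode always wins over the heuristic.
           fixed => fixed,
       }
   }
   ```
   
   Keeping an explicit override next to the heuristic is what would let a distributed engine pin the partial/final plan even for queries the heuristic would classify as high-cardinality.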
