Posted to github@arrow.apache.org by "ozankabak (via GitHub)" <gi...@apache.org> on 2023/04/24 16:37:53 UTC

[GitHub] [arrow-datafusion] ozankabak commented on pull request #6034: Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered

ozankabak commented on PR #6034:
URL: https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1520496156

   > For reference, here is the same benchmark run against `main` itself:
   
   This is very helpful. The variance seems larger than one would expect.
   
   There may be a tiny slow-down, roughly half the magnitude of the noise variance (in high-cardinality cases?). @mustafasrepo and I just had a meeting to go over why that could be. He will respond explaining our theory in greater detail, answer your other questions, and maybe even suggest a fix/improvement.
   
   Based on the discussion afterwards and the final numbers, we can reach a consensus on whether we should have two implementations with some code duplication, or keep the current structure. We will then take the necessary steps accordingly.
   
   Thanks for all the reviews!

