Posted to github@arrow.apache.org by "ozankabak (via GitHub)" <gi...@apache.org> on 2023/03/02 22:27:33 UTC

[GitHub] [arrow-datafusion] ozankabak commented on issue #5230: Use Arrow Row Format in SortExec

ozankabak commented on issue #5230:
URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1452643344

   @jaylmiller, we recently ran into something similar to your observation. We are improving `PARTITION BY` clauses in window calculations to avoid pipeline-breaking sorts on unsorted data (by using hashing instead), and we tried the row converter to see whether, and by how much, it helps.
   
   In test cases with a single partition, it definitely helps. In test cases with multiple partitions, the batches get smaller (since there is no automatic batch coalescing), which results in a slowdown. This agrees with your theory, right?
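
For readers unfamiliar with the term, here is a minimal sketch of what "batch coalescing" means in the comment above: merging consecutive small batches into larger ones so that any fixed per-batch cost (such as a row conversion call) is paid less often. It uses only the Rust standard library. `Batch` and `coalesce` are hypothetical names chosen for illustration, with a plain `Vec<i64>` standing in for a record batch; this is not DataFusion's or arrow-rs's actual implementation.

```rust
/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i64>;

/// Merge consecutive small batches until each output batch holds at least
/// `target_rows` rows. The final batch may be smaller if rows run out.
/// Row order is preserved.
fn coalesce(batches: Vec<Batch>, target_rows: usize) -> Vec<Batch> {
    let mut out = Vec::new();
    let mut buf: Batch = Vec::new();
    for batch in batches {
        // Append the incoming rows to the pending buffer.
        buf.extend(batch);
        // Emit the buffer once it has reached the target size.
        if buf.len() >= target_rows {
            out.push(std::mem::take(&mut buf));
        }
    }
    // Flush whatever is left over as a final, possibly short, batch.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}
```

As a usage example, calling `coalesce` on ten 3-row batches with `target_rows = 8` yields four batches of 9, 9, 9, and 3 rows, so a per-batch operation would run four times instead of ten.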

