Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/02/14 21:02:00 UTC

[jira] [Created] (ARROW-15681) [C++] Allow the write node to respect sorting

Weston Pace created ARROW-15681:
-----------------------------------

             Summary: [C++] Allow the write node to respect sorting
                 Key: ARROW-15681
                 URL: https://issues.apache.org/jira/browse/ARROW-15681
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


A user should be able to sort by some criteria and then write the dataset out in sorted order. The partitions themselves would not be sorted in any way (the partition keys are essentially outer sort keys). However, the chunks inside a partition should be ordered so that chunk-N comes before chunk-X in the sort order if N < X.
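
To make the intended ordering concrete, here is a minimal standalone sketch. It does not use any Arrow API: Row, Chunks, ChunkSortedRows and ChunksAreOrdered are hypothetical names invented for illustration, and the fixed max_rows_per_chunk limit is an assumption about how chunks get cut. It assumes the rows arrive already sorted and only shows that appending them in arrival order keeps chunk-N ahead of chunk-X within each partition.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical row type: a partition key plus the value the user sorted by.
struct Row {
  std::string partition;
  int value;
};

// chunks[i] holds the sort values of the rows written to chunk-i of one partition.
using Chunks = std::vector<std::vector<int>>;

// Split already-sorted rows into per-partition chunks, preserving arrival order.
// A new chunk is started whenever the current one reaches max_rows_per_chunk.
std::map<std::string, Chunks> ChunkSortedRows(const std::vector<Row>& sorted_rows,
                                              std::size_t max_rows_per_chunk) {
  std::map<std::string, Chunks> out;
  for (const Row& row : sorted_rows) {
    Chunks& chunks = out[row.partition];
    if (chunks.empty() || chunks.back().size() >= max_rows_per_chunk) {
      chunks.emplace_back();
    }
    chunks.back().push_back(row.value);
  }
  return out;
}

// True if reading chunk-0, chunk-1, ... back to back yields a non-decreasing
// sequence, i.e. no row in chunk-N sorts after a row in chunk-X for N < X.
bool ChunksAreOrdered(const Chunks& chunks) {
  bool have_prev = false;
  int prev = 0;
  for (const auto& chunk : chunks) {
    for (int v : chunk) {
      if (have_prev && v < prev) return false;
      prev = v;
      have_prev = true;
    }
  }
  return true;
}
```

Note that the partition keys play no role in the check: rows from different partitions interleave freely in the input, and only the order within each partition has to survive the write.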

Assuming we come up with some kind of mid-plan sorting approach (which will likely be needed by window functions), this should be pretty straightforward to implement efficiently, since the dataset writer already assigns chunk ids on a serialized path.
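
The reason a serialized id assignment helps can be sketched as follows. This is not the dataset writer's actual code: ChunkIdAssigner and ChunkName are hypothetical, and the "chunk-<id>" naming is made up for illustration. The idea is only that if batches reach the assignment step in sorted order, and ids are handed out one at a time under a lock, then the ids within a partition follow the sort order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical helper (not Arrow's API): hands out per-partition chunk ids.
class ChunkIdAssigner {
 public:
  // Returns 0, 1, 2, ... for successive calls with the same partition.
  // The mutex serializes callers, so ids reflect the order of the calls.
  std::uint64_t Next(const std::string& partition) {
    std::lock_guard<std::mutex> lock(mutex_);
    return next_id_[partition]++;  // operator[] starts a new partition at 0
  }

 private:
  std::mutex mutex_;
  std::map<std::string, std::uint64_t> next_id_;
};

// Hypothetical naming scheme that embeds the chunk id in the output path.
std::string ChunkName(const std::string& partition, std::uint64_t id) {
  return partition + "/chunk-" + std::to_string(id);
}
```

Under this sketch the actual file writes could still run in parallel after an id has been assigned; only the assignment step needs to see the batches in sorted order.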



--
This message was sent by Atlassian Jira
(v8.20.1#820001)