Posted to jira@arrow.apache.org by "Vibhatha Lakmal Abeykoon (Jira)" <ji...@apache.org> on 2022/07/22 10:02:00 UTC

[jira] [Created] (ARROW-17183) [C++] Adding ExecNode with Sort and Fetch capability

Vibhatha Lakmal Abeykoon created ARROW-17183:
------------------------------------------------

             Summary: [C++] Adding ExecNode with Sort and Fetch capability
                 Key: ARROW-17183
                 URL: https://issues.apache.org/jira/browse/ARROW-17183
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Vibhatha Lakmal Abeykoon
            Assignee: Vibhatha Lakmal Abeykoon


Substrait integration with Acero requires the ability to fetch records from both sorted and unsorted data.

A Fetch operation is defined as selecting `K` records starting at an offset. For instance, pick 10 records after skipping the first 5. We can define this as a Slice operation, and the records can be easily extracted in a sink node.
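
As a rough illustration of Fetch as a slice, here is a minimal sketch on a plain std::vector. It does not use Acero or any Arrow API; the name `Fetch` and its parameters are hypothetical, and the vector only stands in for a stream of records.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Acero API): returns up to `count` records
// after skipping the first `offset` records. Out-of-range requests are
// clamped, so the result may be shorter than `count` or empty.
std::vector<int> Fetch(const std::vector<int>& records, std::size_t offset,
                       std::size_t count) {
  const std::size_t begin = std::min(offset, records.size());
  const std::size_t end = begin + std::min(count, records.size() - begin);
  return std::vector<int>(
      records.begin() + static_cast<std::ptrdiff_t>(begin),
      records.begin() + static_cast<std::ptrdiff_t>(end));
}
```

With this sketch, "pick 10 records skipping the first 5" would be `Fetch(records, 5, 10)`.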

The Sort and Fetch operation applies when we need to execute a Fetch on sorted data. The main issue is that we cannot have a sort node followed by a fetch node, because all existing node definitions that support sorting are sink nodes. Since no node can follow a sink node, this functionality has to take place in a single node.

This is not a perfect solution for sort and fetch, but one way to do it is to define a sink node in which the records are sorted and then a set of items is fetched.
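
To make the "sort, then fetch, in one place" idea concrete, here is a minimal sketch on a plain std::vector. It is not Acero code; `SortAndFetch` is a hypothetical name, and ascending order on integers is an assumption made only for the example. It uses std::partial_sort because only the first `offset + count` positions need to be in final sorted order, which is one possible way to avoid a full sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Acero API): sorts ascending and returns up to
// `count` records after skipping the first `offset` records of the sorted
// order. Takes `records` by value so the caller's data is left untouched.
std::vector<int> SortAndFetch(std::vector<int> records, std::size_t offset,
                              std::size_t count) {
  const std::size_t begin = std::min(offset, records.size());
  const std::size_t end = begin + std::min(count, records.size() - begin);
  const auto first = records.begin();
  // Only positions [0, end) must hold their final sorted values.
  std::partial_sort(first, first + static_cast<std::ptrdiff_t>(end),
                    records.end());
  return std::vector<int>(first + static_cast<std::ptrdiff_t>(begin),
                          first + static_cast<std::ptrdiff_t>(end));
}
```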

Another dilemma is what to do when a sort is followed by a fetch. In that case, there has to be a flag that sets the order of the operations.
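
One way such a flag could look is sketched below on a plain std::vector. Everything here is hypothetical (the `OpOrder` enum, the `SortFetch` function, ascending integer order); it only shows that the two orders give different results, and is not a proposal for the actual node options.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical flag selecting which operation runs first.
enum class OpOrder { kSortThenFetch, kFetchThenSort };

// Hypothetical helper (not an Acero API): applies sort and fetch in the
// order selected by `order`.
std::vector<int> SortFetch(std::vector<int> records, std::size_t offset,
                           std::size_t count, OpOrder order) {
  // Clamped slice: up to `count` records after skipping `offset`.
  const auto slice = [offset, count](const std::vector<int>& in) {
    const std::size_t begin = std::min(offset, in.size());
    const std::size_t end = begin + std::min(count, in.size() - begin);
    return std::vector<int>(in.begin() + static_cast<std::ptrdiff_t>(begin),
                            in.begin() + static_cast<std::ptrdiff_t>(end));
  };
  if (order == OpOrder::kSortThenFetch) {
    std::sort(records.begin(), records.end());
    return slice(records);
  }
  // Fetch first, then sort only the fetched records.
  std::vector<int> fetched = slice(records);
  std::sort(fetched.begin(), fetched.end());
  return fetched;
}
```

For the input {5, 3, 9, 1, 7, 2} with offset 1 and count 3, sort-then-fetch yields {2, 3, 5}, while fetch-then-sort yields {1, 3, 9}.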

The objective of this ticket is to discuss a viable, efficient solution and to add new nodes or a method to execute this logic.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)