Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/24 03:09:30 UTC

[GitHub] [arrow] zifengyu commented on pull request #14158: ARROW-17762: [C++] WIP: Add ordering information to Acero

zifengyu commented on PR #14158:
URL: https://github.com/apache/arrow/pull/14158#issuecomment-1325893750

   This feature is exactly what we need to adopt Acero. I tried adding ExecBatch ordering and implemented a limit operator in our product. Here is what we saw in testing.
   
   1. Finishing the node (and notifying the downstream node) is somewhat difficult because the input and output batch counts differ. In our case, the node may finish either when it has produced the limit number of rows or when the upstream node has finished producing without reaching the limit. The former happens in the queue's deliver task, while the latter happens in FetchNode's InputFinished. We did not find an easy way to synchronize these two components, so we moved the queue inside the node and added a counter to track the rows sent.
   
   2. We also need an `offset` setting to skip the first few rows in the limit operator. Could this be included in FetchNode so that we can switch back to the Acero node in the future?
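
To make the two points above concrete, here is a minimal sketch of the row-counting logic we have in mind. It does not use any Acero API: `LimitOffsetTracker`, `Slice`, `OnBatch`, and `OnInputFinished` are hypothetical names invented for illustration, and the sketch assumes batches arrive in order on a single thread (a real node would need synchronization).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Acero: decides which rows of each
// ordered input batch a limit/offset operator should forward.
struct Slice {
  int64_t start;   // index of the first row of the batch to emit
  int64_t length;  // number of rows to emit (0 means emit nothing)
};

class LimitOffsetTracker {
 public:
  LimitOffsetTracker(int64_t offset, int64_t limit)
      : to_skip_(offset), to_send_(limit) {
    assert(offset >= 0 && limit >= 0);
  }

  // Call once per input batch, in order. Returns the slice to forward.
  Slice OnBatch(int64_t num_rows) {
    assert(num_rows >= 0);
    // First consume rows that still fall inside the offset.
    const int64_t skipped = std::min(to_skip_, num_rows);
    to_skip_ -= skipped;
    // Then forward as many of the remaining rows as the limit allows.
    const int64_t length = std::min(to_send_, num_rows - skipped);
    to_send_ -= length;
    sent_ += length;
    return Slice{skipped, length};
  }

  // Call when the upstream node reports that it has no more batches.
  void OnInputFinished() { input_finished_ = true; }

  // The node can finish once the limit is reached or the input is exhausted.
  bool Finished() const { return to_send_ == 0 || input_finished_; }

  int64_t rows_sent() const { return sent_; }

 private:
  int64_t to_skip_;              // rows still to drop for the offset
  int64_t to_send_;              // rows still allowed by the limit
  int64_t sent_ = 0;             // counter of rows forwarded so far
  bool input_finished_ = false;  // set when upstream is done
};
```

Keeping both finish conditions behind one `Finished()` check is what moving the queue into the node gave us: the same counter answers "limit reached" and "input ended early", so the downstream node is notified from a single place.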
   
   Anyway, this proposal is critical to our use of Acero. We are looking forward to its release.
   
   

