Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/04/14 21:11:00 UTC

[jira] [Commented] (ARROW-12392) [C++] Restore asynchronous streaming scanner as a mirror API

    [ https://issues.apache.org/jira/browse/ARROW-12392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321322#comment-17321322 ] 

Weston Pace commented on ARROW-12392:
-------------------------------------

[~apitrou] Do you have any opinions on where to draw the line with using RunInSerialExecutor?  If we use it here then we will need to call RunInSerialExecutor every time we call ReadNext.  On the other hand, if we don't use it here then we need to implement two code-paths.

My hunch (and preference) would be that the overhead is small enough that we can use RunInSerialExecutor.  This is already at the per-batch level, and we are already making an I/O call per batch and then paying the cost of the handoff from the background reader to the processing thread.
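
To make the trade-off concrete, here is a rough, self-contained sketch of the "wrap the async path" option.  Everything in it is a hypothetical stand-in (ToySerialExecutor, ToyAsyncReader, ToySyncReader, and the callback signature are invented for illustration); it does not use Arrow's actual RunInSerialExecutor, Future, or reader classes.  It only shows the shape of the idea: one async implementation, with a synchronous ReadNext that sets up a fresh serial executor on every call and drains it on the calling thread.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Arrow's real types.
using Task = std::function<void()>;
using Batch = std::vector<int>;
// Invoked with (has_batch, batch); has_batch is false at end of stream.
using BatchCallback = std::function<void(bool, Batch)>;

// Runs queued tasks one at a time on the calling thread.
class ToySerialExecutor {
 public:
  void Spawn(Task task) { tasks_.push_back(std::move(task)); }

  // Drain the queue until `done` returns true or no work remains.
  void RunUntil(const std::function<bool()>& done) {
    while (!done() && !tasks_.empty()) {
      Task next = std::move(tasks_.front());
      tasks_.pop_front();
      next();
    }
  }

 private:
  std::deque<Task> tasks_;
};

// The single "real" implementation: it only knows how to read asynchronously.
class ToyAsyncReader {
 public:
  explicit ToyAsyncReader(std::vector<Batch> batches)
      : batches_(std::move(batches)) {}

  // Schedules the read on `executor`; the result arrives through `callback`.
  void ReadNextAsync(ToySerialExecutor* executor, BatchCallback callback) {
    executor->Spawn([this, callback = std::move(callback)]() {
      if (position_ < batches_.size()) {
        callback(true, batches_[position_++]);
      } else {
        callback(false, Batch());
      }
    });
  }

 private:
  std::vector<Batch> batches_;
  std::size_t position_ = 0;
};

// Synchronous facade: no second code path, but each ReadNext call pays for
// creating and draining an executor (the per-batch overhead in question).
class ToySyncReader {
 public:
  explicit ToySyncReader(std::vector<Batch> batches)
      : async_(std::move(batches)) {}

  // Returns false once the stream is exhausted.
  bool ReadNext(Batch* out) {
    ToySerialExecutor executor;
    bool finished = false;
    bool has_batch = false;
    async_.ReadNextAsync(&executor, [&](bool ok, Batch batch) {
      has_batch = ok;
      *out = std::move(batch);
      finished = true;
    });
    executor.RunUntil([&] { return finished; });
    return has_batch;
  }

 private:
  ToyAsyncReader async_;
};
```

A caller would loop on ToySyncReader::ReadNext until it returns false.  The mirror-API alternative would instead give ToySyncReader its own blocking read logic next to ToyAsyncReader, avoiding the per-call executor at the cost of maintaining both paths.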

> [C++] Restore asynchronous streaming scanner as a mirror API
> ------------------------------------------------------------
>
>                 Key: ARROW-12392
>                 URL: https://issues.apache.org/jira/browse/ARROW-12392
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> In order to support the AsyncScanner we need the asynchronous streaming CSV reader back (added in ARROW-11887 but reverted later).  However, it will either need to be implemented as a mirror API (so the sync and async implementations are side-by-side) or the async-API must be wrapped with RunInSerialExecutor when called synchronously.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)