Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-14524) [C++] Create plugging/coalescing filesystem wrapper

     [ https://issues.apache.org/jira/browse/ARROW-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-14524:
-----------------------------------

    Assignee:     (was: Weston Pace)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [C++] Create plugging/coalescing filesystem wrapper
> ---------------------------------------------------
>
>                 Key: ARROW-14524
>                 URL: https://issues.apache.org/jira/browse/ARROW-14524
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> We have I/O optimizations scattered across some of our readers.  The most prominent example is prebuffering in the Parquet reader.  However, these techniques are fairly general-purpose and would also apply to IPC (see ARROW-14229) as well as other readers (e.g. ORC, maybe even CSV).
> This filesystem wrapper will generally not be necessary for local filesystems, as the OS's filesystem schedulers are sufficient.  We can accomplish most of the goals below by simply aiming for some configurable degree of parallelism (e.g. if there are already X requests in progress, then start batching).
> Goals:
>  * Batch consecutive small requests into fewer large requests
>    * We could also plug small holes between read ranges (configurably)
>  * Potentially split large requests into concurrent small requests
>  * Support the RandomAccessFile::WillNeed call by prefetching ranges
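
The first three goals above can be sketched in a few lines of standalone C++. This is only an illustration of the idea, not the proposed wrapper and not Arrow's existing code: ByteRange, CoalesceRanges, SplitRange, max_hole and max_size are hypothetical names invented for this example, and a real implementation would presumably work with Arrow's own range type and issue the resulting reads asynchronously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical range type for illustration only (not Arrow's API).
struct ByteRange {
  int64_t offset;
  int64_t length;
  int64_t end() const { return offset + length; }
};

// Batch small requests into fewer large ones: after sorting by offset,
// merge each range into its predecessor when the hole between them is
// at most `max_hole` bytes. With max_hole == 0 only touching or
// overlapping ranges are merged; a larger value "plugs" small holes.
std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges,
                                      int64_t max_hole) {
  assert(max_hole >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ByteRange> out;
  for (const ByteRange& r : ranges) {
    if (r.length <= 0) continue;  // ignore empty requests
    if (!out.empty() && r.offset - out.back().end() <= max_hole) {
      // Extend the previous range to cover this one (and the hole).
      const int64_t new_end = std::max(out.back().end(), r.end());
      out.back().length = new_end - out.back().offset;
    } else {
      out.push_back(r);
    }
  }
  return out;
}

// Split one large request into pieces of at most `max_size` bytes so
// that the pieces could be issued concurrently.
std::vector<ByteRange> SplitRange(ByteRange range, int64_t max_size) {
  assert(max_size > 0);
  std::vector<ByteRange> out;
  for (int64_t pos = range.offset; pos < range.end(); pos += max_size) {
    out.push_back({pos, std::min(max_size, range.end() - pos)});
  }
  return out;
}
```

For example, coalescing the ranges [0, 10), [12, 20) and [100, 105) with max_hole = 4 yields two requests, [0, 20) and [100, 105), at the cost of reading two unneeded bytes. Splitting a 25-byte range with max_size = 10 yields pieces of 10, 10 and 5 bytes. Both thresholds trade wasted bytes against request count, which is why the issue asks for them to be configurable.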



--
This message was sent by Atlassian Jira
(v8.20.10#820010)