Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/11/03 14:26:01 UTC

[jira] [Resolved] (ARROW-12683) [C++] Enable fine-grained I/O (coalescing) in IPC reader

     [ https://issues.apache.org/jira/browse/ARROW-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li resolved ARROW-12683.
------------------------------
    Fix Version/s: 7.0.0
       Resolution: Fixed

Issue resolved by pull request 11486
[https://github.com/apache/arrow/pull/11486]

> [C++] Enable fine-grained I/O (coalescing) in IPC reader
> --------------------------------------------------------
>
>                 Key: ARROW-12683
>                 URL: https://issues.apache.org/jira/browse/ARROW-12683
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: David Li
>            Assignee: Yue Ni
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 7.0.0
>
>          Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> ARROW-11772 enables I/O coalescing in the IPC reader, but the reader operates at the granularity of an entire record batch: even if you load only a few columns, the whole record batch is read. On a high-latency file system (e.g. S3), we may be able to improve performance further by traversing the schema and reading only the buffers we actually need. This can be combined with coalescing to reduce the number of I/O calls.
> (Maybe there's a further saving here: rather than traversing the schema for every record batch to work out the buffer layout, could we do that once up front and reuse the layout for subsequent batches?)
> ArrayLoader already appears to perform this optimization, but it is handed a buffer that has already been read into memory in full, so no I/O is saved.
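To make the "compute the buffer layout once, then read only the needed buffers" idea concrete, here is a small sketch. It is written in Python for brevity, although the issue itself concerns the C++ reader. It assumes a deliberately simplified type model with only four types, and it relies on the Arrow IPC format listing a record batch's buffers in depth-first field order, so each top-level column owns a contiguous run of buffer indices. The `Field` class and the functions `compute_layout` and `select_ranges` are hypothetical illustrations, not Arrow APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes within the batch body


@dataclass
class Field:
    """Hypothetical, minimal schema node (not an Arrow API)."""
    name: str
    kind: str  # simplified type model: "int32", "utf8", "list" or "struct"
    children: List["Field"] = field(default_factory=list)


# Buffers each simplified type contributes itself, excluding its children:
#   int32: validity + values; utf8: validity + offsets + data;
#   list: validity + offsets; struct: validity only.
OWN_BUFFERS = {"int32": 2, "utf8": 3, "list": 2, "struct": 1}


def count_buffers(f: Field) -> int:
    # Buffers are listed depth-first, so a field's subtree occupies one
    # contiguous run: its own buffers followed by its children's.
    return OWN_BUFFERS[f.kind] + sum(count_buffers(c) for c in f.children)


def compute_layout(schema: List[Field]) -> Dict[str, Tuple[int, int]]:
    """Traverse the schema once; map each top-level column to the
    half-open range [start, end) of its buffer indices in any batch."""
    layout: Dict[str, Tuple[int, int]] = {}
    index = 0
    for f in schema:
        n = count_buffers(f)
        layout[f.name] = (index, index + n)
        index += n
    return layout


def select_ranges(layout: Dict[str, Tuple[int, int]],
                  batch_buffers: List[Range],
                  columns: List[str]) -> List[Range]:
    """Using the cached layout, pick the byte ranges of only the buffers
    needed for `columns` from one batch's buffer metadata."""
    ranges: List[Range] = []
    for name in columns:
        start, end = layout[name]
        ranges.extend(batch_buffers[start:end])
    return sorted(ranges)


if __name__ == "__main__":
    schema = [
        Field("id", "int32"),
        Field("name", "utf8"),
        Field("tags", "list", [Field("item", "utf8")]),
    ]
    layout = compute_layout(schema)  # done once per file, then reused
    print(layout)
    # {'id': (0, 2), 'name': (2, 5), 'tags': (5, 10)}
    batch_buffers = [(i * 64, 64) for i in range(10)]  # made-up metadata
    print(select_ranges(layout, batch_buffers, ["id", "tags"]))
```

Because the layout depends only on the schema, it can be cached for the whole file; each batch then needs only a slice lookup into its own buffer list, and the `name` column's buffers are never read.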

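The coalescing step mentioned in the description can be sketched the same way. The function below is a standalone illustration, not Arrow's implementation: it sorts the requested `(offset, length)` ranges and merges neighbors when the gap between them is at most `hole_size_limit` bytes and the merged read stays within `range_size_limit` bytes. Both thresholds are illustrative parameters; the assumption behind them is that, on a high-latency store such as S3, reading a few unneeded bytes costs less than issuing another request.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce(ranges: List[Range], hole_size_limit: int,
             range_size_limit: int) -> List[Range]:
    """Merge nearby byte ranges so fewer, larger reads are issued.

    Illustrative sketch only: a neighbor is merged when the gap ("hole")
    is at most hole_size_limit bytes and the merged range is at most
    range_size_limit bytes. Adjacent or overlapping ranges have a gap <= 0.
    """
    if not ranges:
        return []
    ordered = sorted(ranges)
    merged: List[Range] = []
    cur_start, cur_len = ordered[0]
    for start, length in ordered[1:]:
        cur_end = cur_start + cur_len
        new_end = max(cur_end, start + length)
        gap = start - cur_end
        if gap <= hole_size_limit and new_end - cur_start <= range_size_limit:
            cur_len = new_end - cur_start  # extend the current read
        else:
            merged.append((cur_start, cur_len))  # emit and start a new read
            cur_start, cur_len = start, length
    merged.append((cur_start, cur_len))
    return merged


if __name__ == "__main__":
    ranges = [(0, 64), (64, 64), (320, 64), (384, 64), (1000, 10)]
    print(coalesce(ranges, hole_size_limit=100, range_size_limit=1 << 20))
    # [(0, 128), (320, 128), (1000, 10)]
```

Raising `hole_size_limit` trades extra bytes for fewer requests (with a limit of 1000, all five ranges above collapse into a single read of `(0, 1010)`), while `range_size_limit` keeps any single read from growing without bound.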


--
This message was sent by Atlassian Jira
(v8.3.4#803005)