Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2020/09/28 16:19:00 UTC

[jira] [Updated] (ARROW-9633) [C++] Do not toggle memory mapping globally in LocalFileSystem

     [ https://issues.apache.org/jira/browse/ARROW-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-9633:
----------------------------------
    Fix Version/s:     (was: 2.0.0)
                   3.0.0

> [C++] Do not toggle memory mapping globally in LocalFileSystem
> --------------------------------------------------------------
>
>                 Key: ARROW-9633
>                 URL: https://issues.apache.org/jira/browse/ARROW-9633
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In the context of the Datasets API, some file formats benefit greatly from memory mapping (such as Arrow IPC files) while others benefit less. In addition, memory mapping can fail in some scenarios, for example on network-attached storage devices. A single filesystem may be used to read different kinds of files, some with memory mapping and some without, and the Datasets API should be able to fall back to regular reads when an attempt to memory map fails. It would therefore make sense to have a non-global option for this instead of the current filesystem-wide setting:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/localfs.h
> I would suggest adding a new filesystem API, something like {{OpenMappedInputFile}}, with options controlling the behavior when memory mapping is not possible. These options could include:
> * Falling back on a normal RandomAccessFile
> * Reading the entire file into memory (or even tmpfs?) and then wrapping it in a BufferReader
> * Failing
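
A minimal, self-contained sketch of the proposed per-call behavior, assuming plain POSIX `mmap`/`pread` rather than Arrow's own `arrow::io::MemoryMappedFile`, `ReadableFile`, and `BufferReader`. Only the name `OpenMappedInputFile` comes from the issue; the `MmapFallback` enum, the `InputFile` interface, and the `MappedFile`/`PreadFile`/`BufferFile` classes are hypothetical and are not Arrow API. Each fallback corresponds to one bullet in the list of options.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <algorithm>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical policy mirroring the three options listed in the issue.
enum class MmapFallback { kRandomAccessFile, kReadIntoMemory, kFail };

// Hypothetical minimal read-only interface (NOT arrow::io::RandomAccessFile).
class InputFile {
 public:
  virtual ~InputFile() = default;
  virtual int64_t Size() const = 0;
  // Copies up to n bytes at `offset` into `out`; returns bytes copied, -1 on error.
  virtual int64_t ReadAt(int64_t offset, int64_t n, char* out) const = 0;
};

// Bounds-checked copy out of an in-memory region.
static int64_t CopyFrom(const char* base, int64_t size, int64_t offset, int64_t n, char* out) {
  if (offset < 0 || offset >= size || n <= 0) return 0;
  n = std::min(n, size - offset);
  std::memcpy(out, base + offset, static_cast<size_t>(n));
  return n;
}

// Zero-copy view over a successful mmap; unmapped on destruction.
class MappedFile : public InputFile {
 public:
  MappedFile(void* addr, int64_t size) : addr_(addr), size_(size) {}
  ~MappedFile() override { munmap(addr_, static_cast<size_t>(size_)); }
  int64_t Size() const override { return size_; }
  int64_t ReadAt(int64_t offset, int64_t n, char* out) const override {
    return CopyFrom(static_cast<const char*>(addr_), size_, offset, n, out);
  }
 private:
  void* addr_;
  int64_t size_;
};

// Fallback 1: a normal random-access file using positional reads on the descriptor.
class PreadFile : public InputFile {
 public:
  PreadFile(int fd, int64_t size) : fd_(fd), size_(size) {}
  ~PreadFile() override { close(fd_); }
  int64_t Size() const override { return size_; }
  int64_t ReadAt(int64_t offset, int64_t n, char* out) const override {
    if (n <= 0) return 0;
    ssize_t r = pread(fd_, out, static_cast<size_t>(n), static_cast<off_t>(offset));
    return r < 0 ? -1 : static_cast<int64_t>(r);
  }
 private:
  int fd_;
  int64_t size_;
};

// Fallback 2: whole file copied into memory, in the spirit of wrapping it in a BufferReader.
class BufferFile : public InputFile {
 public:
  explicit BufferFile(std::string data) : data_(std::move(data)) {}
  int64_t Size() const override { return static_cast<int64_t>(data_.size()); }
  int64_t ReadAt(int64_t offset, int64_t n, char* out) const override {
    return CopyFrom(data_.data(), Size(), offset, n, out);
  }
 private:
  std::string data_;
};

// Sketch of the proposed per-call API: try mmap first, then apply `fallback`.
std::unique_ptr<InputFile> OpenMappedInputFile(const std::string& path, MmapFallback fallback) {
  int fd = open(path.c_str(), O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
  const int64_t size = st.st_size;
  // mmap may fail, e.g. on some network filesystems, or with EINVAL for a zero-length file.
  void* addr = mmap(nullptr, static_cast<size_t>(size), PROT_READ, MAP_PRIVATE, fd, 0);
  if (addr != MAP_FAILED) {
    close(fd);  // the mapping stays valid after the descriptor is closed
    return std::make_unique<MappedFile>(addr, size);
  }
  if (fallback == MmapFallback::kRandomAccessFile) {
    return std::make_unique<PreadFile>(fd, size);  // PreadFile now owns fd
  }
  if (fallback == MmapFallback::kReadIntoMemory) {
    std::string data(static_cast<size_t>(size), '\0');
    int64_t done = 0;
    while (done < size) {
      ssize_t r = pread(fd, &data[static_cast<size_t>(done)],
                        static_cast<size_t>(size - done), static_cast<off_t>(done));
      if (r <= 0) break;  // read error or unexpected EOF
      done += r;
    }
    close(fd);
    if (done != size) return nullptr;
    return std::make_unique<BufferFile>(std::move(data));
  }
  close(fd);  // MmapFallback::kFail
  return nullptr;
}
```

Because the policy is an argument to each open call rather than a flag on the whole filesystem object, an IPC reader could request mapping with a fallback while another reader on the same filesystem skips mapping entirely, which is the non-global behavior the issue asks for.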



--
This message was sent by Atlassian Jira
(v8.3.4#803005)