Posted to jira@arrow.apache.org by "Alessandro Molina (Jira)" <ji...@apache.org> on 2021/05/04 15:09:00 UTC

[jira] [Created] (ARROW-12650) [Python] Improve documentation regarding dealing with memory mapped files

Alessandro Molina created ARROW-12650:
-----------------------------------------

             Summary: [Python] Improve documentation regarding dealing with memory mapped files
                 Key: ARROW-12650
                 URL: https://issues.apache.org/jira/browse/ARROW-12650
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Alessandro Molina


While one of Arrow's promises is that it makes it easy to read/write data bigger than memory, it's not immediately obvious from the pyarrow documentation how to deal with memory-mapped files.

We hint that you can open files as memory mapped ( [https://arrow.apache.org/docs/python/memory.html?highlight=memory_map#on-disk-and-memory-mapped-files] ), but then we don't explain how to read/write Arrow Arrays or Tables from there.

While most high-level functions for reading/writing other formats (Parquet, Feather, ...) have an easy-to-guess {{memory_map=True}} option, we don't have any example of how that is meant to work for the Arrow format itself, for example how to do it using {{RecordBatchFile*}}.

An addition to the memory-mapping section with a more meaningful example that reads/writes actual Arrow data (instead of plain bytes) would probably be more helpful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)