Posted to dev@arrow.apache.org by simba nyatsanga <si...@gmail.com> on 2018/02/08 21:20:06 UTC

Memory mapping error on pq.read_table

Hi Everyone,

I've encountered a memory mapping error when attempting to read a Parquet
file into a pandas DataFrame. It seems to happen intermittently; so far
I've encountered it only once. In my case, pq.read_table is being invoked
in a Linux Docker container. I had a look at the docs for PyArrow memory
and IO management here:
https://arrow.apache.org/docs/python/memory.html

What could give rise to the stacktrace below?

  File "read_file.py", line 173, in load_chunked_data
    return pq.read_table(data_obj_path, columns=columns).to_pandas()
  File "/opt/anaconda-python-5.0.1/lib/python2.7/site-packages/pyarrow/parquet.py", line 890, in read_table
    pf = ParquetFile(source, metadata=metadata)
  File "/opt/anaconda-python-5.0.1/lib/python2.7/site-packages/pyarrow/parquet.py", line 56, in __init__
    self.reader.open(source, metadata=metadata)
  File "pyarrow/_parquet.pyx", line 624, in pyarrow._parquet.ParquetReader.open (/arrow/python/build/temp.linux-x86_64-2.7/_parquet.cxx:11558)
    get_reader(source, &rd_handle)
  File "pyarrow/io.pxi", line 798, in pyarrow.lib.get_reader (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:58504)
    source = memory_map(source, mode='r')
  File "pyarrow/io.pxi", line 473, in pyarrow.lib.memory_map (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:54834)
    mmap._open(path, mode)
  File "pyarrow/io.pxi", line 452, in pyarrow.lib.MemoryMappedFile._open (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:54613)
    check_status(CMemoryMappedFile.Open(c_path, c_mode, &handle))
  File "pyarrow/error.pxi", line 79, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:8345)
    raise ArrowIOError(message)
ArrowIOError: Memory mapping file failed, errno: 22



Thanks for the help.

Kind Regards
Simba

Re: Memory mapping error on pq.read_table

Posted by Wes McKinney <we...@gmail.com>.
Hi Simba,

Is it possible the file has zero length?

$ touch foo
$ ipython

In [1]: import pyarrow

In [2]: pyarrow.memory_map('foo')
---------------------------------------------------------------------------
ArrowIOError                              Traceback (most recent call last)
<ipython-input-2-1111f1c5d786> in <module>()
----> 1 pyarrow.memory_map('foo')

/home/wesm/code/arrow/python/pyarrow/io.pxi in pyarrow.lib.memory_map
(/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:55830)()

/home/wesm/code/arrow/python/pyarrow/io.pxi in
pyarrow.lib.MemoryMappedFile._open
(/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:55609)()

/home/wesm/code/arrow/python/pyarrow/error.pxi in
pyarrow.lib.check_status
(/home/wesm/code/arrow/python/build/temp.linux-x86_64-3.5/lib.cxx:8379)()

ArrowIOError: /home/wesm/code/arrow/cpp/src/arrow/io/file.cc:690 code:
result->memory_map_->Open(path, mode)
Memory mapping file failed, errno: 22

In [3]: import pyarrow.parquet as pq

In [4]: pq.read_table('foo')
<SNIP>

ArrowIOError: /home/wesm/code/arrow/cpp/src/arrow/io/file.cc:690 code:
result->memory_map_->Open(path, mode)
Memory mapping file failed, errno: 22

That's admittedly not the best error message. I'm opening a JIRA to
improve it: https://issues.apache.org/jira/browse/ARROW-2118

- Wes
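
For readers who hit the same error: on Linux, errno 22 is EINVAL, which
mmap returns when asked to map a length of zero, and that matches the
empty-file reproduction above. The following is a minimal sketch of a guard
that checks the file size before calling pq.read_table, so an empty file
produces a descriptive error instead. The helper names ensure_nonempty_file
and read_parquet_checked are hypothetical and not part of pyarrow. Whether
an empty file was actually the cause in the original report is not
confirmed in this thread.

```python
import os


def ensure_nonempty_file(path):
    """Return the size of `path` in bytes, or raise if it is zero-length.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    size = os.path.getsize(path)  # raises OSError if the path does not exist
    if size == 0:
        raise ValueError(
            "%s is empty (0 bytes); it cannot be memory-mapped or "
            "parsed as Parquet" % path
        )
    return size


def read_parquet_checked(path, columns=None):
    """Check the file size first, then read it as in the original post."""
    ensure_nonempty_file(path)
    # Imported lazily so the size check above is usable without pyarrow.
    import pyarrow.parquet as pq
    return pq.read_table(path, columns=columns).to_pandas()
```

Note that this check only narrows the window: if another process is still
writing the file, it can be non-empty yet incomplete at read time. Writing
to a temporary name and renaming the file once it is complete is one common
way to avoid that, assuming the writer is under your control.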
