Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2019/09/11 13:36:00 UTC
[jira] [Created] (ARROW-6529) [C++] Feather: slow writing of NullArray
Joris Van den Bossche created ARROW-6529:
--------------------------------------------
Summary: [C++] Feather: slow writing of NullArray
Key: ARROW-6529
URL: https://issues.apache.org/jira/browse/ARROW-6529
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Joris Van den Bossche
From https://stackoverflow.com/questions/57877017/pandas-feather-format-is-slow-when-writing-a-column-of-none
A smaller example using only pyarrow (with {{import pyarrow as pa}} and {{import pyarrow.feather}}): writing an array of nulls takes much longer than writing an array of, for example, ints of the same length, which seems a bit strange:
{code}
In [93]: arr = pa.array([1]*1000)
In [94]: %%timeit
...: w = pyarrow.feather.FeatherWriter('__test.feather')
...: w.writer.write_array('x', arr)
...: w.writer.close()
31.4 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [95]: arr = pa.array([None]*1000)
In [96]: arr
Out[96]:
<pyarrow.lib.NullArray object at 0x7fa47a23ca40>
1000 nulls
In [97]: %%timeit
...: w = pyarrow.feather.FeatherWriter('__test.feather')
...: w.writer.write_array('x', arr)
...: w.writer.close()
3.75 ms ± 64.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
{code}
So writing a NullArray of the same length takes roughly 100x longer (3.75 ms vs. 31.4 µs per loop).
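
For anyone who wants to rerun the comparison outside IPython, here is a rough sketch that goes through the public pyarrow.feather.write_feather function instead of the internal writer object used above. The helper names time_write and slowdown are made up for illustration, and the pyarrow part assumes a version in which write_feather accepts a Table; it is skipped if pyarrow is not installed. Timings through this path are not expected to match the numbers above exactly.

```python
import os
import tempfile
import timeit


def time_write(write_fn, number=100, repeat=5):
    """Return the best per-call time in seconds for write_fn(path).

    write_fn is any callable that writes a file to the given path
    (hypothetical helper, not part of pyarrow).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.feather")
        timings = timeit.repeat(lambda: write_fn(path), number=number, repeat=repeat)
    # Each entry in timings covers `number` calls; take the fastest run.
    return min(timings) / number


def slowdown(baseline_seconds, other_seconds):
    """Ratio of two per-call timings (hypothetical helper)."""
    return other_seconds / baseline_seconds


if __name__ == "__main__":
    try:
        import pyarrow as pa
        import pyarrow.feather as feather
    except ImportError:
        pa = None

    if pa is not None:
        # Same data as in the report: 1000 ints vs. 1000 nulls.
        ints = pa.table({"x": pa.array([1] * 1000)})
        nulls = pa.table({"x": pa.array([None] * 1000)})

        t_int = time_write(lambda p: feather.write_feather(ints, p))
        t_null = time_write(lambda p: feather.write_feather(nulls, p))

        print("int column:  %.1f us per write" % (t_int * 1e6))
        print("null column: %.1f us per write" % (t_null * 1e6))
        print("slowdown:    %.1fx" % slowdown(t_int, t_null))
```

Taking the minimum over several repeats is a deliberate choice here: it reduces noise from the filesystem, which matters when the baseline write is only tens of microseconds.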