Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/06/04 17:09:00 UTC

[jira] [Updated] (ARROW-12976) [Python] Arrow-to-Python conversion is slow

     [ https://issues.apache.org/jira/browse/ARROW-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-12976:
-----------------------------------
    Description: 
It seems that we are almost 90x slower than NumPy for converting the exact same data to a Python list.

With integers:
{code:python}
>>> import numpy as np
>>> import pyarrow as pa
>>> arr = np.arange(0,1000, dtype=np.int64)
>>> %timeit arr.tolist()
9.68 µs ± 9.53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> parr = pa.array(arr)
>>> %timeit parr.to_pylist()
846 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}
With floats:
{code:python}
>>> arr = np.arange(0,1000, dtype=np.float64)
>>> %timeit arr.tolist()
10.3 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> parr = pa.array(arr)
>>> %timeit parr.to_pylist()
878 µs ± 2.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}

  was:
It seems that we are almost 10x slower for converting the exact same data to a Python list.

With integers:
{code:python}
>>> arr = np.arange(0,1000, dtype=np.int64)
>>> %timeit arr.tolist()
9.68 µs ± 9.53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> parr = pa.array(arr)
>>> %timeit parr.to_pylist()
846 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}

With floats:
{code:python}
>>> arr = np.arange(0,1000, dtype=np.float64)
>>> %timeit arr.tolist()
10.3 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> parr = pa.array(arr)
>>> %timeit parr.to_pylist()
878 µs ± 2.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}



> [Python] Arrow-to-Python conversion is slow
> -------------------------------------------
>
>                 Key: ARROW-12976
>                 URL: https://issues.apache.org/jira/browse/ARROW-12976
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> It seems that we are almost 90x slower than NumPy for converting the exact same data to a Python list.
> With integers:
> {code:python}
> >>> import numpy as np
> >>> import pyarrow as pa
> >>> arr = np.arange(0,1000, dtype=np.int64)
> >>> %timeit arr.tolist()
> 9.68 µs ± 9.53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> >>> parr = pa.array(arr)
> >>> %timeit parr.to_pylist()
> 846 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> {code}
> With floats:
> {code:python}
> >>> arr = np.arange(0,1000, dtype=np.float64)
> >>> %timeit arr.tolist()
> 10.3 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> >>> parr = pa.array(arr)
> >>> %timeit parr.to_pylist()
> 878 µs ± 2.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> {code}
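
For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch that uses the standard-library timeit module in place of the %timeit magic. The helper names best_time_per_call and slowdown are hypothetical and not part of NumPy or PyArrow. The script reports the fastest of 7 repeats rather than the mean ± std. dev. that %timeit prints, so absolute figures will not match the ones quoted above exactly.

```python
import timeit


def best_time_per_call(fn, number=1000, repeat=7):
    """Return the fastest observed time per call of fn, in seconds."""
    timings = timeit.repeat(fn, number=number, repeat=repeat)
    return min(timings) / number


def slowdown(slow_seconds, fast_seconds):
    """Return how many times slower the first measurement is than the second."""
    return slow_seconds / fast_seconds


def main():
    # Third-party imports live here so the helpers above stay stdlib-only.
    try:
        import numpy as np
        import pyarrow as pa
    except ImportError:
        print("numpy and pyarrow are required to run the benchmark")
        return

    for dtype in (np.int64, np.float64):
        # Same data as in the report: 1000 consecutive values.
        arr = np.arange(0, 1000, dtype=dtype)
        parr = pa.array(arr)
        t_numpy = best_time_per_call(arr.tolist)
        t_arrow = best_time_per_call(parr.to_pylist)
        print(f"{np.dtype(dtype).name}: "
              f"numpy {t_numpy * 1e6:.2f} µs, "
              f"arrow {t_arrow * 1e6:.2f} µs, "
              f"ratio {slowdown(t_arrow, t_numpy):.1f}x")


if __name__ == "__main__":
    main()
```

Applied to the timings quoted above, slowdown(846e-6, 9.68e-6) is about 87 for integers and slowdown(878e-6, 10.3e-6) is about 85 for floats.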


