Posted to jira@arrow.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2021/10/10 04:47:00 UTC

[jira] [Commented] (ARROW-12976) [Python] Arrow-to-Python conversion is slow

    [ https://issues.apache.org/jira/browse/ARROW-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426744#comment-17426744 ] 

Micah Kornfield commented on ARROW-12976:
-----------------------------------------

[~apitrou] [~jorisvandenbossche] I'm going to see if I can consolidate this logic in C++ (unless you were thinking of taking it up). Any preference between splitting this into smaller PRs and doing one large PR that migrates all types to C++ code?

> [Python] Arrow-to-Python conversion is slow
> -------------------------------------------
>
>                 Key: ARROW-12976
>                 URL: https://issues.apache.org/jira/browse/ARROW-12976
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Antoine Pitrou
>            Assignee: Micah Kornfield
>            Priority: Major
>
> It seems that we are roughly 20x slower than NumPy when converting the exact same data to a Python list.
> With integers:
> {code:python}
> >>> arr = np.arange(0,1000, dtype=np.int64)
> >>> %timeit arr.tolist()
> 8.24 µs ± 3.46 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> >>> parr = pa.array(arr)
> >>> %timeit parr.to_pylist()
> 218 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> {code}
> With floats:
> {code:python}
> >>> arr = np.arange(0,1000, dtype=np.float64)
> >>> %timeit arr.tolist()
> 10.2 µs ± 25.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
> >>> parr = pa.array(arr)
> >>> %timeit parr.to_pylist()
> 199 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
> {code}
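
The timings quoted above come from IPython's %timeit. For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the standard-library timeit module. The helper names per_call_us and compare are hypothetical and not part of PyArrow or NumPy. Note that it reports the best of several runs rather than the mean ± std. dev. that %timeit prints, so absolute numbers will differ somewhat.

```python
import timeit


def per_call_us(func, repeat=7, number=1000):
    """Best-of-`repeat` average time per call to `func`, in microseconds."""
    # timeit.repeat returns one total (in seconds) per run of `number` calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1e6


def compare(dtype_name):
    """Time ndarray.tolist() against pyarrow.Array.to_pylist() for one dtype."""
    # Imported lazily so that per_call_us stays usable without these packages.
    import numpy as np
    import pyarrow as pa

    arr = np.arange(0, 1000, dtype=dtype_name)
    parr = pa.array(arr)
    # Sanity check: both conversions should yield the same Python list.
    assert arr.tolist() == parr.to_pylist()

    numpy_us = per_call_us(arr.tolist)
    arrow_us = per_call_us(parr.to_pylist)
    return numpy_us, arrow_us, arrow_us / numpy_us


if __name__ == "__main__":
    for name in ("int64", "float64"):
        numpy_us, arrow_us, ratio = compare(name)
        print(f"{name}: numpy {numpy_us:.1f} us, "
              f"arrow {arrow_us:.1f} us, ratio {ratio:.1f}x")
```

Running the same script before and after a change to the conversion code gives a quick check on whether the ratio has moved; results depend on the machine and on the installed NumPy and PyArrow versions.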


