Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/09/06 14:09:00 UTC

[jira] [Commented] (ARROW-13914) [C++][Python] Optimize type inference when converting from python values

    [ https://issues.apache.org/jira/browse/ARROW-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410623#comment-17410623 ] 

Antoine Pitrou commented on ARROW-13914:
----------------------------------------

It seems there is no difference for many types, simply because {{make_unions_}} is always false, so the inference finishes almost immediately.
(Note that {{make_unions_ = true}} is not implemented at all!)

What remains are a couple of types, such as integers and lists:
{code:python}
>>> d = [1] * 10000
>>> %timeit pa.array(d)
257 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit ty = pa.array(d).type
...: pa.array(d, type=ty)
...: 
...: 
200 µs ± 458 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}

{code:python}
>>> d = [[1, 2]] * 10000
>>> %timeit pa.array(d)
1.41 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit ty = pa.array(d).type
...: pa.array(d, type=ty)
...: 
...: 
962 µs ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}

Therefore, optimizing this sounds low-priority to me.
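For reference, the two code paths being timed above can be sketched as follows (a minimal illustration, not part of the benchmark itself; the {{pa.int64()}} type is just an example choice):
{code:python}
import pyarrow as pa

d = [1] * 10

# Inference path: pyarrow scans the Python values to pick an Arrow type.
inferred = pa.array(d)

# Explicit path: the scan is skipped because the type is given up front,
# which is what the second %%timeit cell above measures.
explicit = pa.array(d, type=pa.int64())

assert inferred.type == explicit.type == pa.int64()
{code}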

> [C++][Python] Optimize type inference when converting from python values
> ------------------------------------------------------------------------
>
>                 Key: ARROW-13914
>                 URL: https://issues.apache.org/jira/browse/ARROW-13914
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Krisztian Szucs
>            Priority: Major
>
> Currently we use an extensive set of checks to infer the Arrow type from Python sequences. 
> Last time I checked using asv, the inference part had a significant overhead. 
> We could try other approaches to speed up the type inference; see comments: https://github.com/apache/arrow/pull/11076#discussion_r702808196



--
This message was sent by Atlassian Jira
(v8.3.4#803005)