Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/09/06 14:09:00 UTC
[jira] [Commented] (ARROW-13914) [C++][Python] Optimize type inference when converting from python values
[ https://issues.apache.org/jira/browse/ARROW-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410623#comment-17410623 ]
Antoine Pitrou commented on ARROW-13914:
----------------------------------------
It seems there is no difference for many types, simply because {{make_unions_}} is always false and therefore the inference finishes almost immediately.
(note that {{make_unions_ = true}} is not implemented at all!)
What remains is a couple of types, such as integers and lists:
{code:python}
>>> import pyarrow as pa
>>> d = [1] * 10000
>>> %timeit pa.array(d)
257 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit ty = pa.array(d).type
...: pa.array(d, type=ty)
...:
...:
200 µs ± 458 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}
{code:python}
>>> d = [[1, 2]] * 10000
>>> %timeit pa.array(d)
1.41 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit ty = pa.array(d).type
...: pa.array(d, type=ty)
...:
...:
962 µs ± 9.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
{code}
Therefore, optimizing this sounds low-priority to me.
> [C++][Python] Optimize type inference when converting from python values
> ------------------------------------------------------------------------
>
> Key: ARROW-13914
> URL: https://issues.apache.org/jira/browse/ARROW-13914
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Krisztian Szucs
> Priority: Major
>
> Currently we use an extensive set of checks to infer the Arrow type from Python sequences.
> Last time I checked using asv, the inference part had significant overhead.
> We could try other approaches to speed up the type inference; see the comments at: https://github.com/apache/arrow/pull/11076#discussion_r702808196