Posted to issues@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2019/09/17 10:45:00 UTC

[jira] [Commented] (ARROW-6578) [Python] Casting int64 to string columns

    [ https://issues.apache.org/jira/browse/ARROW-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931282#comment-16931282 ] 

Antoine Pitrou commented on ARROW-6578:
---------------------------------------

A CSV file is made of strings. So you're saying that converting the CSV values to int64 and then converting them back to strings would be faster than reading them directly as strings? I doubt it.

Regardless, it could be useful to have an int64 -> string cast implementation. But I don't think it is the right solution in your case :-)

> [Python] Casting int64 to string columns
> ----------------------------------------
>
>                 Key: ARROW-6578
>                 URL: https://issues.apache.org/jira/browse/ARROW-6578
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: Igor Yastrebov
>            Priority: Major
>
> I wanted to cast a list of tables to the same schema so I could use concat_tables later. However, I encountered ArrowNotImplementedError:
> {code:java}
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> <ipython-input-11-bd4916c221bf> in <module>
> ----> 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
> <ipython-input-11-bd4916c221bf> in <listcomp>(.0)
> ----> 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi in itercolumns()
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi in pyarrow.lib.Column.cast()
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: No cast implemented from int64 to string
> {code}
> Some context: I want to read and concatenate a bunch of CSV files that come from partitioning the same table. Using cast after reading a CSV file is usually significantly faster than specifying column_types in ConvertOptions. Some string columns are mostly populated with integer-like values, so in a particular file such a column can end up integer-only. This situation is rather common, so having an option to cast an int64 column to a string column would be helpful.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)