Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/08/21 03:34:00 UTC

[jira] [Updated] (ARROW-6222) [Python] Serialising numpy array yields `pyarrow.lib.ArrowNotImplementedError: list<item: float>`

     [ https://issues.apache.org/jira/browse/ARROW-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-6222:
--------------------------------
    Summary: [Python] Serialising numpy array yields `pyarrow.lib.ArrowNotImplementedError: list<item: float>`  (was: Serialising numpy array yields `pyarrow.lib.ArrowNotImplementedError: list<item: float>`)

> [Python] Serialising numpy array yields `pyarrow.lib.ArrowNotImplementedError: list<item: float>`
> -------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6222
>                 URL: https://issues.apache.org/jira/browse/ARROW-6222
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 0.14.1
>            Reporter: Marcel Ackermann
>            Priority: Major
>
> I want to serialize PyTorch tensors, but because Arrow does not support them yet, I convert them to numpy arrays like this: {{t.numpy()}} ([https://pytorch.org/docs/stable/tensors.html?highlight=numpy#torch.Tensor.numpy]), which returns an {{ndarray}}. My tensors are 1-dimensional, so the result is a 1-dimensional ndarray.
> Calling {{df.to_feather("fname.feather")}} on a DataFrame whose cells hold these ndarrays yields {{pyarrow.lib.ArrowNotImplementedError: list<item: float>}}.
> Next I tried storing {{pyarrow.array(t.numpy())}} in the DataFrame instead, which results in {{pyarrow.lib.ArrowInvalid: ('Could not convert [\n  0.00500498,\n  -0.00732583,\n... with type pyarrow.lib.FloatArray: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object')}}.
> I would appreciate it if this worked more out of the box.
> Upon request, a full example:
> {code:python}
> import torch
> import pyarrow
> import pandas as pd
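> # Each statement fails on its own; run them one at a time to get the three errors below.
> # 1) Cell holds a torch.Tensor: ArrowInvalid
> pd.DataFrame([[torch.ones(2)]], columns=["0"]).to_feather("fname.feather")
> # 2) Cell holds a 1-D float32 numpy.ndarray: ArrowNotImplementedError
> pd.DataFrame([[torch.ones(2).numpy()]], columns=["0"]).to_feather("fname.feather")
> # 3) Cell holds a pyarrow FloatArray: ArrowInvalid
> pd.DataFrame([[pyarrow.array(torch.ones(2).numpy())]], columns=["0"]).to_feather("fname.feather")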
> {code}
> {noformat}
> ArrowInvalid: ('Could not convert tensor([1., 1.]) with type Tensor: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object')
> ArrowNotImplementedError: list<item: float>
> ArrowInvalid: ('Could not convert [\n  1,\n  1\n] with type pyarrow.lib.FloatArray: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object')
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)