Posted to issues@arrow.apache.org by "Thibaud Nesztler (JIRA)" <ji...@apache.org> on 2019/06/20 10:59:00 UTC
[jira] [Updated] (ARROW-5665) ArrowInvalid on converting Pandas Series with dtype float64

     [ https://issues.apache.org/jira/browse/ARROW-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaud Nesztler updated ARROW-5665:
------------------------------------
    Description: 
{code:python}
('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64'){code}
We are seeing frequent, seemingly random failures when converting a pandas DataFrame to Parquet files with pyarrow: running the same code again often completes without any error.

We use this line of code for the conversion:
{code:python}
dataframe.to_parquet(filePath, compression="snappy", index=False){code}
Note: {{filePath}} is an AWS S3 URI.
Full traceback:
{noformat}
ArrowInvalid: ('Could not convert 0    70.699997\n0    73.000000\n0     0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64')
  File "store_manager.py", line 25, in _write_files_and_partitions
    dataframe.to_parquet(filePath, compression="snappy", index=False)
  File "pandas/core/frame.py", line 2203, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "pandas/io/parquet.py", line 252, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "pandas/io/parquet.py", line 113, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 1139, in pyarrow.lib.Table.from_pandas
    names, arrays, metadata = dataframe_to_arrays(
  File "pyarrow/pandas_compat.py", line 474, in dataframe_to_arrays
    convert_types))
  File "concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "pyarrow/pandas_compat.py", line 463, in convert_column
    raise e
  File "pyarrow/pandas_compat.py", line 457, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
    return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
  File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
    check_status(ConvertPySequence(sequence, mask, options, &out))
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
{noformat}
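A closer reading of the message: the value pyarrow says it could not convert is not a float but a float64 Series named fact_value whose index labels are 0, 0, 0. Integer lookup on a pandas Series with an integer index is label-based, so {{col[0]}} returns exactly such a Series when the label 0 repeats. That is consistent with the column being accessed element by element on the {{_sequence_to_array}} path, although this is an unconfirmed assumption and it does not explain why the failure is intermittent. The sketch below builds such a frame with plain pandas; how the duplicate labels arise here ({{pd.concat}} without {{ignore_index=True}}) is hypothetical. Resetting the index before writing is a workaround to try, not a confirmed fix; since the write uses {{index=False}}, the index is not stored in the Parquet file either way.

```python
import pandas as pd

# Hypothetical origin of a non-unique index: concatenating single-row frames
# without ignore_index=True leaves the label 0 on every row.
values = (70.699997, 73.0, 0.0)
df = pd.concat([pd.DataFrame({"fact_value": [v]}) for v in values])
col = df["fact_value"]

# With an integer index, col[0] is a label lookup and returns every row labelled 0:
# a float64 Series named "fact_value", like the value quoted in the ArrowInvalid message.
first = col[0]
print(type(first).__name__)  # Series
print(first)

# Possible workaround (assumption, not a confirmed fix): give the frame a unique
# RangeIndex before calling to_parquet(..., index=False).
clean = df.reset_index(drop=True)
print(type(clean["fact_value"][0]).__name__)  # float64
print(df.index.is_unique, clean.index.is_unique)  # False True
```

Checking {{dataframe.index.is_unique}} right before {{to_parquet}} in the failing runs would show whether this reading applies.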

  was:
{code:java}
('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64'){code}
We are seeing frequent, seemingly random failures when converting a pandas DataFrame to Parquet files with pyarrow: running the same code again often completes without any error.

We use this line of code for the conversion:
{code:python}
dataframe.to_parquet(filePath, compression="snappy", index=False)
{code}


> ArrowInvalid on converting Pandas Series with dtype float64
> -----------------------------------------------------------
>
>                 Key: ARROW-5665
>                 URL: https://issues.apache.org/jira/browse/ARROW-5665
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Thibaud Nesztler
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)