Posted to issues@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/06/20 14:32:00 UTC
[jira] [Updated] (ARROW-5665) [Python] ArrowInvalid on converting
Pandas Series with dtype float64
[ https://issues.apache.org/jira/browse/ARROW-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-5665:
-----------------------------------------
Summary: [Python] ArrowInvalid on converting Pandas Series with dtype float64 (was: ArrowInvalid on converting Pandas Series with dtype float64)
> [Python] ArrowInvalid on converting Pandas Series with dtype float64
> --------------------------------------------------------------------
>
> Key: ARROW-5665
> URL: https://issues.apache.org/jira/browse/ARROW-5665
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Thibaud Nesztler
> Priority: Minor
>
> {code:java}
> ('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64'){code}
> We are seeing frequent intermittent errors when converting pandas DataFrames to Parquet files using pyarrow; rerunning the same code often succeeds without any error.
> We use this line of code for the conversion:
> {code:java}
> dataframe.to_parquet(filePath, compression="snappy", index=False){code}
> Note: `filePath` is an AWS S3 URI.
> {code:java}
> ArrowInvalid: ('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64')
> File "store_manager.py", line 25, in _write_files_and_partitions
> dataframe.to_parquet(filePath, compression="snappy", index=False)
> File "pandas/core/frame.py", line 2203, in to_parquet
> partition_cols=partition_cols, **kwargs)
> File "pandas/io/parquet.py", line 252, in to_parquet
> partition_cols=partition_cols, **kwargs)
> File "pandas/io/parquet.py", line 113, in write
> table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
> File "pyarrow/table.pxi", line 1139, in pyarrow.lib.Table.from_pandas
> names, arrays, metadata = dataframe_to_arrays(
> File "pyarrow/pandas_compat.py", line 474, in dataframe_to_arrays
> convert_types))
> File "concurrent/futures/_base.py", line 586, in result_iterator
> yield fs.pop().result()
> File "concurrent/futures/_base.py", line 425, in result
> return self.__get_result()
> File "concurrent/futures/_base.py", line 384, in __get_result
> raise self._exception
> File "concurrent/futures/thread.py", line 57, in run
> result = self.fn(*self.args, **self.kwargs)
> File "pyarrow/pandas_compat.py", line 463, in convert_column
> raise e
> File "pyarrow/pandas_compat.py", line 457, in convert_column
> return pa.array(col, type=ty, from_pandas=True, safe=safe)
> File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
> return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
> File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
> check_status(ConvertPySequence(sequence, mask, options, &out))
> File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
> raise ArrowInvalid(message){code}
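
A minimal diagnostic sketch, not part of the original report: before calling `to_parquet`, it may help to record a few facts about the DataFrame so they can be attached to the issue. `summarize_frame` is a hypothetical helper name, and the example frame is an assumption that only reuses the three values and the repeated index label 0 visible in the error message above. Whether the repeated index label is related to the failure is not established here.

```python
import pandas as pd


def summarize_frame(df):
    """Collect facts about a DataFrame worth attaching to a conversion bug report.

    Hypothetical helper for illustration; it only inspects the frame and
    does not call pyarrow.
    """
    return {
        # Column name -> dtype string, e.g. {"fact_value": "float64"}
        "dtypes": {str(name): str(dtype) for name, dtype in df.dtypes.items()},
        # Column labels that occur more than once (first occurrence excluded)
        "duplicate_columns": [str(c) for c in df.columns[df.columns.duplicated()]],
        # False when at least one index label is repeated
        "index_is_unique": bool(df.index.is_unique),
    }


# Assumed example frame: values and index labels copied from the error message.
example = pd.DataFrame(
    {"fact_value": [70.699997, 73.0, 0.0]},
    index=[0, 0, 0],
)

report = summarize_frame(example)
print(report)
```

Running the same summary on a DataFrame that fails and on one that converts cleanly could show whether the two differ in dtypes or index structure.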
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)