Posted to issues@arrow.apache.org by "Thibaud Nesztler (JIRA)" <ji...@apache.org> on 2019/06/20 10:59:00 UTC
[jira] [Updated] (ARROW-5665) ArrowInvalid on converting Pandas Series with dtype float64
[ https://issues.apache.org/jira/browse/ARROW-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thibaud Nesztler updated ARROW-5665:
------------------------------------
Description:
{code:java}
('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64'){code}
We are seeing frequent intermittent errors when converting pandas DataFrames to Parquet files using pyarrow. Rerunning the same code sometimes completes without any error.
We use this line of code for the conversion:
{code:java}
dataframe.to_parquet(filePath, compression="snappy", index=False){code}
Note: `filePath` is an AWS S3 URI.
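The error text above reports that the value it could not convert has type Series, although the column dtype is float64. One way to investigate, offered as a rough sketch and not a diagnosis, is to count the Python types of the values in the failing column. The helper name `count_value_types` is hypothetical. It accepts any iterable, so with pandas one could pass `dataframe["fact_value"]`; the list below is stand-in data, not the reporter's.

```python
from collections import Counter


def count_value_types(values):
    """Count how many values of each Python type an iterable contains."""
    return Counter(type(value).__name__ for value in values)


# Stand-in data: a plain list instead of the reporter's pandas column.
sample = [70.699997, 73.0, 0.0, [1.0, 2.0]]
print(count_value_types(sample))
```

If such a census shows anything other than float for a float64 column, the data itself is suspect; if it shows only float, the cause probably lies elsewhere.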
ArrowInvalid: ('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64')
  File "store_manager.py", line 25, in _write_files_and_partitions
    dataframe.to_parquet(filePath, compression="snappy", index=False)
  File "pandas/core/frame.py", line 2203, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "pandas/io/parquet.py", line 252, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "pandas/io/parquet.py", line 113, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 1139, in pyarrow.lib.Table.from_pandas
    names, arrays, metadata = dataframe_to_arrays(
  File "pyarrow/pandas_compat.py", line 474, in dataframe_to_arrays
    convert_types))
  File "concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "pyarrow/pandas_compat.py", line 463, in convert_column
    raise e
  File "pyarrow/pandas_compat.py", line 457, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
    return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
  File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
    check_status(ConvertPySequence(sequence, mask, options, &out))
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
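The concurrent/futures frames in the traceback indicate that the column conversion ran in a worker thread and that the exception was re-raised in the calling thread when the results were collected. A minimal standard-library sketch of that propagation follows. The names `convert_column` and `convert_all` here are hypothetical stand-ins for illustration, not pyarrow's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_column(column):
    """Hypothetical stand-in for a per-column conversion step."""
    if not all(isinstance(value, float) for value in column):
        raise ValueError("Conversion failed for column with non-float value")
    return list(column)


def convert_all(columns, max_workers=4):
    """Convert every column in a thread pool and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Executor.map re-raises a worker's exception in the caller
        # when the failing result is retrieved from the iterator.
        return list(executor.map(convert_column, columns))


try:
    convert_all([[70.699997, 73.0], [0.0, "bad"]])
except ValueError as error:
    print(f"raised in caller: {error}")
```

Because the exception surfaces only when results are gathered, the top frames of such a traceback point at the thread-pool machinery while the actual failure sits in the per-column function.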
was:
{code:java}
('Could not convert 0 70.699997\n0 73.000000\n0 0.000000\nName: fact_value, dtype: float64 with type Series: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column fact_value with type float64'){code}
We are seeing frequent intermittent errors when converting pandas DataFrames to Parquet files using pyarrow. Rerunning the same code sometimes completes without any error.
We use this line of code for the conversion:
{code:java}
dataframe.to_parquet(filePath, compression="snappy", index=False)
{code}
> ArrowInvalid on converting Pandas Series with dtype float64
> -----------------------------------------------------------
>
> Key: ARROW-5665
> URL: https://issues.apache.org/jira/browse/ARROW-5665
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Thibaud Nesztler
> Priority: Minor