Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/08/03 10:16:00 UTC

[jira] [Commented] (ARROW-2966) [Python] Data type conversion error

    [ https://issues.apache.org/jira/browse/ARROW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568054#comment-16568054 ] 

Wes McKinney commented on ARROW-2966:
-------------------------------------

I'm in the midst of refactoring this code path in ARROW-2814. I made a note to add more informative error output for this case, to show both the type and the string repr of the invalid value.
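
Until that lands, the type and repr of the offending value can be found by hand by scanning the column values in Python. This is a minimal sketch, not pyarrow code: find_unexpected_values is a hypothetical helper, and it assumes the failing column is an object-dtype column that is supposed to hold floats.

```python
import numbers


def find_unexpected_values(columns, limit=5):
    """Report values that are neither real numbers nor None, per column.

    `columns` is any iterable of (name, values) pairs, for example
    `df.items()` on a pandas DataFrame or `some_dict.items()`.
    Hypothetical helper for illustration; not part of pyarrow.
    """
    report = {}
    for name, values in columns:
        bad = []
        for position, value in enumerate(values):
            # None stands for a missing value; real numbers are fine.
            if value is None or isinstance(value, numbers.Real):
                continue
            # Record the position, the type name and the repr of the value.
            bad.append((position, type(value).__name__, repr(value)))
            if len(bad) >= limit:
                break
        if bad:
            report[name] = bad
    return report


# Made-up sample data standing in for the real dataframe.
sample = {
    "grade": [3.5, 4.0, "n/a", None, 2.0],
    "credits": [3.0, 4.0, 3.0, 1.0, 2.0],
}
for name, bad in find_unexpected_values(sample.items()).items():
    for position, type_name, value_repr in bad:
        print(f"{name}[{position}]: {type_name} {value_repr}")
```

On a real dataframe it makes sense to pass only the columns that are expected to be numeric, since a genuine string column would otherwise be reported in full. The reported position is the row offset, not the index label.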

In the future it would be useful to treat unconvertible values as null: ARROW-2967
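
In the meantime the same effect can be approximated on the Python side before conversion. This is a minimal sketch in plain Python; coerce_to_float_or_none is a hypothetical helper, not a pyarrow or pandas API. In pandas, pd.to_numeric(series, errors="coerce") performs a similar coercion (to NaN), although it also parses numeric strings rather than nulling them.

```python
import numbers


def coerce_to_float_or_none(values):
    """Return a list in which every non-numeric value is replaced by None.

    Hypothetical helper for illustration; not part of pyarrow or pandas.
    """
    result = []
    for value in values:
        if isinstance(value, numbers.Real):
            result.append(float(value))
        else:
            # Strings, None and any other object become null.
            result.append(None)
    return result


# Made-up sample values: the string cannot be stored in a double column.
cleaned = coerce_to_float_or_none([3.5, "n/a", None, 2])
print(cleaned)
```

Silently dropping values can hide data problems, so it is worth counting how many entries were nulled before relying on the result.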

> [Python] Data type conversion error
> -----------------------------------
>
>                 Key: ARROW-2966
>                 URL: https://issues.apache.org/jira/browse/ARROW-2966
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>         Environment: linux
>            Reporter: Christopher Brooks
>            Priority: Major
>
> I have a big pandas dataframe. When I try to convert it to a pyarrow table, it fails with a conversion error. I'm not sure whether this is a bug or expected behavior.
> I realize the code below showing the error is pretty useless as is. *What can I do to help identify the cause in my pandas dataframe?*
> Here's the error:
>  
> {code:java}
> In [17]: pa.Table.from_pandas(df)
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> <ipython-input-17-6eac5d0eec08> in <module>()
> ----> 1 pa.Table.from_pandas(df)
> table.pxi in pyarrow.lib.Table.from_pandas()
> ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads)
> 375 arrays = list(executor.map(convert_column,
> 376 columns_to_convert,
> --> 377 convert_types))
> 378 
> 379 types = [x.type for x in arrays]
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
> 584 # Careful not to keep a reference to the popped future
> 585 if timeout is None:
> --> 586 yield fs.pop().result()
> 587 else:
> 588 yield fs.pop().result(end_time - time.time())
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
> 423 raise CancelledError()
> 424 elif self._state == FINISHED:
> --> 425 return self.__get_result()
> 426 
> 427 self._condition.wait(timeout)
> ~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
> 382 def __get_result(self):
> 383 if self._exception:
> --> 384 raise self._exception
> 385 else:
> 386 return self._result
> ~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
> 54 
> 55 try:
> ---> 56 result = self.fn(*self.args, **self.kwargs)
> 57 except BaseException as exc:
> 58 self.future.set_exception(exc)
> ~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py in convert_column(col, ty)
> 364 
> 365 def convert_column(col, ty):
> --> 366 return pa.array(col, from_pandas=True, type=ty)
> 367 
> 368 if nthreads == 1:
> array.pxi in pyarrow.lib.array()
> error.pxi in pyarrow.lib.check_status()
> error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Error converting from Python objects to Double: Got Python object of type str but can only handle these types: float
> In [18]: pa.__version__
> Out[18]: '0.9.0'
> In [19]: pd.__version__
> Out[19]: '0.23.3'
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)