Posted to issues@arrow.apache.org by "Phillip Cloud (JIRA)" <ji...@apache.org> on 2018/02/23 17:48:00 UTC
[jira] [Updated] (ARROW-2160) [C++/Python] Fix decimal precision inference
[ https://issues.apache.org/jira/browse/ARROW-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phillip Cloud updated ARROW-2160:
---------------------------------
Summary: [C++/Python] Fix decimal precision inference (was: [C++/Python] Fix decimal precision inference)
> [C++/Python] Fix decimal precision inference
> --------------------------------------------
>
> Key: ARROW-2160
> URL: https://issues.apache.org/jira/browse/ARROW-2160
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.8.0
> Reporter: Antony Mayi
> Assignee: Phillip Cloud
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> {code}
> import pyarrow as pa
> import pandas as pd
> import decimal
> df = pd.DataFrame({'a': [decimal.Decimal('0.1'), decimal.Decimal('0.01')]})
> pa.Table.from_pandas(df)
> {code}
> raises:
> {code}
> pyarrow.lib.ArrowInvalid: Decimal type with precision 2 does not fit into precision inferred from first array element: 1
> {code}
> It looks like Arrow infers the precision for a given column from the first cell and expects the remaining values to fit within it. I understand this is by design, but from the point of view of pandas-Arrow compatibility it is quite painful, since pandas is more flexible (as demonstrated).
> What this means is that a user trying to pass a pandas {{DataFrame}} with {{Decimal}} column(s) to an Arrow {{Table}} would always first have to:
> # Find the highest precision used in (each of) that column(s)
> # Adjust the first cell of (each of) that column(s) so that it explicitly uses the highest precision of that column(s)
> # Only then pass such {{DataFrame}} to {{Table.from_pandas()}}
> So, given this unavoidable procedure (and assuming Arrow needs to be strict about the highest precision for a column), shouldn't similar logic be part of {{Table.from_pandas()}} itself, to make this transparent?
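The manual procedure described in the report could be sketched roughly as follows. This is only an illustration using the standard-library decimal module, not the fix that went into Arrow; the helper names infer_precision_scale and rescale are hypothetical and do not exist in pyarrow or pandas.

```python
from decimal import Decimal
from typing import Iterable, List, Tuple


def infer_precision_scale(values: Iterable[Decimal]) -> Tuple[int, int]:
    """Return a (precision, scale) pair wide enough for every finite value.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    max_int_digits = 0  # most digits seen left of the decimal point
    max_scale = 0       # most digits seen right of the decimal point
    for value in values:
        # Skip missing values and NaN/Infinity, which carry no digit count.
        if value is None or not value.is_finite():
            continue
        _sign, digits, exponent = value.as_tuple()
        max_scale = max(max_scale, -exponent)
        max_int_digits = max(max_int_digits, len(digits) + exponent)
    # A decimal type needs a precision of at least 1.
    return max(max_int_digits + max_scale, 1), max_scale


def rescale(values: Iterable[Decimal], scale: int) -> List[Decimal]:
    """Quantize every value to the same scale (assumes finite inputs)."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 gives Decimal('0.01')
    return [value.quantize(quantum) for value in values]


column = [Decimal("0.1"), Decimal("0.01")]
precision, scale = infer_precision_scale(column)
print(precision, scale)                         # 2 2
print([str(v) for v in rescale(column, scale)])  # ['0.10', '0.01']
```

With the column from the report, the first cell becomes 0.10, so it already carries the column-wide scale of 2. In principle the inferred pair could also be used to build an explicit decimal type and schema for the conversion instead of editing the data, but whether that avoids the error on 0.8.0 has not been checked here.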
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)