Posted to jira@arrow.apache.org by "abdel alfahham (Jira)" <ji...@apache.org> on 2021/03/30 13:27:00 UTC

[jira] [Created] (ARROW-12150) [Python] Invalid data when Decimal is exported to parquet

abdel alfahham created ARROW-12150:
--------------------------------------

             Summary: [Python] Invalid data when Decimal is exported to parquet 
                 Key: ARROW-12150
                 URL: https://issues.apache.org/jira/browse/ARROW-12150
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 3.0.0
         Environment: - macOS Big Sur 11.2.1
                      - Python 3.8.2
            Reporter: abdel alfahham


Exporting a pyarrow table that contains mixed-precision Decimals with pyarrow.parquet.write_table produces a Parquet file that contains invalid values.

In the example below, the first value of data_decimal changes from Decimal('579.11999511718795474735088646411895751953125000000000') in the pyarrow table to Decimal('-378.68971792399258172661600550482428224218070136475136') after it is written to Parquet and read back.

 
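import pyarrow
import pyarrow.parquet  # submodule imported explicitly so pyarrow.parquet.write_table/read_table resolve
from decimal import Decimal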

values_floats = [579.119995117188, 6.40999984741211, 2.0] # floats
decs_from_values = [Decimal(v) for v in values_floats] # Decimal
decs_from_float = [Decimal.from_float(v) for v in values_floats] # Decimal using from_float
decs_str = [Decimal(str(v)) for v in values_floats] # Decimal built from the float's string form

data_dict = {"data_decimal": decs_from_values, # python Decimal
             "data_decimal_from_float": decs_from_float, # python Decimal using from_float
             "data_float": values_floats, # python floats
             "data_dec_str": decs_str} # python Decimal from str

table = pyarrow.table(data=data_dict)

print(table.to_pydict()) # before saving
pyarrow.parquet.write_table(table, "./pyarrow_decimal.parquet") # saving
print(pyarrow.parquet.read_table("./pyarrow_decimal.parquet").to_pydict()) # after saving
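To make "mixed-precision" concrete, here is a small sketch that uses only the standard library to count the digits each Decimal carries. The helper name digit_profile is hypothetical and not part of pyarrow. For reference, Arrow's decimal128 type holds at most 38 significant digits; whether exceeding that limit is the cause of the corrupted value has not been established in this report.

```python
from decimal import Decimal

values_floats = [579.119995117188, 6.40999984741211, 2.0]

def digit_profile(d):
    """Return (precision, scale) for a finite Decimal.

    Hypothetical helper for illustration: precision is the number of
    significant digits, scale is the number of digits after the point.
    """
    sign, digits, exponent = d.as_tuple()
    scale = max(-exponent, 0)
    precision = max(len(digits), scale)
    return precision, scale

# Decimal(float) expands the exact binary value, so digit counts differ per value.
profiles = [digit_profile(Decimal(v)) for v in values_floats]
for v, (precision, scale) in zip(values_floats, profiles):
    print(f"{v!r}: precision={precision}, scale={scale}")

# Decimal(str(float)) keeps only the short repr digits.
for v in values_floats:
    precision, scale = digit_profile(Decimal(str(v)))
    print(f"str({v!r}): precision={precision}, scale={scale}")
```

The first loop shows that the values in data_decimal do not share one precision and scale, while the string-based column stays well under 38 digits.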


