Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/09/30 15:59:00 UTC

[jira] [Created] (ARROW-10145) [C++][Dataset] Integer-like partition field values outside int32 range error on reading

Joris Van den Bossche created ARROW-10145:
---------------------------------------------

             Summary: [C++][Dataset] Integer-like partition field values outside int32 range error on reading
                 Key: ARROW-10145
                 URL: https://issues.apache.org/jira/browse/ARROW-10145
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche


Originally reported on Stack Overflow: https://stackoverflow.com/questions/64137664/how-to-override-type-inference-for-partition-columns-in-hive-partitioned-dataset

Small reproducer:

{code}
import pyarrow as pa
import pyarrow.parquet as pq

# 3760212050 is larger than the int32 maximum (2147483647)
table = pa.table({'part': [3760212050]*10, 'col': range(10)})
pq.write_to_dataset(table, "test_int64_partition", partition_cols=['part'])

In [35]: pq.read_table("test_int64_partition/")
...
ArrowInvalid: error parsing '3760212050' as scalar of type int32
In ../src/arrow/scalar.cc, line 333, code: VisitTypeInline(*type_, this)
In ../src/arrow/dataset/partition.cc, line 218, code: (_error_or_value26).status()
In ../src/arrow/dataset/partition.cc, line 229, code: (_error_or_value27).status()
In ../src/arrow/dataset/discovery.cc, line 256, code: (_error_or_value17).status()

In [36]: pq.read_table("test_int64_partition/", use_legacy_dataset=True)
Out[36]: 
pyarrow.Table
col: int64
part: dictionary<values=int64, indices=int32, ordered=0>
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)