Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/10/15 19:30:00 UTC

[jira] [Resolved] (ARROW-10145) [C++][Dataset] Assert integer overflow in partitioning falls back to string

     [ https://issues.apache.org/jira/browse/ARROW-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche resolved ARROW-10145.
-------------------------------------------
    Fix Version/s:     (was: 3.0.0)
                   2.0.0
       Resolution: Fixed

Issue resolved by pull request 8462
[https://github.com/apache/arrow/pull/8462]

> [C++][Dataset] Assert integer overflow in partitioning falls back to string
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-10145
>                 URL: https://issues.apache.org/jira/browse/ARROW-10145
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: dataset, pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
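> Reported on Stack Overflow: https://stackoverflow.com/questions/64137664/how-to-override-type-inference-for-partition-columns-in-hive-partitioned-dataset
> The partition value 3760212050 exceeds the int32 maximum (2147483647), but partition discovery still tries to parse it as int32 and fails. Small reproducer: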
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> table = pa.table({'part': [3760212050]*10, 'col': range(10)})
> pq.write_to_dataset(table, "test_int64_partition", partition_cols=['part'])
> In [35]: pq.read_table("test_int64_partition/")
> ...
> ArrowInvalid: error parsing '3760212050' as scalar of type int32
> In ../src/arrow/scalar.cc, line 333, code: VisitTypeInline(*type_, this)
> In ../src/arrow/dataset/partition.cc, line 218, code: (_error_or_value26).status()
> In ../src/arrow/dataset/partition.cc, line 229, code: (_error_or_value27).status()
> In ../src/arrow/dataset/discovery.cc, line 256, code: (_error_or_value17).status()
> In [36]: pq.read_table("test_int64_partition/", use_legacy_dataset=True)
> Out[36]: 
> pyarrow.Table
> col: int64
> part: dictionary<values=int64, indices=int32, ordered=0>
> {code}
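
For illustration only, here is a minimal pure-Python sketch of the fallback behaviour named in the issue title: a partition value that parses as an integer but does not fit in int32 is treated as a string instead of raising an error. The function name `infer_partition_type` and the returned type labels are hypothetical; this is not Arrow's C++ implementation, only a sketch of the rule under those assumptions.

```python
# Hypothetical sketch of "integer overflow falls back to string".
# Not Arrow's actual code; names and return values are illustrative.

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def infer_partition_type(raw: str) -> str:
    """Return 'int32' if raw is an integer that fits in 32 bits, else 'string'."""
    try:
        value = int(raw)
    except ValueError:
        # Not an integer at all, so keep it as a string.
        return "string"
    if INT32_MIN <= value <= INT32_MAX:
        return "int32"
    # Integer, but too large for int32: fall back to string rather than fail.
    return "string"


print(infer_partition_type("3760212050"))  # value from the reproducer
print(infer_partition_type("2020"))
```

With this rule, the value from the reproducer is classified as a string because 3760212050 is greater than 2147483647, while a small value such as 2020 is still inferred as int32.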



--
This message was sent by Atlassian Jira
(v8.3.4#803005)