Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/09/08 08:37:00 UTC

[jira] [Resolved] (ARROW-17583) [Python] File write visitor throws exception on large parquet file

     [ https://issues.apache.org/jira/browse/ARROW-17583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche resolved ARROW-17583.
-------------------------------------------
    Fix Version/s: 10.0.0
       Resolution: Fixed

Issue resolved by pull request 14032
[https://github.com/apache/arrow/pull/14032]

> [Python] File write visitor throws exception on large parquet file
> ------------------------------------------------------------------
>
>                 Key: ARROW-17583
>                 URL: https://issues.apache.org/jira/browse/ARROW-17583
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>            Reporter: Joost Hoozemans
>            Assignee: Joost Hoozemans
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When writing a large Parquet file (e.g. 5 GB) using pyarrow.dataset, the write throws an exception:
> Traceback (most recent call last):
>   File "pyarrow/_dataset_parquet.pyx", line 165, in pyarrow._dataset_parquet.ParquetFileFormat._finish_write
>   File "pyarrow/_dataset.pyx", line 2695, in pyarrow._dataset.WrittenFile.__init__
> OverflowError: value too large to convert to int
> Exception ignored in: 'pyarrow._dataset._filesystemdataset_write_visitor'
> The file is written successfully, though. It seems related to this issue: https://issues.apache.org/jira/browse/ARROW-16761.
> I would guess the problem is that the Python field is an int while the C++ code returns an int64_t [https://github.com/apache/arrow/pull/13338/files#diff-4f2eb12337651b45bab2b03abe2552dd7fc9958b1fbbeb09a2a488804b097109R164]
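The diagnosis above comes down to the range of a signed 32-bit integer. A C `int` is 32 bits on common platforms, so it tops out at 2**31 - 1 bytes (about 2 GiB), while a 5 GB file size fits easily in an `int64_t`. The sketch below is a stdlib-only illustration and does not call pyarrow. It uses Python's `struct` module as a stand-in for the Cython conversion, under the assumption that the `WrittenFile` size field is declared as a C `int`. The helper name `fits` is hypothetical. pyarrow itself raises `OverflowError` rather than `struct.error`, but the boundary is the same.

```python
import struct


def fits(value: int, fmt: str) -> bool:
    """Return True if `value` can be packed with the given struct format code.

    Hypothetical helper: struct stands in for the C-level integer conversion.
    """
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


C_INT = "=i"    # standard-size signed 32-bit, like a C `int` on common platforms
C_INT64 = "=q"  # standard-size signed 64-bit, like C++ `int64_t`

file_size = 5 * 1024**3  # a hypothetical 5 GiB Parquet file, in bytes

print(fits(file_size, C_INT))    # False: exceeds 2**31 - 1
print(fits(file_size, C_INT64))  # True: int64_t has plenty of headroom
print(fits(2**31 - 1, C_INT))    # True: the largest size a 32-bit int can hold
```

Any file larger than about 2 GiB trips the 32-bit limit, which matches the report that small writes succeed and large ones fail. Declaring the field with a 64-bit type removes the overflow.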



--
This message was sent by Atlassian Jira
(v8.20.10#820010)