Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/08/19 19:34:00 UTC

[jira] [Updated] (ARROW-5825) [Python] Exceptions swallowed in ParquetManifest._visit_directories

     [ https://issues.apache.org/jira/browse/ARROW-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-5825:
--------------------------------
    Fix Version/s:     (was: 1.0.0)

> [Python] Exceptions swallowed in ParquetManifest._visit_directories
> -------------------------------------------------------------------
>
>                 Key: ARROW-5825
>                 URL: https://issues.apache.org/jira/browse/ARROW-5825
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: George Sakkis
>            Priority: Major
>              Labels: parquet
>
> {{ParquetManifest._visit_directories}} uses a {{ThreadPoolExecutor}} to visit partitioned parquet datasets concurrently. It waits for the submitted tasks to finish but doesn't check whether the respective futures have failed. This is quite tricky to detect and debug, as an exception is either raised later as a side effect or (perhaps worse) passes silently.
> Observed on 0.12.1, but it appears to be present on the latest master too.
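
A minimal sketch of the pattern described above, using only the standard library. This is not the pyarrow source: visit, visit_all_unchecked and visit_all_checked are hypothetical names chosen for illustration. It shows how waiting on futures without inspecting them hides a worker exception, and how calling result() on each future surfaces it.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def visit(path):
    # Hypothetical stand-in for visiting one partition directory.
    if path == "bad":
        raise ValueError("cannot read " + path)
    return path


def visit_all_unchecked(paths):
    # Waits for every task to finish but never inspects the futures,
    # so an exception raised inside a worker is silently dropped.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(visit, p) for p in paths]
        wait(futures)
    return futures


def visit_all_checked(paths):
    # Same submission logic, but result() re-raises any exception
    # that occurred in the worker thread.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(visit, p) for p in paths]
        wait(futures)
    return [f.result() for f in futures]
```

With a failing path in the input, visit_all_unchecked returns normally while visit_all_checked raises the ValueError from the worker. Whether a fix should raise the first failure or collect all of them is a design choice this sketch does not settle.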



--
This message was sent by Atlassian Jira
(v8.3.2#803003)