Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/02/25 02:37:00 UTC
[jira] [Created] (ARROW-15784) [C++][Python] Parallel parquet file reading disabled with single file reads
Weston Pace created ARROW-15784:
-----------------------------------
Summary: [C++][Python] Parallel parquet file reading disabled with single file reads
Key: ARROW-15784
URL: https://issues.apache.org/jira/browse/ARROW-15784
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Affects Versions: 7.0.0
Reporter: Weston Pace
Assignee: Weston Pace
Fix For: 7.0.1
There is a flag, {{enable_parallel_column_conversion}}, which was passed down from Python to C++ when reading Parquet datasets and which controlled whether we would read columns in parallel. Parallel column reads were allowed for single files but not when reading multiple files. This was an old check to help prevent nested deadlock.
Nested deadlock is no longer an issue, and the flag became mostly inert once we removed the synchronous scanner.
Unfortunately, when we removed the synchronous scanner we forgot to remove this flag, and as a result a single-file read ended up with parallelism disabled.
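
For readers unfamiliar with the term, the nested deadlock mentioned above can be illustrated with a generic sketch. This is not Arrow code: it uses only Python's standard {{concurrent.futures}} module, and the function name {{read_one_file}}, the column labels, and the timeout are hypothetical, chosen for illustration. It assumes the general pattern in which a per-file task running on a thread pool submits per-column tasks to that same pool and blocks on them. If every worker is occupied by a blocking outer task, the inner tasks never get to run.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def read_one_file(n_workers, n_columns=2, timeout=0.5):
    """Run one 'file' task that fans out per-column tasks on the SAME pool.

    Hypothetical illustration only; this does not model Arrow's thread pool.
    Returns the list of column results, or "stalled" if the inner tasks
    could not run because the outer task was holding the last free worker.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:

        def file_task():
            # Inner tasks are queued on the pool this task is running on.
            inner = [pool.submit(lambda i=i: f"col{i}") for i in range(n_columns)]
            try:
                # Blocking here keeps this worker busy while it waits.
                return [f.result(timeout=timeout) for f in inner]
            except FutureTimeout:
                # Without the timeout this would wait forever (a deadlock).
                for f in inner:
                    f.cancel()
                return "stalled"

        return pool.submit(file_task).result()


if __name__ == "__main__":
    print(read_one_file(n_workers=1))  # the only worker waits on queued work
    print(read_one_file(n_workers=2))  # a second worker runs the column tasks
```

With a single worker the outer task waits on work that can never be scheduled, which is why the sketch uses a timeout to report a stall instead of hanging. With a spare worker the column tasks complete normally.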