
Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/02/25 02:37:00 UTC

[jira] [Created] (ARROW-15784) [C++][Python] Parallel parquet file reading disabled with single file reads

Weston Pace created ARROW-15784:
-----------------------------------

             Summary: [C++][Python] Parallel parquet file reading disabled with single file reads
                 Key: ARROW-15784
                 URL: https://issues.apache.org/jira/browse/ARROW-15784
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
    Affects Versions: 7.0.0
            Reporter: Weston Pace
            Assignee: Weston Pace
             Fix For: 7.0.1


The flag {{enable_parallel_column_conversion}} was passed down from Python to C++ when reading Parquet datasets and controlled whether columns would be read in parallel.  Parallel column reads were allowed for single files but not for multi-file reads.  This was an old check intended to help prevent nested deadlock.

Nested deadlock is no longer an issue, and the flag became mostly inert once we removed the synchronous scanner.

Unfortunately, when we removed the synchronous scanner we forgot to remove this flag, and as a result single-file reads ended up with parallelism disabled.
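
For readers unfamiliar with this kind of flag, here is a minimal sketch of how a boolean like {{enable_parallel_column_conversion}} can gate per-column parallelism. It is not the Arrow implementation: {{decode_column}} and {{read_columns}} are hypothetical names, and the "decode" step only records which thread ran it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def decode_column(name):
    """Hypothetical stand-in for decoding one Parquet column.

    Returns the column name and the name of the thread that did the work.
    """
    return name, threading.current_thread().name


def read_columns(columns, enable_parallel_column_conversion):
    """Decode columns, using a thread pool only when the flag is set."""
    if enable_parallel_column_conversion:
        with ThreadPoolExecutor(max_workers=4) as pool:
            return dict(pool.map(decode_column, columns))
    # Flag off: every column is decoded serially on the calling thread.
    return dict(decode_column(c) for c in columns)


columns = ["a", "b", "c"]
serial = read_columns(columns, enable_parallel_column_conversion=False)
parallel = read_columns(columns, enable_parallel_column_conversion=True)
```

Both calls return the same set of columns. With the flag off, all of the work stays on the caller's thread, which is the kind of silent loss of parallelism described above.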



--
This message was sent by Atlassian Jira
(v8.20.1#820001)