Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/04/02 21:31:00 UTC

[jira] [Resolved] (ARROW-12161) [C++] Async streaming CSV reader deadlocking when being run synchronously from datasets

     [ https://issues.apache.org/jira/browse/ARROW-12161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li resolved ARROW-12161.
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 9868
[https://github.com/apache/arrow/pull/9868]

> [C++] Async streaming CSV reader deadlocking when being run synchronously from datasets
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-12161
>                 URL: https://issues.apache.org/jira/browse/ARROW-12161
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> ARROW-11887 added async support to the streaming CSV reader.  To preserve backward compatibility, the old sync API simply calls the async API and waits for it to finish.  However, that wait cannot happen safely in a "nested" context (e.g. dataset reading), where the caller is itself running on a CPU executor thread.
> For example, imagine two cores (and so a CPU executor with two threads).  The dataset read launches two CSV scans.  Each scan occupies a CPU thread while it waits for a future.  Those futures are filled by I/O threads.  When the I/O threads finish, they try to transfer the result back to the CPU executor.  The transfer can never run because every CPU thread is already blocked waiting for it, so the read deadlocks.
> This will be fixed as part of ARROW-7001, but that is still some way off.  An easier change might be to take some of the ARROW-7001 changes and include them as part of the ARROW-11887 feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)