Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/03/26 13:10:00 UTC

[jira] [Updated] (ARROW-12097) [C++] Modify BackgroundGenerator so it creates fewer threads

     [ https://issues.apache.org/jira/browse/ARROW-12097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-12097:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Modify BackgroundGenerator so it creates fewer threads
> ------------------------------------------------------------
>
>                 Key: ARROW-12097
>                 URL: https://issues.apache.org/jira/browse/ARROW-12097
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current implementation creates a thread per block. In the CSV reader this hurts performance slightly; in the IPC reader it hurts performance even more.
> Instead, the readahead can move inside the background generator. The background generator task can then keep running until the queue fills up, and restart once the queue has drained enough for a substantial amount of work to be done.
> In my CSV test case this dropped the number of thread tasks created from ~2.5k to ~100.
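
The scheduling idea described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Arrow implementation: ReadaheadModel, DrainStats, Drain, max_queue and restart_at are hypothetical names, and the "task" runs inline on the calling thread so that the number of tasks started can be counted deterministically. In a real reader each RunTask call would correspond to one task submitted to a background thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical single-threaded model of a readahead queue (not Arrow's API).
// A "task" pulls blocks from the source until the queue is full, then ends.
// A new task is started only once the queue has drained to restart_at.
class ReadaheadModel {
 public:
  ReadaheadModel(std::function<std::optional<int>()> source,
                 std::size_t max_queue, std::size_t restart_at)
      : source_(std::move(source)), max_queue_(max_queue), restart_at_(restart_at) {
    assert(restart_at < max_queue);  // otherwise a restarted task has no room to work
  }

  // Returns the next block, or std::nullopt once the source is exhausted.
  std::optional<int> Next() {
    if (queue_.empty() && !finished_) RunTask();
    if (queue_.empty()) return std::nullopt;
    int value = queue_.front();
    queue_.pop_front();
    // Restart only when enough room has opened up for a substantial batch.
    if (!finished_ && queue_.size() <= restart_at_) RunTask();
    return value;
  }

  std::size_t tasks_started() const { return tasks_started_; }

 private:
  // Stand-in for one background task: fill the queue, then stop.
  void RunTask() {
    ++tasks_started_;
    while (!finished_ && queue_.size() < max_queue_) {
      std::optional<int> item = source_();
      if (!item) {
        finished_ = true;
      } else {
        queue_.push_back(*item);
      }
    }
  }

  std::function<std::optional<int>()> source_;
  std::size_t max_queue_;
  std::size_t restart_at_;
  std::deque<int> queue_;
  bool finished_ = false;
  std::size_t tasks_started_ = 0;
};

struct DrainStats {
  std::size_t tasks;  // how many tasks the model started
  std::size_t items;  // how many blocks the consumer received
};

// Drains a source of `blocks` items and reports how many tasks were needed.
DrainStats Drain(std::size_t blocks, std::size_t max_queue, std::size_t restart_at) {
  std::size_t produced = 0;
  ReadaheadModel model(
      [&produced, blocks]() -> std::optional<int> {
        if (produced == blocks) return std::nullopt;
        return static_cast<int>(produced++);
      },
      max_queue, restart_at);
  std::size_t items = 0;
  while (model.Next()) ++items;
  return DrainStats{items == blocks ? model.tasks_started() : 0, items};
}
```

In this model, Drain(100, 1, 0) behaves like one task per block and starts 101 tasks (the last one only discovers the end of the source), while Drain(100, 32, 16) delivers the same 100 blocks with 6 tasks. The absolute numbers are properties of the toy model, not measurements of the CSV or IPC readers.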



--
This message was sent by Atlassian Jira
(v8.3.4#803005)