Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/05/25 23:13:00 UTC

[jira] [Created] (ARROW-12879) [C++] Thread pool leaks memory when forking (and could maybe deadlock) if threads exist at the time of fork

Weston Pace created ARROW-12879:
-----------------------------------

             Summary: [C++] Thread pool leaks memory when forking (and could maybe deadlock) if threads exist at the time of fork
                 Key: ARROW-12879
                 URL: https://issues.apache.org/jira/browse/ARROW-12879
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 4.0.0
            Reporter: Weston Pace


While working on ARROW-12878 I made this leak more obvious.  After a fork, the child cannot delete any std::thread left over from the parent.  In addition, it cannot safely use any mutex that might have been held by one of the pool's worker threads at the time of the fork.

 

The existing implementation works around this by creating a new ThreadPool::State instance.  However, shared_ptrs to the old instance are still held by the (now defunct) std::thread instances, so the old state object is never deleted (valgrind confirms this).

 

Furthermore, if the fork happens while a task is running on a pool thread and that task holds a mutex (e.g. any of the ones used in the datasets API), then that mutex will never be released in the child.
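
The mutex hazard can be reproduced in isolation.  The following is a minimal sketch, not Arrow code: ChildInheritsLockedMutex is a hypothetical helper that has a worker thread hold a std::mutex while the main thread forks.  The child uses try_lock so that it reports the problem instead of hanging.  It assumes a POSIX system where the child gets a copy of the mutex in its locked state (the behaviour I would expect with glibc).

```cpp
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical helper: returns true if the forked child found the mutex
// still locked even though the thread that locked it does not exist there.
bool ChildInheritsLockedMutex() {
  std::mutex mu;
  std::atomic<bool> locked{false};
  std::atomic<bool> release{false};

  // Worker takes the mutex and holds it until told to let go.
  std::thread worker([&] {
    std::lock_guard<std::mutex> hold(mu);
    locked = true;
    while (!release) std::this_thread::yield();
  });
  while (!locked) std::this_thread::yield();

  pid_t pid = fork();
  if (pid == 0) {
    // Child: only the forking thread was copied.  Nobody here can ever
    // unlock mu, so a blocking lock() would wait forever.
    bool got = mu.try_lock();
    _exit(got ? 0 : 1);
  }

  int status = 0;
  if (pid > 0) waitpid(pid, &status, 0);

  // Parent: let the worker finish normally.
  release = true;
  worker.join();
  return pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

Had the child called mu.lock() instead of try_lock(), it would block indefinitely, which is the potential deadlock described above.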

 

A more correct workaround would be to hook into pthread_atfork: shut down all threads before the fork (there is no need to wait for all jobs to complete), then restart the threads in BOTH the child and the parent once the fork is done.  Today we restart only in the child and leave the parent's threads running.
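
A rough sketch of what that could look like, using a toy pool rather than Arrow's ThreadPool.  ForkSafePool, InstallForkHandlers and ForkWithRestart are hypothetical names invented for this illustration; only pthread_atfork, fork and waitpid are real APIs.  It assumes the pool's workers are the only other threads in the process, so that starting threads from the child handler is reasonable.

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ForkSafePool;
ForkSafePool* g_pool = nullptr;  // pool the fork handlers act on (hypothetical)

// Toy pool for illustration only; not Arrow's ThreadPool.
class ForkSafePool {
 public:
  explicit ForkSafePool(int n) : n_(n) { Start(); }
  ~ForkSafePool() {
    if (g_pool == this) g_pool = nullptr;  // handlers cannot be unregistered
    Stop();
  }

  void Submit(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push(std::move(task)); }
    cv_.notify_one();
  }

  void Start() {
    { std::lock_guard<std::mutex> lock(mu_); stop_ = false; }
    for (int i = 0; i < n_; ++i) workers_.emplace_back([this] { Loop(); });
  }

  // Joins every worker.  A task that is already running is allowed to
  // finish; tasks still in the queue are left there, not drained.
  void Stop() {
    { std::lock_guard<std::mutex> lock(mu_); stop_ = true; }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
    workers_.clear();
  }

  std::size_t NumWorkers() const { return workers_.size(); }

 private:
  void Loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (stop_) return;
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  int n_;
  bool stop_ = false;
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
};

void BeforeFork() { if (g_pool) g_pool->Stop(); }   // prepare handler
void AfterFork()  { if (g_pool) g_pool->Start(); }  // parent AND child

void InstallForkHandlers(ForkSafePool* pool) {
  // Register once; the same restart handler runs on both sides of the fork.
  static const int rc = pthread_atfork(BeforeFork, AfterFork, AfterFork);
  assert(rc == 0);
  (void)rc;
  g_pool = pool;
}

// Usage: forks once; true if the child came up with running workers.
bool ForkWithRestart(ForkSafePool& pool) {
  InstallForkHandlers(&pool);
  pid_t pid = fork();
  if (pid == 0) _exit(pool.NumWorkers() > 0 ? 0 : 1);
  if (pid < 0) return false;
  int status = 0;
  waitpid(pid, &status, 0);
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because every worker is joined before the fork, no std::thread object or pool mutex is in an awkward state when the child starts.  One open question this sketch does not settle is what to do with tasks still queued at fork time: here they are copied into the child along with the rest of the pool and would run in both processes.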


