Posted to dev@beam.apache.org by Luke Cwik <lc...@google.com> on 2019/10/11 16:38:07 UTC

Python thread pool executor for Apache Beam

I'm looking for a thread pool that re-uses threads that are idle before
creating new ones and has an API that is compatible with the
concurrent.futures ThreadPoolExecutor[1].

To my knowledge, the concurrent.futures ThreadPoolExecutor creates new
threads for tasks up until the thread pool limit before re-using existing
ones for all Python versions prior to 3.8.
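
One way to observe this behavior is to submit tasks one at a time, wait for
each to finish, and count how many distinct worker threads ran them. The
snippet below is only an illustrative sketch (Python 3 syntax); the helper
name count_worker_threads is made up for this example and is not part of
Beam or the standard library. Before 3.8 the count should grow toward
max_workers; on 3.8+ it is usually smaller, though the exact number can vary
because a worker may not yet be marked idle when the next task is submitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_worker_threads(num_tasks=4, max_workers=4):
    """Return how many distinct threads ran num_tasks sequential tasks.

    Hypothetical helper for illustration only.
    """
    seen = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(num_tasks):
            # Block on each result so the previous task has finished
            # before the next one is submitted.
            seen.add(pool.submit(threading.get_ident).result())
    return len(seen)


if __name__ == "__main__":
    print("distinct worker threads used:", count_worker_threads())
```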

I tried using CollapsingThreadPoolExecutor within pr/9477[2] but after
testing it with Apache Beam, I found that it has some pool shutdown
issues[3].

Does anyone have any suggestions for a good Python library that contains a
stable thread pool implementation?

Preferably the library that provides the thread pool would have no
dependencies and be compatible with the same Python versions that Apache
Beam is compatible with today.
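
To make the requirement concrete, here is a rough, dependency-free sketch of
the behavior described above: a new worker is started only when no existing
worker is idle. The class name IdleReusingThreadPool and its internals are
assumptions made for this illustration, not an existing library and not what
pr/9477 does. It is written for Python 3, covers only submit() and
shutdown(), and never shrinks the pool, so treat it as a starting point
rather than a finished implementation.

```python
import queue
import threading
from concurrent.futures import Future


class IdleReusingThreadPool(object):
    """Hypothetical sketch: reuse idle workers before starting new ones."""

    def __init__(self, max_workers=4):
        self._max_workers = max_workers
        self._work = queue.Queue()
        self._lock = threading.Lock()
        self._idle = 0  # workers that finished a task and are unclaimed
        self._threads = []
        self._shutdown = False

    def submit(self, fn, *args, **kwargs):
        future = Future()
        with self._lock:
            if self._shutdown:
                raise RuntimeError("cannot submit after shutdown")
            if self._idle > 0:
                # Claim an idle worker instead of starting a thread.
                self._idle -= 1
            elif len(self._threads) < self._max_workers:
                thread = threading.Thread(target=self._run)
                thread.daemon = True
                thread.start()
                self._threads.append(thread)
            # Otherwise the pool is full and the task waits in the queue.
            self._work.put((future, fn, args, kwargs))
        return future

    def _mark_idle(self):
        with self._lock:
            self._idle += 1

    def _run(self):
        while True:
            item = self._work.get()
            if item is None:  # shutdown sentinel
                return
            future, fn, args, kwargs = item
            if not future.set_running_or_notify_cancel():
                self._mark_idle()
                continue
            try:
                result = fn(*args, **kwargs)
            except BaseException as exc:
                self._mark_idle()
                future.set_exception(exc)
            else:
                # Mark idle before publishing the result so a caller that
                # submits right after result() sees the idle worker.
                self._mark_idle()
                future.set_result(result)

    def shutdown(self, wait=True):
        with self._lock:
            self._shutdown = True
            threads = list(self._threads)
        # One sentinel per worker, queued behind any pending tasks.
        for _ in threads:
            self._work.put(None)
        if wait:
            for thread in threads:
                thread.join()
```

With this sketch, submitting tasks one after another and waiting on each
result keeps them on a single worker thread; additional threads are started
only when tasks overlap.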

1: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor
2: https://github.com/apache/beam/pull/9477
3: https://github.com/ftpsolutions/collapsing-thread-pool-executor/issues/3

Re: Python thread pool executor for Apache Beam

Posted by Luke Cwik <lc...@google.com>.
Many tests run a pipeline or invoke a portion of the codebase that
instantiates one of these thread pools, so each such test would incur this
1s wait. That becomes even worse because we run so many variants of the
tests (py2, py2gcp, py35, py35gcp, ...). I think we'll need to have the fix
upstream first.

On Fri, Oct 11, 2019 at 12:01 PM Robert Bradshaw <ro...@google.com>
wrote:

> Can we use a lower default timeout to mitigate this issue in the short
> term (I'd imagine one second or possibly smaller would be sufficient
> for our use), and get a fix upstream in the long term?
>
> On Fri, Oct 11, 2019 at 9:38 AM Luke Cwik <lc...@google.com> wrote:
> >
> > I'm looking for a thread pool that re-uses threads that are idle before
> > creating new ones and has an API that is compatible with the
> > concurrent.futures ThreadPoolExecutor[1].
> >
> > To my knowledge, the concurrent.futures ThreadPoolExecutor creates new
> > threads for tasks up until the thread pool limit before re-using
> > existing ones for all Python versions prior to 3.8.
> >
> > I tried using CollapsingThreadPoolExecutor within pr/9477[2] but after
> > testing it with Apache Beam, I found that it has some pool shutdown
> > issues[3].
> >
> > Does anyone have any suggestions for a good Python library that contains
> > a stable thread pool implementation?
> >
> > Preferably the library that provides the thread pool would have no
> > dependencies and be compatible with the same Python versions that Apache
> > Beam is compatible with today.
> >
> > 1: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor
> > 2: https://github.com/apache/beam/pull/9477
> > 3: https://github.com/ftpsolutions/collapsing-thread-pool-executor/issues/3
>

Re: Python thread pool executor for Apache Beam

Posted by Robert Bradshaw <ro...@google.com>.
Can we use a lower default timeout to mitigate this issue in the short
term (I'd imagine one second or possibly smaller would be sufficient
for our use), and get a fix upstream in the long term?

On Fri, Oct 11, 2019 at 9:38 AM Luke Cwik <lc...@google.com> wrote:
>
> I'm looking for a thread pool that re-uses threads that are idle before creating new ones and has an API that is compatible with the concurrent.futures ThreadPoolExecutor[1].
>
> To my knowledge, the concurrent.futures ThreadPoolExecutor creates new threads for tasks up until the thread pool limit before re-using existing ones for all Python versions prior to 3.8.
>
> I tried using CollapsingThreadPoolExecutor within pr/9477[2] but after testing it with Apache Beam, I found that it has some pool shutdown issues[3].
>
> Does anyone have any suggestions for a good Python library that contains a stable thread pool implementation?
>
> Preferably the library that provides the thread pool would have no dependencies and be compatible with the same Python versions that Apache Beam is compatible with today.
>
> 1: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor
> 2: https://github.com/apache/beam/pull/9477
> 3: https://github.com/ftpsolutions/collapsing-thread-pool-executor/issues/3