You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Maximilian Michels (Jira)" <ji...@apache.org> on 2020/04/01 12:05:01 UTC

[jira] [Commented] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

    [ https://issues.apache.org/jira/browse/BEAM-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072693#comment-17072693 ] 

Maximilian Michels commented on BEAM-8944:
------------------------------------------

 !checkpoint-duration.png! 

yellow: 2.18.0 as released
turquoise: 2.18.0 with BEAM-8944 reverted

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-8944
>                 URL: https://issues.apache.org/jira/browse/BEAM-8944
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>    Affects Versions: 2.18.0
>            Reporter: Yichi Zhang
>            Priority: Critical
>             Fix For: 2.18.0
>
>         Attachments: checkpoint-duration.png, profiling.png, profiling_one_thread.png, profiling_twelve_threads.png
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load tests.
>  
> After some investigation, it appears to be caused by swapping the original ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is that python performance is worse with more threads on cpu-bounded tasks.
>  
> A simple test for comparing the multiple thread pool executor performance:
>  
> {code:python}
> def test_performance(self):
>    def run_perf(executor):
>      total_number = 1000000
>      q = queue.Queue()
>     def task(number):
>        hash(number)
>        q.put(number + 200)
>        return number
>     t = time.time()
>      count = 0
>      for i in range(200):
>        q.put(i)
>     while count < total_number:
>        executor.submit(task, q.get(block=True))
>        count += 1
>      print('%s uses %s' % (executor, time.time() - t))
>    with UnboundedThreadPoolExecutor() as executor:
>      run_perf(executor)
>    with futures.ThreadPoolExecutor(max_workers=1) as executor:
>      run_perf(executor)
>    with futures.ThreadPoolExecutor(max_workers=12) as executor:
>      run_perf(executor)
> {code}
> Results:
> <apache_beam.utils.thread_pool_executor.UnboundedThreadPoolExecutor object at 0x7fab400dbe50> uses 268.160675049
>  <concurrent.futures.thread.ThreadPoolExecutor object at 0x7fab40096290> uses 79.904583931
>  <concurrent.futures.thread.ThreadPoolExecutor object at 0x7fab400dbe50> uses 191.179054976
>  ```
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)