Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/07/13 17:49:02 UTC

[GitHub] [beam] kw2542 removed a comment on pull request #15081: [BEAM-12503] Process Support for ExternalWorkerService

kw2542 removed a comment on pull request #15081:
URL: https://github.com/apache/beam/pull/15081#issuecomment-879281777


   > Each Python SDK process instance is already capable of running multiple work items in parallel. The issue is that the Python GIL limits it to a single CPU core, which is why multiple Python SDK process instances are launched. Whether they are launched by boot.go or something else isn't too important. The prepare step sounds great for the external pool mode, since that is what we want for Docker in Apache Beam as well.
   > On Mon, Jul 12, 2021 at 11:39 AM Ke Wu <***@***.***> wrote:
   >
   > I am curious why artifact staging does not work with threads. I wonder if we should fix that instead of introducing yet more complexity to this already complex API. In Python, I thought we used processes instead of threads because of the GIL. But Java has no GIL, so I'm not sure there is an advantage to using processes.
   >
   > Using threads still makes sense for I/O-bound tasks in Python, since Python can parallelize I/O effectively. Python's GIL is problematic for CPU-bound tasks.
   >
   > @lukecwik <https://github.com/lukecwik> @ibzib <https://github.com/ibzib> Correct me if I am wrong, but my understanding is that we use process mode mainly because it simplifies the workflow by reusing the boot executable, which can only be executed in a subprocess, not in a thread. In addition, the boot executable starts the actual worker in a subprocess too. It is true that we could implement a new workflow that supports thread mode instead of relying on the boot executable, but that could be significantly more work; let me know if you think it is worth the effort.
   >
   > I am also wondering whether we could add a prepare step in external pool mode; then we might not need to run artifact staging for each start worker request. WDYT?
   >
   > (View it on GitHub: [#15081 (comment)](https://github.com/apache/beam/pull/15081#issuecomment-878505102))
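The threads-versus-processes trade-off discussed above can be illustrated with a minimal Python sketch that uses only the standard `concurrent.futures` module. This is not Beam code: `fake_io_task` and `cpu_task` are hypothetical stand-ins, with `time.sleep` simulating a blocking I/O call (it releases the GIL, as blocking socket reads do) and a pure-Python loop simulating CPU-bound work.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_io_task(seconds: float) -> float:
    """Simulate an I/O wait; time.sleep releases the GIL like a blocking read."""
    time.sleep(seconds)
    return seconds


def cpu_task(n: int) -> int:
    """Pure-Python arithmetic: under the GIL only one thread runs this bytecode at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_io_with_threads(num_tasks: int, seconds: float) -> float:
    """Run I/O-bound tasks on threads and return the wall-clock time taken."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        list(pool.map(fake_io_task, [seconds] * num_tasks))
    return time.perf_counter() - start


def run_cpu_with_processes(num_tasks: int, n: int) -> list:
    """Run CPU-bound tasks in separate processes, each with its own interpreter and GIL."""
    with ProcessPoolExecutor(max_workers=num_tasks) as pool:
        return list(pool.map(cpu_task, [n] * num_tasks))


if __name__ == "__main__":
    # The four 0.2 s waits overlap on threads, so this should take roughly 0.2 s, not 0.8 s.
    print(f"I/O on threads: {run_io_with_threads(4, 0.2):.2f}s")
    # CPU-bound work is spread across processes so it can use more than one core.
    print(run_cpu_with_processes(2, 10_000))
```

Threads are enough when the work mostly waits on I/O; for CPU-bound Python work, separate processes are what let the SDK use more than one core, which is the reason given above for launching multiple Python SDK process instances.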
   
   Is your suggestion to stick with thread mode in Java and implement the prepare/artifact staging step separately from the existing boot script?
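One way to picture the prepare-step idea is a pool that stages artifacts once and then starts each worker as a thread that reuses the staged result. The sketch below is hypothetical and written in Python only for brevity: `ExternalWorkerPool`, `prepare`, `start_worker`, `join_all`, and the `stage_fn`/`run_fn` callables are illustrative names, not the ExternalWorkerService API.

```python
import threading
from typing import Any, Callable, Dict, List


class ExternalWorkerPool:
    """Hypothetical pool: stage artifacts once in prepare(), then start workers as threads."""

    def __init__(self, stage_fn: Callable[[List[str]], Any]) -> None:
        self._stage_fn = stage_fn  # hypothetical callable that performs artifact staging
        self._lock = threading.Lock()
        self._prepared = False
        self._staged: Any = None  # staging result shared by every worker thread
        self._threads: Dict[str, threading.Thread] = {}
        self.stage_calls = 0  # how many times staging actually ran

    def prepare(self, artifacts: List[str]) -> Any:
        """Stage artifacts once; later calls return the cached result."""
        with self._lock:
            if not self._prepared:
                self._staged = self._stage_fn(artifacts)
                self._prepared = True
                self.stage_calls += 1
            return self._staged

    def start_worker(self, worker_id: str, run_fn: Callable[[str, Any], None]) -> threading.Thread:
        """Start one worker as a thread in this process, reusing the staged artifacts."""
        with self._lock:
            if not self._prepared:
                raise RuntimeError("prepare() must be called before start_worker()")
            thread = threading.Thread(target=run_fn, args=(worker_id, self._staged), daemon=True)
            self._threads[worker_id] = thread
        thread.start()
        return thread

    def join_all(self, timeout: float = 5.0) -> None:
        """Wait for every started worker thread (each join bounded by timeout)."""
        with self._lock:
            threads = list(self._threads.values())
        for thread in threads:
            thread.join(timeout)
```

Because `prepare` caches its result, repeated start-worker requests do not re-stage artifacts, and no subprocess or boot executable is involved in starting a worker.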


-- 
This is an automated message from the Apache Git Service.