Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/18 21:02:03 UTC

[GitHub] [beam] psobot commented on pull request #13475: Do not add unnecessary experiment use_multiple_sdk_containers.

psobot commented on pull request #13475:
URL: https://github.com/apache/beam/pull/13475#issuecomment-748317475


   Hi @tvalentyn and @chamikaramj!
   
   > Dataflow service may still recognize --experiment=no_use_multiple_sdk_containers for some time but it is NOT RECOMMENDED to use this knob: in the future Dataflow may have better algorithms for deciding how many SDK containers to start, and specifying this knob may interfere with these algorithms.
   >
   > Users can control the number of cores on the VMs by setting an appropriate --machine_type. Note that there are custom machine types, where users can select number of cores and number of memory GBs, such as --machine_type=custom-1-13312-ext which will have 1 core and 13GB memory.
   
   While it is true that the number of cores can be controlled with `--machine_type`, there are many situations in which it is desirable, for performance reasons, for a job to use multiple cores while processing a single element (e.g., running ML inference within Dataflow). Is there a proposed alternative for workloads that benefit from multi-core parallelism without multiple SDK workers?
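
For reference, the custom machine type name in the quote above encodes the core count and the memory in MB, with `-ext` marking extended memory. Here is a minimal sketch of how such a name breaks down. The helper `parse_custom_machine_type` is hypothetical and written only for illustration; it is not part of Beam or Dataflow.

```python
import re


def parse_custom_machine_type(name):
    """Split a name like 'custom-1-13312-ext' into cores and memory.

    Hypothetical helper for illustration only; not a Beam or Dataflow API.
    """
    match = re.fullmatch(r"custom-(\d+)-(\d+)(-ext)?", name)
    if match is None:
        raise ValueError(f"not a custom machine type: {name!r}")
    cores = int(match.group(1))
    memory_mb = int(match.group(2))
    return {
        "cores": cores,
        "memory_gb": memory_mb / 1024,  # the name carries memory in MB
        "extended_memory": match.group(3) is not None,
    }


info = parse_custom_machine_type("custom-1-13312-ext")
print(info["cores"], info["memory_gb"], info["extended_memory"])
```

For `custom-1-13312-ext` this gives 1 core and 13 GB of memory, matching the figures in the quote. A workload that needs several cores per element would use a larger first number, e.g. `custom-4-...`.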

