Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/19 16:57:31 UTC

[GitHub] [spark] akirillov opened a new pull request #25500: [MESOS] Fixed executors advertised address when running in virtual network

akirillov opened a new pull request #25500: [MESOS] Fixed executors advertised address when running in virtual network
URL: https://github.com/apache/spark/pull/25500
 
 
   ### What changes were proposed in this pull request?
   This patch fixes a bug that occurs when jobs involving shuffle are launched by Mesos in a virtual network. When `spark.mesos.network.name` is provided, the Mesos scheduler sets the executor `--hostname` parameter to `0.0.0.0`. Executors then use `0.0.0.0` as their advertised address and, in the presence of shuffle, fail to fetch shuffle blocks from each other because `0.0.0.0` is used as the origin. When a virtual network is used, the hostname or IP address is not known upfront; it is assigned to a container at its start time, so the executor process needs to advertise the correct dynamically assigned address to be reachable by other executors.
   
   Changes:
   - added a fallback to `Utils.localHostName()` in Spark Executors when `--hostname` is not provided
   - removed setting executor address to `0.0.0.0` from Mesos scheduler
   - refactored the code related to building executor command in Mesos scheduler
   - added network configuration support to Docker containerizer
   - added unit tests
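
The first two changes can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code from this patch: `build_executor_args` and `resolve_advertised_host` are hypothetical names, and `socket.gethostname()` only stands in for `Utils.localHostName()`, whose actual resolution logic in Spark may differ.

```python
import socket
from typing import List, Optional


def build_executor_args(executor_id: str, offer_hostname: str,
                        network_name: Optional[str]) -> List[str]:
    """Scheduler side (hypothetical): build the executor argument list."""
    args = ["--executor-id", executor_id]
    if not network_name:
        # No virtual network: the agent hostname from the offer is reachable,
        # so it is passed explicitly.
        args += ["--hostname", offer_hostname]
    # With a virtual network the address is assigned at container start,
    # so the scheduler passes nothing instead of 0.0.0.0.
    return args


def resolve_advertised_host(args: List[str]) -> str:
    """Executor side (hypothetical): use --hostname if given, else fall back."""
    if "--hostname" in args:
        idx = args.index("--hostname")
        if idx + 1 < len(args):
            return args[idx + 1]
    # Stand-in for Utils.localHostName(): resolved inside the container,
    # after the virtual network has assigned its address.
    return socket.gethostname()
```

With a network name set, the argument list carries no `--hostname`, and the executor resolves its own address at start time rather than advertising `0.0.0.0`.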
   
   ### Why are the changes needed?
   The bug described above prevents Mesos users from running any job that involves shuffle when a virtual network is used: executors advertise an incorrect address and therefore cannot fetch shuffle blocks from each other.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   - added unit test to `MesosCoarseGrainedSchedulerBackendSuite` which verifies the absence of the `--hostname` parameter when `spark.mesos.network.name` is provided and its presence otherwise
   - added unit test to `MesosSchedulerBackendUtilSuite` which verifies that `MesosSchedulerBackendUtil.buildContainerInfo` sets network-related properties for Docker containerizer
   - integration tests from the [DCOS Spark repo](https://github.com/mesosphere/spark-build), specifically [test_spark_cni.py](https://github.com/mesosphere/spark-build/blob/master/tests/test_spark_cni.py), which runs a specific [shuffle job](https://github.com/mesosphere/spark-build/blob/master/tests/jobs/scala/src/main/scala/ShuffleApp.scala) and verifies successful job completion, Mesos task network configuration, and IP addresses for both Mesos and Docker containerizers
