Posted to reviews@spark.apache.org by gerashegalov <gi...@git.apache.org> on 2018/01/19 06:20:28 UTC

[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

GitHub user gerashegalov opened a pull request:

    https://github.com/apache/spark/pull/20327

    [SPARK-12963][CORE] NM host for driver end points

    ## What changes were proposed in this pull request?
    
    Driver endpoints on YARN in cluster mode are potentially bound to incorrect network interfaces because the host name is not retrieved from YARN, as it is in the executor container case.
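    
    A minimal sketch of the idea (not the patch itself; `sparkConf` stands for the driver's `SparkConf`, and `NM_HOST` is the variable YARN's NodeManager sets in every container's environment):
    
    ```scala
    // Sketch only: derive the driver host on the worker node itself from the
    // YARN container environment instead of guessing it on the client.
    val nmHost = sys.env.getOrElse("NM_HOST",
      java.net.InetAddress.getLocalHost.getHostName) // fallback outside YARN
    sparkConf.set("spark.driver.host", nmHost)
    ```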
    
    ## How was this patch tested?
    
    On a cluster previously experiencing `Service 'Driver' failed after 16 retries (on a random free port) ...`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gerashegalov/spark gera/driver-host-on-in-yarn-cluster-mode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20327.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20327
    
----
commit 016323a8d163e31052776a50590b47d9a38b6cdb
Author: Gera Shegalov <ge...@...>
Date:   2018-01-19T06:05:32Z

    [SPARK-12963][CORE] NM host for driver end points

----


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175545093
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    Also, same question as before: why not just leave the code as is? It will still work on YARN.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175873355
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    You can set `SPARK_LOCAL_IP`.
    
    If you really want to change this, you must not change the current default behavior, which is to bind to all interfaces. If you manage to do that without doing cluster-specific checks, then I'm ok with it. But none of your current solutions really pass that bar.
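    
    For instance, something like this (a sketch; the address is a placeholder) forwards it to the AM container through the existing `spark.yarn.appMasterEnv.*` mechanism, with no code change:
    
    ```scala
    // Sketch: forward SPARK_LOCAL_IP into the AM/driver container environment
    // via the spark.yarn.appMasterEnv.* prefix; YARN exports it to the process.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.yarn.appMasterEnv.SPARK_LOCAL_IP", "10.1.2.3") // placeholder
    ```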


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88409/testReport)** for PR 20327 at commit [`b470e93`](https://github.com/apache/spark/commit/b470e93030b1a754fffe90e471aa0a6ef9bfd9f2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov closed the pull request at:

    https://github.com/apache/spark/pull/20327


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88409/
    Test FAILed.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88142/testReport)** for PR 20327 at commit [`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    retest this please


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175544171
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    I can make the change YARN-only. 
    However, I'd like to point out that I don't see any documentation stating `0.0.0.0` as the default.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88283/
    Test FAILed.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88514/
    Test PASSed.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176827527
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    Yes, we are, sorry. I'm referring to your change in YarnRMClient.scala.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176822894
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    Correct, SPARK_LOCAL_IP/HOSTNAME and SPARK_PUBLIC_DNS play a role in how the tracking URL is generated.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r173611826
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    This is the same issue as with the RPC address: preference should be given to the setting chosen by the YARN admins. Opening a port on an unintended network is an additional vulnerability. That said, YARN's default is also 0.0.0.0, which users will get with this patch as well.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #87866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87866/testReport)** for PR 20327 at commit [`a674863`](https://github.com/apache/spark/commit/a674863b8243fddc065270d292a6be18e38fbc30).


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176597452
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    > it assumes that both the RM and the Spark app have the same configuration w.r.t. which interfaces they're binding to.
    The RM is not actively involved here. The driver executes on the NM. The driver's launch context prescribes assigning `SPARK_LOCAL_IP` and `SPARK_LOCAL_HOSTNAME` on the worker node. Then the AM RPCs the tracking URL computed on the NM back to the RM.
    



---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    cc @vanzin since you are an expert on this topic.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88142/
    Test FAILed.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171381824
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -123,6 +123,10 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  test("yarn-cluster should use NM's public address instead  of SPARK_LOCAL_*") {
    +    testBasicYarnApp(false, conf = Map("spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> "1.1.1.1"))
    --- End diff --
    
    1.1.1.1 is a valid IP address (try it out, ping it!); you should probably use something invalid here instead (like "0.1.1.1", which you shouldn't be able to use as a target IP address).
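    
    A quick local check of the difference (a sketch; any address not assigned to a local interface behaves the same way):
    
    ```scala
    import java.net.{BindException, InetAddress, ServerSocket}
    
    // Binding a listener to an address that no local interface owns fails fast,
    // which is what the test needs; reachability (ping) is irrelevant.
    try {
      new ServerSocket(0, 50, InetAddress.getByName("0.1.1.1")).close()
    } catch {
      case e: BindException => println(s"cannot bind: ${e.getMessage}")
    }
    ```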


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r169257050
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging {
         }
     
         // Ignore invalid spark.driver.host in cluster modes.
    -    if (deployMode == CLUSTER) {
    +    if (isYarnCluster) {
    +      sparkConf.set("spark.driver.host", "${env:NM_HOST}")
    --- End diff --
    
    Sorry for the delay. I verified that moving the `${env:NM_HOST}` setting to `yarn/Client.scala` works. However, the most robust method is to use the YARN cluster-side config by replicating how the NodeManager determines the public address. I added a unit test that fails without the PR.
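    
    For reference, a condensed sketch of that approach (it mirrors the diff quoted elsewhere in this thread and must run on the NM node itself so resolution matches the NM's):
    
    ```scala
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    import org.apache.hadoop.yarn.webapp.util.WebAppUtils
    
    // Replicate how the NM resolves its own bind address from the cluster-side
    // config (NM_BIND_HOST, falling back to the NM web app address).
    val yarnConf = new YarnConfiguration() // in the AM this wraps the NM-local config
    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf,
      YarnConfiguration.NM_BIND_HOST,
      WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf))
    val nmHost = nmHostPort.split(":").head
    ```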


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87789/
    Test PASSed.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r173031021
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -68,6 +69,20 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     
       private val securityMgr = new SecurityManager(sparkConf)
     
    +  private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
    +
    +  if (isClusterMode) {
    +    // If the public address of NM does not adhere Utils.localCanonicalHostname
    +    // there is no way to correctly configure it from the client. To this end,
    +    // this logic replicates the way NM determines the address to bind listeners to
    +    // from yarnConf. It has to be executed on the NM node itself in order to resolve correctly.
    +    //
    --- End diff --
    
    nit: remove this line


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    ok to test


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88478/testReport)** for PR 20327 at commit [`e10dea3`](https://github.com/apache/spark/commit/e10dea332023b5a435ee0ea277347fcfed38adfd).


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88514/testReport)** for PR 20327 at commit [`73f9df2`](https://github.com/apache/spark/commit/73f9df224246e04163847419ef082278014c6a2a).


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #87789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87789/testReport)** for PR 20327 at commit [`0c4ed66`](https://github.com/apache/spark/commit/0c4ed6612f1d75d73aeb862f4f373a62d343db3c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171748638
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -123,6 +123,10 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  test("yarn-cluster should use NM's public address instead  of SPARK_LOCAL_*") {
    +    testBasicYarnApp(false, conf = Map("spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> "1.1.1.1"))
    --- End diff --
    
    They are both valid addresses; `ping`-ability is orthogonal. The point was that you can't bind to it:
    ```
    $ nc -l 1.1.1.1 4040
    nc: Can't assign requested address
    ```
    Your comment inspired me, though, to use 0.0.0.1, which can be used only as a [source address](https://tools.ietf.org/rfc/rfc3330.txt), to signify the issue.
     


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176817667
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    Isn't that the `trackingUrl`, which is a separate parameter to the call?


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88142/testReport)** for PR 20327 at commit [`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840).


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176584238
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    > So I'm not sure the tests are actually that useful. The way they're written, as yarn apps, actually makes them very expensive, and this is testing basic networking config that we know will not work if the IPs are invalid.
    
    I agree the tests are not cheap. However, they show:
    - that the regression of not being able to bind the web UI to a specific interface is fixed
    - how to bind the RPC and web UI listeners to different interfaces
    
    > The actual change you're introducing - using the bind address as the address of the NM, is actually wrong if you think about it. It just happens that the default value of that config is the local host name.
    
    It's not wrong; it's one of the reasonable choices. Moreover, it's one that is consistent with the setting executors use to bind RPC. Obviously there can be others.
    
    > So basically if setting those env variables in the AM fixes the issue I'm not sure there's any need to change anything in Spark.
    
    We need the fix in JettyUtils. After thinking about it more, the change in YarnRMClient.scala should use NM_HOST for registerApplicationMaster because it's the right host part of YARN's NodeId.
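    
    A hedged sketch of that YarnRMClient change (assuming `amClient` and `trackingUrl` as they exist around the existing `registerApplicationMaster` call):
    
    ```scala
    // Register the AM under the host part of YARN's NodeId, which the NM
    // exports to every container as the NM_HOST environment variable.
    val nmHost = sys.env("NM_HOST")
    amClient.registerApplicationMaster(nmHost, 0, trackingUrl)
    ```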
    
    
    
     
    



---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175629947
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    There's a big difference between the RPC endpoint and the web UI.
    
    For the RPC endpoint, your change is fixing a potential issue. For the web UI, this change is potentially introducing a new issue in non-YARN mode. So please revert this one change, and all the rest is good to go.
    
    Otherwise I can't merge this.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r170682155
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -115,6 +116,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         }
       }
     
    +  if (isClusterMode) {
    +    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST,
    --- End diff --
    
    Can this be done above where the configuration is prepared? Then you wouldn't need to manually add this to the system properties since it would be done by the existing code.
    
    Also it would be good to add a comment explaining why this is needed.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r173611565
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
    @@ -342,13 +342,13 @@ private[spark] object JettyUtils extends Logging {
               -1,
               -1,
               connectionFactories: _*)
    +        connector.setHost(hostName)
             connector.setPort(port)
             connector.start()
     
             // Currently we only use "SelectChannelConnector"
             // Limit the max acceptor number to 8 so that we don't waste a lot of threads
             connector.setAcceptQueueSize(math.min(connector.getAcceptors, 8))
    --- End diff --
    
    good point


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175623955
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    I formulated the problem more broadly in the title of the PR: "NM host for driver end points". It's not an intuitive default behavior to bind to `0.0.0.0` when the backend (YARN) is explicitly configured not to, and we need a mechanism that allows Spark to inherit the YARN-determined bind address on the NM.
    You convinced me that client mode is less critical, and it's easy to override via the spark-submit environment (after the bug fix). Although I'd prefer using bindAddress everywhere, for consistency and because that's what is documented.
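    
    For reference, a minimal sketch of that documented split (placeholder values): `spark.driver.bindAddress` picks the interface the driver listens on, while `spark.driver.host` is what it advertises.
    
    ```scala
    // Sketch: bind to one interface but advertise another name; both values
    // here are placeholders for whatever the deployment actually uses.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.driver.bindAddress", "10.1.2.3")    // listen address
      .set("spark.driver.host", "driver.example.com") // advertised address
    ```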


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176599395
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    > The RM is not actively involved here.
    
    Right, I meant the NM. The rest of the comment about why using Spark's bind address is not correct still stands, though.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175533480
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    Sorry, but this is still bothering me. It probably works fine in YARN mode because the UI is proxied by the RM. But that's not true for any other mode, and the default before was to bind to `0.0.0.0`, whereas now it binds to a specific hostname. That may not be the public one on a multi-homed machine, so the UI wouldn't be accessible without extra configuration.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171749148
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     
       private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
     
    +  if (isClusterMode) {
    +    // this logic replicates the way YARN NM determines the address to bind listeners to
    --- End diff --
    
    please review my updated comment


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88283/testReport)** for PR 20327 at commit [`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88409/testReport)** for PR 20327 at commit [`b470e93`](https://github.com/apache/spark/commit/b470e93030b1a754fffe90e471aa0a6ef9bfd9f2).


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176585546
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    > that the regression of not being able to bind the web UI to a specific interface is fixed
    
    This can be a unit test for `JettyUtils` if you really care about that.
    
    > how to bind the RPC and web UI listeners to different interfaces
    
    This can be added to the docs. Nobody is going to look at unit test code to figure out how to do that.
    
    > It's not wrong; it's one of the reasonable choices.
    
    It actually is, because it assumes that both the RM and the Spark app have the same configuration w.r.t. which interfaces they're binding to.
    
    But as you say, it's better to use `$NM_HOST` to find the NM's address instead of the current code.
    
    > We need the fix in JettyUtils
    
    There's a separate PR with that fix too (the one I referenced above was closed and the correct one was opened as #20883).



---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r170682734
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -115,6 +116,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
         }
       }
     
    +  if (isClusterMode) {
    +    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST,
    +      WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf))
    +    val (nmHost, _) = Utils.parseHostPort(nmHostPort)
    +
    +    Seq(DRIVER_BIND_ADDRESS, DRIVER_HOST_ADDRESS)
    --- End diff --
    
    `DRIVER_BIND_ADDRESS` is something that is not set by Spark anywhere, so it feels a little weird to override it. If the user is setting that explicitly, the code should assume the user knows what he's doing.
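    
    In other words, something like this sketch (with `nmHost` as computed by the patch, and the config entries from `org.apache.spark.internal.config`):
    
    ```scala
    // Only derive the bind address when the user hasn't set it explicitly;
    // the advertised host can still be pointed at the NM's address.
    if (!sparkConf.contains(DRIVER_BIND_ADDRESS)) {
      sparkConf.set(DRIVER_BIND_ADDRESS, nmHost)
    }
    sparkConf.set(DRIVER_HOST_ADDRESS, nmHost)
    ```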


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175869822
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    You should not force users to bind to unintended addresses without providing reasonable means to change that.


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88318/testReport)** for PR 20327 at commit [`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840).


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175551377
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    If we don't bind all listener sockets consistently, I don't see a point in this PR (modulo bug fixes). This is supposed to simplify deployment in restricted environments and make behavior consistent with the documentation. If we leave one of these untreated, we force users to keep `SPARK_LOCAL_IP` in sync with related YARN settings.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r173030160
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
    @@ -342,13 +342,13 @@ private[spark] object JettyUtils extends Logging {
               -1,
               -1,
               connectionFactories: _*)
    +        connector.setHost(hostName)
             connector.setPort(port)
             connector.start()
     
             // Currently we only use "SelectChannelConnector"
             // Limit the max acceptor number to 8 so that we don't waste a lot of threads
             connector.setAcceptQueueSize(math.min(connector.getAcceptors, 8))
    --- End diff --
    
    Should this also be moved before the `start()` call? (Unrelated to your change, but since you're here...)


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r165193521
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging {
         }
     
         // Ignore invalid spark.driver.host in cluster modes.
    -    if (deployMode == CLUSTER) {
    +    if (isYarnCluster) {
    +      sparkConf.set("spark.driver.host", "${env:NM_HOST}")
    --- End diff --
    
    This actually requires `spark.sql.variable.substitute` to be set to `true`. It is by default, but if a user accidentally changes it, the code won't work.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171379388
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     
       private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
     
    +  if (isClusterMode) {
    +    // this logic replicates the way YARN NM determines the address to bind listeners to
    --- End diff --
    
    This explains what it does, but it should explain why it does it.


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171382335
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,7 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = conf.get(DRIVER_BIND_ADDRESS)
    --- End diff --
    
    This actually changes behavior, because now you'll always be listening on a specific interface instead of "0.0.0.0".
    
    Try it out: run spark-shell in local mode; it should try to figure out your non-loopback address for the driver, but you should still be able to connect to "127.0.0.1:4040" in your browser. With your changes that will probably stop working.
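    
    A minimal way to script that check (a sketch; it assumes a local-mode driver is already running with the UI on the default port 4040):
    
    ```scala
    import java.net.{HttpURLConnection, URL}
    
    // With the old 0.0.0.0 bind this succeeds from loopback; with a bind to a
    // specific non-loopback address the connection would be refused instead.
    val conn = new URL("http://127.0.0.1:4040/")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setConnectTimeout(2000)
    println(conn.getResponseCode) // 200 (or a redirect) when reachable
    ```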


---



[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171747629
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     
       private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
     
    +  if (isClusterMode) {
    +    // this logic replicates the way YARN NM determines the address to bind listeners to
    +    // from yarnConf
    +    //
    +    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST,
    +      WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf))
    +    val (nmHost, _) = Utils.parseHostPort(nmHostPort)
    +
    +    sparkConf.set(DRIVER_HOST_ADDRESS, nmHost)
    +    // propagate to the user class
    +    System.setProperty(DRIVER_HOST_ADDRESS.key, nmHost)
    --- End diff --
    
    good catch. done


---



[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #87866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87866/testReport)** for PR 20327 at commit [`a674863`](https://github.com/apache/spark/commit/a674863b8243fddc065270d292a6be18e38fbc30).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175544922
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    Even if it's not explicitly documented, that's what the previous code did.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88478/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88318/testReport)** for PR 20327 at commit [`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176598095
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    I will see what to do with this PR, since #20883 is moving faster.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88478/testReport)** for PR 20327 at commit [`e10dea3`](https://github.com/apache/spark/commit/e10dea332023b5a435ee0ea277347fcfed38adfd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175938595
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    Or, again, you can do nothing, leaving the existing code here, and nothing will be broken.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176516514
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    `NM_HOST`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176183580
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    I am sorry, it does not make sense to me to treat RPC and HTTP endpoints inconsistently. Therefore I am removing the logic borrowed from YARN for RPC, which leaves a simpler PR. We can achieve what I need either with `appMasterEnv.SPARK_LOCAL_IP` or with cluster-side config, and at the same time we keep the prior behavior as you requested.
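    
    For illustration, a minimal sketch of the per-application route mentioned above; the address is a hypothetical NM host used only as an example:
    
    ```scala
    import org.apache.spark.SparkConf
    
    // Export SPARK_LOCAL_IP into the AM container so the driver binds to it,
    // without any core change. "10.0.0.42" stands in for the actual NM address.
    val conf = new SparkConf()
      .set("spark.yarn.appMasterEnv.SPARK_LOCAL_IP", "10.0.0.42")
    ```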


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    I think you forgot to actually hit the "close" button...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r165496876
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging {
         }
     
         // Ignore invalid spark.driver.host in cluster modes.
    -    if (deployMode == CLUSTER) {
    +    if (isYarnCluster) {
    +      sparkConf.set("spark.driver.host", "${env:NM_HOST}")
    --- End diff --
    
    re: sql, this is not a sql conf object, so sql options have no effect.
    
    I think it would be better to have this in YARN's `Client.scala` instead, in the spirit of not adding more backend-specific logic to this class.
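    
    A rough sketch of that suggestion (assumed placement and guard, not the actual patch), relying on the same `${env:NM_HOST}` substitution shown in the diff, which Spark's config reader expands when the value is read:
    
    ```scala
    // In the YARN client (sketch): default the driver host for cluster mode here
    // rather than special-casing YARN in SparkSubmit. NM_HOST is set by YARN in
    // every container's environment.
    if (isClusterMode && !sparkConf.contains("spark.driver.host")) {
      sparkConf.set("spark.driver.host", "${env:NM_HOST}")
    }
    ```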


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176823881
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    Yes, but your change isn't touching that argument. I was wondering what effect the code you're actually changing has.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88318/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r173030832
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    Is this change actually required by the rest of your changes? What is the advantage of binding the UI to a specific host instead of "0.0.0.0"?
    
    Leaving it as is would also avoid adding `isClusterMode`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r163491802
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -385,7 +386,13 @@ class SparkContext(config: SparkConf) extends Logging {
     
         // Set Spark driver host and port system properties. This explicitly sets the configuration
         // instead of relying on the default value of the config constant.
    -    _conf.set(DRIVER_HOST_ADDRESS, _conf.get(DRIVER_HOST_ADDRESS))
    +    _conf.set(DRIVER_HOST_ADDRESS,
    +      if (master == "yarn") {
    +        System.getenv(ApplicationConstants.Environment.NM_HOST.toString)
    --- End diff --
    
    The changes here will also affect yarn client mode.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Thanks for catching the client mode issue, @jerryshao. Please check the updated PR.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88514/testReport)** for PR 20327 at commit [`73f9df2`](https://github.com/apache/spark/commit/73f9df224246e04163847419ef082278014c6a2a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175672010
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    How about the compromise I was suggesting: making the web UI bind change only for YARN? YARN's default bind address at the NM is still `0.0.0.0`, but if we change it at the YARN level we get it for both the Spark RPC and the web UI.
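    
    For reference, a sketch of reading the YARN-level knob in question; this mirrors what the diff above derives via `WebAppUtils`:
    
    ```scala
    import org.apache.hadoop.yarn.conf.YarnConfiguration
    
    // yarn.nodemanager.bind-host, when set on an NM, is the cluster-side value
    // both the RPC and web UI listeners could inherit; fall back to bind-all
    // (0.0.0.0) when unset, per the default behavior noted above.
    val yarnConf = new YarnConfiguration()
    val bindHost = yarnConf.get(YarnConfiguration.NM_BIND_HOST, "0.0.0.0")
    ```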


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176816848
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    The URL should have a reachable authority for proxying and direct use.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171749008
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,7 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = conf.get(DRIVER_BIND_ADDRESS)
    --- End diff --
    
    This is a great point, which helped identify another bug where this `host` is entirely ignored. 
    
    Reworked it to behave as intended on the client node.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r176825696
  
    --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -136,6 +135,39 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
         checkResult(finalState, result)
       }
     
    +  private def testClusterDriverBind(
    +      uiEnabled: Boolean,
    +      localHost: String,
    +      localIp: String,
    +      success: Boolean): Unit = {
    +    val result = File.createTempFile("result", null, tempDir)
    +    val finalState = runSpark(false, mainClassName(YarnClusterDriver.getClass),
    +      appArgs = Seq(result.getAbsolutePath()),
    +      extraConf = Map(
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_HOSTNAME" -> localHost,
    +        "spark.yarn.appMasterEnv.SPARK_LOCAL_IP" -> localIp,
    +        "spark.ui.enabled" -> uiEnabled.toString
    +      ))
    +    if (success) {
    +      checkResult(finalState, result, "success")
    +    } else {
    +      finalState should be (SparkAppHandle.State.FAILED)
    +    }
    +  }
    +
    +  test("yarn-cluster driver should be able to bind listeners to MM_HOST") {
    --- End diff --
    
    Maybe we are talking about different parts of the patch. This thread is attached to my test code, where I am demoing how this happens. It doesn't explicitly validate the generated URLs, only how the bind works or fails.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #88283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88283/testReport)** for PR 20327 at commit [`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175555284
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    I thought the point of this PR is for the driver RPC endpoint to bind the correct address (since it doesn't support binding to 0.0.0.0)?
    
    I don't see how the http server changes that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87866/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    **[Test build #87789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87789/testReport)** for PR 20327 at commit [`0c4ed66`](https://github.com/apache/spark/commit/0c4ed6612f1d75d73aeb862f4f373a62d343db3c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r171379498
  
    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -79,6 +80,19 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends
     
       private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
     
    +  if (isClusterMode) {
    +    // this logic replicates the way YARN NM determines the address to bind listeners to
    +    // from yarnConf
    +    //
    +    val nmHostPort = WebAppUtils.getWebAppBindURL(yarnConf, YarnConfiguration.NM_BIND_HOST,
    +      WebAppUtils.getNMWebAppURLWithoutScheme(yarnConf))
    +    val (nmHost, _) = Utils.parseHostPort(nmHostPort)
    +
    +    sparkConf.set(DRIVER_HOST_ADDRESS, nmHost)
    +    // propagate to the user class
    +    System.setProperty(DRIVER_HOST_ADDRESS.key, nmHost)
    --- End diff --
    
    If you move the block in L77 after the code you're adding, you don't need this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    Closing this PR since the bind bug is fixed; the rest is achievable via configuration.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r165206579
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -744,7 +744,9 @@ object SparkSubmit extends CommandLineUtils with Logging {
         }
     
         // Ignore invalid spark.driver.host in cluster modes.
    -    if (deployMode == CLUSTER) {
    +    if (isYarnCluster) {
    +      sparkConf.set("spark.driver.host", "${env:NM_HOST}")
    --- End diff --
    
    Good to know. It seems that variable substitution is unconditional in core for this `ConfigEntryWithDefaultString`. I liked the brevity of this approach. Pending @vanzin's feedback, I can redo it in explicit code again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on the issue:

    https://github.com/apache/spark/pull/20327
  
    It turns out `0.x.x.x` addresses are bindable on Linux, unlike on my Mac. So I am changing it back to the original `1.1.1.1`, which cannot be bound to on either Mac OS or Linux.
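    
    A small JDK-level sketch of the portability check in question (illustration only):
    
    ```scala
    import java.net.{InetSocketAddress, ServerSocket}
    
    // True if a listener can bind to `host` on this machine. Per the note above,
    // 0.x.x.x may bind on Linux but not on Mac OS, while 1.1.1.1 fails on both
    // unless a local interface actually carries that address.
    def bindable(host: String): Boolean = {
      val s = new ServerSocket()
      try {
        s.bind(new InetSocketAddress(host, 0))
        true
      } catch {
        case _: java.io.IOException => false
      } finally {
        s.close()
      }
    }
    ```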


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175836822
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    I'd rather not change it unless it's fixing a problem. I don't see a problem being fixed here. Also, we should avoid adding more and more cluster-specific logic.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points

Posted by gerashegalov <gi...@git.apache.org>.
Github user gerashegalov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20327#discussion_r175934086
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
    @@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
       def bind(): Unit = {
         assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
         try {
    -      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
    +      val host = if (Utils.isClusterMode(conf)) {
    --- End diff --
    
    at best, the way you would do this on each worker is something like:
    ```
    <property>
      <name>yarn.nodemanager.admin-env</name>
      <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,SPARK_LOCAL_IP=$NM_HOST</value>
    </property>
    ```
    which builds on the fact that `NM_HOST` is defined earlier in the launch script, or use some other value, for example:
    ```
    <property>
      <name>yarn.nodemanager.admin-env</name>
      <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,SPARK_LOCAL_IP=${yarn.nodemanager.bind-host}</value>
    </property>
    ```
    As previously mentioned, if I have to break abstractions and intermix YARN worker settings with the Spark environment (which will also be unnecessarily passed to other apps), the only thing we need from this patch is the fix for setting the hostname in JettyUtils.scala.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org