You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cmccabe <gi...@git.apache.org> on 2014/05/22 00:49:09 UTC

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

GitHub user cmccabe opened a pull request:

    https://github.com/apache/spark/pull/850

    SPARK-1898: In deploy.yarn.Client, use YarnClient not YarnClientImpl

    https://issues.apache.org/jira/browse/SPARK-1898

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cmccabe/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #850
    
----
commit 33e87eb4efc23156757e06fba4b79e4c0aed6903
Author: Colin Patrick Mccabe <cm...@cloudera.com>
Date:   2014-05-21T22:48:04Z

    SPARK-1898: In deploy.yarn.Client, use YarnClient rather than YarnClientImpl

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-45445910
  
    I've merged this - thanks @cmccabe definitely a good improvement!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/850#discussion_r12930693
  
    --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
       def submitApp(appContext: ApplicationSubmissionContext) = {
         // Submit the application to the applications manager.
         logInfo("Submitting application to ASM")
    -    super.submitApplication(appContext)
    +    yarnClient.submitApplication(appContext)
       }
     
    +  def getApplicationReport = yarnClient.getApplicationReport _
    --- End diff --
    
    I guess the same thing applies to stop() as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/850#discussion_r12930178
  
    --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
       def submitApp(appContext: ApplicationSubmissionContext) = {
         // Submit the application to the applications manager.
         logInfo("Submitting application to ASM")
    -    super.submitApplication(appContext)
    +    yarnClient.submitApplication(appContext)
       }
     
    +  def getApplicationReport = yarnClient.getApplicationReport _
    +
    +  def stop = yarnClient.stop _
    --- End diff --
    
    Should `yarnClient.stop` be called directly just like other yarnClient methods?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/850


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44371936
  
    @tgravescs does this look okay to you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by cmccabe <gi...@git.apache.org>.
Github user cmccabe commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44374458
  
    I should clarify that when I say "fixes the trunk build" I mean it fixes the trunk build when SPARK_YARN=1.  When not building Spark-on-YARN at all, you only need SPARK-1898


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-43828253
  
    +1 less private api usage yay!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by cmccabe <gi...@git.apache.org>.
Github user cmccabe commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44374139
  
    So as I commented on the JIRA, using the public API here fixes Spark-on-YARN for CDH5.1.0.  That, plus the fact that this fixes the trunk (hadoop 3.0.0) build, is probably a good enough reason to do this, even if we didn't care about the deprecation issues.  If we don't have this in Spark 1.0, it's just not going to work against CDH 5.1.0.
    
    Also, all the YARN people I talked to were upset that we were using YarnClientImpl-- that class is not supposed to be used by outsiders at all...  it would be private, but we have the old "we want to put things from a single project in multiple java packages" issue.
    
    I still have not managed to figured out why this change is required to get cdh5.1.0 going.  I was pursuing a theory today that turned out to be wrong.  I also verified that both Hadoop 2.4 and a pre-release version of Hadoop 2.5 that I built work without this fix.
    
    Tomorrow I'm going to do a closer comparison of the working versus non-working versions.  CDH5.1.0 should be pretty close to the pre-release 2.5, so that might be a good place for me to look for differences.  Maybe turning up log4j on YARN will reveal something.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by cmccabe <gi...@git.apache.org>.
Github user cmccabe commented on a diff in the pull request:

    https://github.com/apache/spark/pull/850#discussion_r12930683
  
    --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
       def submitApp(appContext: ApplicationSubmissionContext) = {
         // Submit the application to the applications manager.
         logInfo("Submitting application to ASM")
    -    super.submitApplication(appContext)
    +    yarnClient.submitApplication(appContext)
       }
     
    +  def getApplicationReport = yarnClient.getApplicationReport _
    +
    +  def stop = yarnClient.stop _
    --- End diff --
    
    OK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44415223
  
    +1 looks good. 
    
    I will note this again, the interfaces weren't cleaned up and tagged until Hadoop 2.2.  Hadoop 0.23 has been around for much longer then that and this is what the original port was done against.  The YarnClientImpl api was marked public in 0.23.  Unfortunately when the port of spark to hadoop 2.2 was done it wasn't caught.   



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-43831445
  
    +1.  There's a separate version of Client under yarn/alpha that handles 0.23, so this change shouldn't cause any problems on that front.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/850#discussion_r12930168
  
    --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
       def submitApp(appContext: ApplicationSubmissionContext) = {
         // Submit the application to the applications manager.
         logInfo("Submitting application to ASM")
    -    super.submitApplication(appContext)
    +    yarnClient.submitApplication(appContext)
       }
     
    +  def getApplicationReport = yarnClient.getApplicationReport _
    --- End diff --
    
    Is this necessary? You seem to be called yarnClient.getApplicationReport directly. 
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-43824695
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by cmccabe <gi...@git.apache.org>.
Github user cmccabe commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44456990
  
    I made a few small style changes suggested by Aaron Davidson.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by cmccabe <gi...@git.apache.org>.
Github user cmccabe commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44564644
  
    I tested this by running Spark on YARN against Hadoop 2.4 on a 4 node cluster.  I used the Pi example job.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-43831359
  
    Does this API also work on Hadoop 0.23? We have some constraints in that we want to support that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by cmccabe <gi...@git.apache.org>.
Github user cmccabe commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44458395
  
    So I was unable to reproduce the failure without this patch on CDH5.1.0.  It looks like it might have been a bad build (I didn't clean enough).  
    
    On a more positive note, I confirmed that we cannot build against hadoop 3.0.0 without this patch.  We get errors like
    
        [error] /home/cmccabe/spark/b3.0/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:29: object YarnClientImpl is not a member of package org.apache.hadoop.yarn.client
        [error] import org.apache.hadoop.yarn.client.YarnClientImpl
        [error]        ^
        [error]


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by cmccabe <gi...@git.apache.org>.
Github user cmccabe commented on a diff in the pull request:

    https://github.com/apache/spark/pull/850#discussion_r12930649
  
    --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -132,15 +134,19 @@ class Client(clientArgs: ClientArguments, hadoopConf: Configuration, spConf: Spa
       def submitApp(appContext: ApplicationSubmissionContext) = {
         // Submit the application to the applications manager.
         logInfo("Submitting application to ASM")
    -    super.submitApplication(appContext)
    +    yarnClient.submitApplication(appContext)
       }
     
    +  def getApplicationReport = yarnClient.getApplicationReport _
    --- End diff --
    
    This is here so YarnClientSchedulerBackend can call Client#getApplicationReport.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1898: In deploy.yarn.Client, use YarnCli...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/850#issuecomment-44493998
  
    Okay seems like this is in good shape. @cmccabe have you tested this at all?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---