Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2015/04/11 01:26:15 UTC

[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/5461

    [SPARK-5808] [build] Package pyspark files in sbt assembly.

    This turned out to be more complicated than I wanted because the
    layout of python/ doesn't really follow the usual maven conventions.
    So some extra code is needed to copy just the right things.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-5808

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5461.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5461
    
----
commit ee90e843dc7d57e64b97b96dcc42f73e637f9da5
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-04-10T23:24:05Z

    [SPARK-5808] [build] Package pyspark files in sbt assembly.
    
    This turned out to be more complicated than I wanted because the
    layout of python/ doesn't really follow the usual maven conventions.
    So some extra code is needed to copy just the right things.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91715715
  
      [Test build #30057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30057/consoleFull) for   PR 5461 at commit [`ee90e84`](https://github.com/apache/spark/commit/ee90e843dc7d57e64b97b96dcc42f73e637f9da5).


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91723254
  
      [Test build #30059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30059/consoleFull) for   PR 5461 at commit [`7153dac`](https://github.com/apache/spark/commit/7153dac23426a6ea2ec81e7d76b3b3272f19a54d).


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91736083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30059/


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by lianhuiwang <gi...@git.apache.org>.
Github user lianhuiwang commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91741359
  
    LGTM @andrewor14


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5461


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91736077
  
      [Test build #30059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30059/consoleFull) for   PR 5461 at commit [`7153dac`](https://github.com/apache/spark/commit/7153dac23426a6ea2ec81e7d76b3b3272f19a54d).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5461#discussion_r28368051
  
    --- Diff: project/SparkBuild.scala ---
    @@ -345,6 +349,60 @@ object Assembly {
       )
     }
     
    +object PySparkAssembly {
    +  import sbtassembly.Plugin._
    +  import AssemblyKeys._
    +
    +  lazy val settings = Seq(
    +    unmanagedJars in Compile += { BuildCommons.sparkHome / "python/lib/py4j-0.8.2.1-src.zip" },
    --- End diff --
    
    Would be best not to hardcode this version here, but since we already do that everywhere (e.g. `bin/pyspark`, `PythonRunner`), this change by itself doesn't add much more technical debt.
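A minimal sketch of one way the hardcoded version could be avoided, assuming `python/lib` contains exactly one file matching `py4j-*-src.zip`. `Py4jLocator` and `findPy4jZip` are hypothetical names used only for illustration; they are not part of the Spark build. The returned `File` could, in principle, stand in for the literal path in the `unmanagedJars` setting.

```scala
import java.io.File

// Hypothetical helper (not in SparkBuild.scala): locate the py4j source zip
// by pattern instead of hardcoding "py4j-0.8.2.1-src.zip".
object Py4jLocator {
  def findPy4jZip(sparkHome: File): File = {
    val libDir = new File(sparkHome, "python/lib")
    // listFiles() returns null when libDir is missing, so guard with Option.
    val files = Option(libDir.listFiles()).getOrElse(Array.empty[File])
    val candidates = files.filter { f =>
      f.isFile && f.getName.startsWith("py4j-") && f.getName.endsWith("-src.zip")
    }
    // Fail loudly if the assumption "exactly one py4j zip" does not hold.
    require(candidates.length == 1,
      s"Expected exactly one py4j-*-src.zip in $libDir, found ${candidates.length}")
    candidates.head
  }
}
```

Failing the build when zero or several zips are present is deliberate: silently picking one would hide a stale or missing py4j upgrade.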


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-93053023
  
    Merging this into master. It was somewhat arbitrary that PySpark on YARN could only be run if you built the jar specifically with Maven and with Java 6. Good to have one of those requirements lifted. Thanks @vanzin.


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91717573
  
    Also, I tested this on a YARN cluster without setting SPARK_HOME everywhere and it works. For the curious:
    
        $ jar tf assembly/target/scala-2.10/spark-assembly-1.4.0-SNAPSHOT-hadoop2.5.0.jar | grep \\.py     
        META-INF/maven/net.sf.py4j/
        META-INF/maven/net.sf.py4j/py4j/
        META-INF/maven/net.sf.py4j/py4j/pom.properties
        META-INF/maven/net.sf.py4j/py4j/pom.xml
        py4j/__init__.py
        py4j/compat.py
        py4j/finalizer.py
        py4j/java_collections.py
        py4j/java_gateway.py
        [snip]
        pyspark/tests.py
        pyspark/traceback_utils.py
        pyspark/worker.py
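The same check can be sketched in plain Scala with `java.util.jar.JarFile`, for environments without the `jar` tool. `ListPythonEntries` is a hypothetical name. Unlike the `grep` above, it only matches entry names ending in `.py`, so the `META-INF/maven/net.sf.py4j/` directory entries would not be listed.

```scala
import java.util.jar.JarFile
import scala.collection.mutable.ListBuffer

// Hypothetical utility: list the .py entries packaged in a jar.
object ListPythonEntries {
  def pythonEntries(jarPath: String): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      val names = ListBuffer[String]()
      val entries = jar.entries()
      while (entries.hasMoreElements) {
        val name = entries.nextElement().getName
        // Match only real Python sources, not paths that merely contain ".py".
        if (name.endsWith(".py")) names += name
      }
      names.toList
    } finally {
      jar.close()
    }
  }

  def main(args: Array[String]): Unit =
    pythonEntries(args(0)).foreach(println)
}
```

Run it with the assembly jar path as its only argument.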


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91717166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30057/


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5461#issuecomment-91717163
  
      [Test build #30057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30057/consoleFull) for   PR 5461 at commit [`ee90e84`](https://github.com/apache/spark/commit/ee90e843dc7d57e64b97b96dcc42f73e637f9da5).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-5808] [build] Package pyspark files in ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5461#discussion_r28316737
  
    --- Diff: project/SparkBuild.scala ---
    @@ -345,6 +349,60 @@ object Assembly {
       )
     }
     
    +object PySparkAssembly {
    +  import sbtassembly.Plugin._
    +  import AssemblyKeys._
    +
    +  lazy val settings = Seq(
    +    unmanagedJars in Compile += { BuildCommons.sparkHome / "python/lib/py4j-0.8.2.1-src.zip" },
    +    // Use a resource generator to copy all .py files from python/pyspark into a managed directory
    +    // to be included in the assembly. We can't just add "python/" to the assembly's resource dir
    +    // list since that will copy unneeded / unwanted files.
    +    resourceGenerators in Compile <+= resourceManaged in Compile map { outDir: File =>
    +      val dst = new File(outDir, "pyspark")
    +      if (!dst.isDirectory()) {
    +        require(dst.mkdirs())
    +      }
    +
    +      val src = new File(BuildCommons.sparkHome, "python/pyspark")
    +      copy(src, dst)
    +    }
    +  )
    +
    +  private def copy(src: File, dst: File): Seq[File] = {
    --- End diff --
    
    Dang, I guess we can't use Guava or Java 7 nio here to just call an existing "copy" method, and shelling out is probably error-prone. I suppose we can't find any support in the assembly plugin either. Well, the result empirically looks like a correct fix, even if I'm not an SBT expert. LGTM.
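With Guava and Java 7 nio off the table, a recursive copy has to be written against plain `java.io`. The body of `copy` is cut off in the diff above, so the following is only a sketch of what such a helper might look like. It assumes the goal is to mirror the directory tree under `python/pyspark` while copying only `.py` files (skipping e.g. `.pyc` files). `PyCopy` and `copyFile` are hypothetical names.

```scala
import java.io.{File, FileInputStream, FileOutputStream}

// Hypothetical sketch: recursively copy only .py files from src into dst,
// using java.io only (no Guava, no java.nio), and return the copied files.
object PyCopy {
  def copy(src: File, dst: File): Seq[File] = {
    val children = Option(src.listFiles()).getOrElse(Array.empty[File]).toSeq
    children.flatMap { f =>
      val target = new File(dst, f.getName)
      if (f.isDirectory) {
        copy(f, target) // recurse into subpackages
      } else if (f.getName.endsWith(".py")) {
        copyFile(f, target)
        Seq(target)
      } else {
        Seq.empty[File] // skip .pyc and any other non-source files
      }
    }
  }

  private def copyFile(from: File, to: File): Unit = {
    val parent = to.getParentFile
    // Directories are created lazily, only when they receive a .py file.
    if (!parent.isDirectory) require(parent.mkdirs(), s"Cannot create $parent")
    val in = new FileInputStream(from)
    try {
      val out = new FileOutputStream(to)
      try {
        val buf = new Array[Byte](8192)
        var n = in.read(buf)
        while (n != -1) {
          out.write(buf, 0, n)
          n = in.read(buf)
        }
      } finally out.close()
    } finally in.close()
  }
}
```

Returning the copied files as a `Seq[File]` fits what an sbt resource generator is expected to produce, which matches the `Seq[File]` return type of `copy` in the diff.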
