Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2014/08/06 23:47:12 UTC

[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/1813

    [SPARK-2848] Shade Guava in uber-jars.

    For further discussion, please check the JIRA entry.
    
    This change moves Guava classes to a different package so that they don't conflict with the user-provided Guava (or the Hadoop-provided one). Since one class (Optional) is exposed through Spark's public API, that class was forked from Guava at the current dependency version (14.0.1) so that it can be kept going forward (until the API is cleaned up).
    
    Since sbt doesn't provide the relocation feature that maven-shade-plugin does, I implemented the functionality (based on the source of maven's plugin).
    
    Note this change has a few implications:
    - *all* classes in the final jars will reference the relocated classes. If Hadoop classes are included (i.e. "-Phadoop-provided" is not activated), those will also reference the Guava 14 classes (instead of the Guava 11 classes from the Hadoop classpath).
    - if the Guava version in Spark is ever changed, the new Guava will still reference the forked Optional class; this may or may not be a problem, but in the long term it's better to think about removing Optional from the public API.
    
    For the end user, there are two visible implications (sketched below):
    
    - Guava is not provided as a transitive dependency anymore (since it's "provided" in Spark)
    - At runtime, unless they provide their own, they'll either have no Guava or Hadoop's version of Guava (11), depending on how they set up their classpath.
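
    A minimal sketch of what this means for a downstream build, assuming a hypothetical sbt-based application (the coordinates and versions below are illustrative, not taken from this patch):

        // build.sbt fragment for a hypothetical application that uses Spark and also calls Guava directly.
        // After this change Guava no longer arrives transitively from spark-core, so the application must
        // declare it explicitly if it uses Guava APIs other than the Optional class kept by Spark.
        libraryDependencies ++= Seq(
          // Spark itself is normally supplied by the cluster at runtime.
          "org.apache.spark" %% "spark-core" % "1.1.0-SNAPSHOT" % "provided",
          // Placeholder version; matches the Guava version Spark currently builds against.
          "com.google.guava" % "guava" % "14.0.1"
        )

    If the explicit guava dependency is omitted, the application sees whatever Guava (if any) its runtime classpath happens to carry, as described in the bullets above.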

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-2848

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1813
    
----
commit 616998ee8431e5a9becaeaec7df2e44520abc624
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2014-08-05T20:34:22Z

    Shade Guava in the maven build, fork Guava's Optional.java.

commit 2fec9903c818d14ac79d3bcc2bf9d2ef7c1ca4fc
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2014-08-05T20:34:43Z

    Shade Guava in the sbt build.

commit 637189bd81ef2caf865292d2e7db7e7978d463c9
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2014-08-06T19:25:23Z

    Add hacky filter to prefer Spark's copy of Optional.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16204071
  
    --- Diff: project/SparkBuild.scala ---
    @@ -276,6 +289,35 @@ object Assembly {
     
     }
     
    +object CoreAssembly {
    +  import sbtassembly.Plugin._
    +  import AssemblyKeys._
    +
    +  lazy val settings = assemblySettings ++ Seq(
    --- End diff --
    
    Please add a comment here explaining why we're doing this




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52566122
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18792/consoleFull) for   PR 1813 at commit [`9bdffb0`](https://github.com/apache/spark/commit/9bdffb01d13647fbd8e07eed800a210b9ce28a85).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin closed the pull request at:

    https://github.com/apache/spark/pull/1813




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15908419
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    Well, if someone "accidentally" pulls Spark as a compile-time dependency, they'd run into what I described. This avoids that problem.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by liuzhongshu <gi...@git.apache.org>.
Github user liuzhongshu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r58981359
  
    --- Diff: assembly/pom.xml ---
    @@ -43,6 +43,12 @@
       </properties>
     
       <dependencies>
    +    <!-- Promote Guava to compile scope in this module so it's included while shading. -->
    +    <dependency>
    +      <groupId>com.google.guava</groupId>
    +      <artifactId>guava</artifactId>
    +      <scope>compile</scope>
    --- End diff --
    
    why use compile here




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16067286
  
    --- Diff: core/src/main/java/com/google/common/base/Optional.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Copyright (C) 2011 The Guava Authors
    --- End diff --
    
    I see, I would still include it because the copyright is from another organization. I don't see why adding this would hurt.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16381095
  
    --- Diff: project/SparkBuild.scala ---
    @@ -259,12 +265,19 @@ object Assembly {
       import sbtassembly.Plugin._
       import AssemblyKeys._
     
    +  private val relocators = List(
    +    new Relocator("com.google", "org.spark-project.guava", Seq("com\\.google\\.common\\..*".r),
    +      Seq("com\\.google\\.common\\.base\\.Optional.*".r)))
    +  private val shade = new ShadeStrategy(relocators)
    +
       lazy val settings = assemblySettings ++ Seq(
         test in assembly := {},
         jarName in assembly <<= (version, moduleName) map { (v, mName) => mName + "-"+v + "-hadoop" +
           Option(System.getProperty("hadoop.version")).getOrElse("1.0.4") + ".jar" },
         mergeStrategy in assembly := {
           case PathList("org", "datanucleus", xs @ _*)             => MergeStrategy.discard
    +      case PathList("org", "objectweb", "asm", xs @ _*)        => MergeStrategy.discard
    --- End diff --
    
    Ah, no, this is leftover from a previous change in this patch. I'll remove.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16034200
  
    --- Diff: core/src/main/java/com/google/common/base/Optional.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Copyright (C) 2011 The Guava Authors
    --- End diff --
    
    Not according to my reading of
    http://www.apache.org/dev/licensing-howto.html , in the part covering
    bundled AL2 dependencies, but this stuff is always less than intuitive to
    me.
    
    I suppose the reasoning is that Spark's AL2 license already exactly
    describes the terms of the licensing for that file. That's not quite true
    for BSD or MIT licenses.
    
    The copyright owner doesn't matter; there are hundreds of them. It matters
    just how whoever they are license their IP to be used.
    On Aug 11, 2014 12:22 AM, "Matei Zaharia" <no...@github.com> wrote:
    
    > In core/src/main/java/com/google/common/base/Optional.java:
    >
    > > @@ -0,0 +1,243 @@
    > > +/*
    > > + * Copyright (C) 2011 The Guava Authors
    >
    > Shouldn't we still list it in our LICENSE? We've listed all source files
    > that are not copyright Apache, which would include this one.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1813/files#r16034129>.
    >




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mfawzymkh <gi...@git.apache.org>.
Github user mfawzymkh commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-70446580
  
    Do we have an ETA for getting this pull request merged to master? The guava shading issue is causing a problem for client libs that have a dependency on swift-service when Spark is compiled with hadoop-2.4.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51407096
  
    QA tests have started for PR 1813. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18069/consoleFull




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52549981
  
    Hey Marcelo,
    
    I took a look through this - I also discussed in some detail with @srowen this past weekend. If we want to fix this problem using shading in our build, this seems like as good a way as any to do it. It does substantially increase the complexity of the build which is not great, so maybe longer term we could consider moving towards just publishing a shaded version of guava upstream.
    
    Regarding this specific implementation, I noticed there is a lot of effort to mirror this logic in the sbt build. I wonder if maybe we should just leave this out of the sbt build entirely to reduce the complexity, and in the sbt build simply have guava be a compile-time dependency like it was before. We've said in the past that our official reference build for packagers is maven, so I think it's fine if some of the more esoteric build configurations do not exist in sbt.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15908355
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    Yeah I understand what the scopes mean. But Spark is already "provided" to any project compiling against Spark; you're not supposed to pull in Spark at compile scope. This makes all its transitive dependencies provided in this scenario already.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52859187
  
    Okay - let's give this a try. Merging it into master.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51564110
  
    Hey so after thinking about it a bit more, maybe it's worth experimenting with this simpler approach of in-lining the shaded dependency in our jars using build plug-ins. The main reason I think it's worth trying is that, in the worst case, if this has some random unintended consequence we can just revert to our existing method of publishing fully shaded dependencies to maven central.
    
    One thought for @vanzin and @srowen - what if, instead of having our own copy of Optional, we just continue to advertise a full compile dependency on guava and get Optional from that, as long as Optional does not change? Then basically we have a "public" use of Guava, which is this Optional class, and a "private" use of Guava which we shade away. I think that will work as long as Optional does not change in guava... but that is the exact scenario in which the current approach works as well.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16381169
  
    --- Diff: project/SparkBuild.scala ---
    @@ -276,6 +289,41 @@ object Assembly {
     
     }
     
    +/**
    + * Settings for the spark-core artifact. We don't want to expose Guava as a compile-time dependency,
    + * but at the same time the Java API exposes a Guava type (Optional). So we package it with the
    + * spark-core jar using the assembly plugin, and use the assembly deliverable as the main artifact
    + * for that project, disabling the non-assembly jar.
    + */
    +object CoreAssembly {
    --- End diff --
    
    It's not that it does not create the spark-core jar anymore; it replaces it with the jar created by the assembly plugin.
    
    I'm pretty sure I tested publishLocal and the jar ended up in the correct place, but I'll run a build again just to be sure.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51423938
  
    hey @vanzin - so yeah, in the past we've done this by creating separate dependencies and publishing them to maven central. This is a bit more work on our side, but it plays nicer for downstream users because the correct functioning of Spark does not depend on our distribution artifacts. For instance, users who write applications that embed Spark don't have to try to recreate or mimic our distribution artifacts. For this reason I have a slight bias for the existing approach, just because we've used it for a long time without any issues.
    
    The way we do this type of shading has been ad-hoc, we've typically cloned the relevant projects, re-written class names and then published them to maven central under the org.spark-project namespace.
    
    On the dev list I recently explained to @avati that it would be nice to include scripts in the Spark repo that do this. He's actually started working on this, and you can see an example for protobuf - I think the stuff he's done so far looks great:
    
    https://github.com/avati/spark-shaded/blob/master/0001-protobuf.sh
    
    I downloaded guava the other day and played around, I think it will be pretty straightforward to do this for guava as well.
    
    I actually know several other projects that have started to use our shaded artifacts (we tend to shade dependencies that are a common pain point) - so I think there is some value in continuing to do this.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16204228
  
    --- Diff: examples/pom.xml ---
    @@ -210,6 +216,12 @@
               </artifactSet>
               <filters>
                 <filter>
    +              <artifact>com.google.guava:guava</artifact>
    --- End diff --
    
    (I guess it's above too, but wouldn't hurt to place it exactly where we do the exclude).




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16067603
  
    --- Diff: core/src/main/java/com/google/common/base/Optional.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Copyright (C) 2011 The Guava Authors
    --- End diff --
    
    @srowen yes that's the alternative.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by qiaohaijun <gi...@git.apache.org>.
Github user qiaohaijun commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-61368980
  
    +1




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51602338
  
    @pwendell This is complicated, so let me summarize what I think I understand and then give an opinion.
    
    The plan at hand concerned changes to the assembly JARs, pretty much exclusively. Spark continues to depend on Guava in order to compile, and so that consumers of Spark also see this dependency in their builds.
    
    (You can see the discussion with @vanzin about whether to alter how downstream consumers of the Maven build see the Guava dependency -- `provided` vs `compile` vs `optional` -- but this doesn't affect the assembly, and only affects whether Guava automatically becomes a part of the downstream consumer's classpath. I still have a minor issue with `provided`, but unless @pwendell has any opinion on this, consider me outvoted.)
    
    So I think the proposal was already to continue showing a dependency on Guava, but to alter the assembly to include only `Optional` plus shaded copies of everything else. And maybe to tweak the dependency to hide it by default for downstream consumers.
    
    Inlining the shaded JARs is essentially the same as removing the use of Guava, as far as the rest of the world is concerned. That certainly takes it out of the picture. The little snag is Optional. It will now inevitably collide with any other app that builds Guava in too, and downstream consumers can't manage the conflict away in the usual way. It may either be a harmless warning or cause a real problem if Optional changes later. In general I don't like pushing the shading into the compile-time world. Remember the `hive-exec` awfulness? This would not be my first choice, versus:
    
    I think your second paragraph is basically the same thing proposed already, with one difference. Yes, the build still continues to say it depends on Guava. The concrete assembly is put together differently. I think you *can* use Guava's Optional rather than embedding a source copy of it and compiling that. Right @vanzin? with some shading cleverness?





[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16493011
  
    --- Diff: pom.xml ---
    @@ -1017,6 +1018,21 @@
     
       <profiles>
     
    +    <!--
    +      This profile is enabled automatically by the sbt built. It changes the scope for the guava
    +      dependency, since we don't shade it in the artifacts generated by the sbt build.
    +    -->
    +    <profile>
    +      <id>sbt</id>
    +      <dependencies>
    +        <dependency>
    +          <groupId>com.google.guava</groupId>
    +          <artifactId>guava</artifactId>
    +          <scope>compile</scope>
    --- End diff --
    
    hey just wondering - since spark core lists this as a compile dependency, and everything else depends on spark core, what makes this necessary?




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-53352501
  
    @vanzin Would you mind having a look at [SPARK-3217](https://issues.apache.org/jira/browse/SPARK-3217)? I think this PR breaks the Spark Maven build. Jenkins didn't complain because it uses SBT and thus doesn't shade Guava.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16034129
  
    --- Diff: core/src/main/java/com/google/common/base/Optional.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Copyright (C) 2011 The Guava Authors
    --- End diff --
    
    Shouldn't we still list it in our LICENSE? We've listed all source files that are not copyright Apache, which would include this one.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15909040
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    Yes, it's what "provided" means, but why do you assume the app provides Guava itself, but none of the others? I see no reason to treat Guava specially and not Commons Math or log4j or whatever. What does it solve to treat this one exceptionally? It means extra declarations in Spark, and does not affect your ability to shade the dependency in the assembly.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mfawzymkh <gi...@git.apache.org>.
Github user mfawzymkh commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-70557289
  
    Sorry, I confused this with some other issue. Thanks
    
    Sent from my iPhone
    
    > On Jan 19, 2015, at 1:54 AM, Sean Owen <no...@github.com> wrote:
    > 
    > @mfawzymkh As you can see this was merged into Spark a long time ago. What are you waiting on?
    > 
    > —
    > Reply to this email directly or view it on GitHub.
    > 





[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16377247
  
    --- Diff: core/pom.xml ---
    @@ -68,9 +68,15 @@
           <groupId>org.eclipse.jetty</groupId>
           <artifactId>jetty-server</artifactId>
         </dependency>
    +    <!--
    +      Promote Guava to "compile" so that maven-shade-plugin picks it up (for packaging the Optional
    +      class exposed in the Java API). The plugin will then remove this dependency from the published
    --- End diff --
    
    I just tested this locally and it appears that it does get excluded by virtue of the artifact set.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51547991
  
    Does the shader plug-in work if we want to actually publish a jar to maven central? If that's the case, including a POM seems like a solid option. We'd want to at least have some simple script, though, that downloads guava and then inserts that pom - rather than maintain it in a separate repo.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16512573
  
    --- Diff: pom.xml ---
    @@ -1017,6 +1018,21 @@
     
       <profiles>
     
    +    <!--
    +      This profile is enabled automatically by the sbt built. It changes the scope for the guava
    +      dependency, since we don't shade it in the artifacts generated by the sbt build.
    +    -->
    +    <profile>
    +      <id>sbt</id>
    +      <dependencies>
    +        <dependency>
    +          <groupId>com.google.guava</groupId>
    +          <artifactId>guava</artifactId>
    +          <scope>compile</scope>
    --- End diff --
    
    I just meant that, since we are using the shade plug-in, won't it just remove guava entirely from our published poms anyway... then I realized that it most likely won't remove guava from the root pom, since we only shade inside of spark-core.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16493693
  
    --- Diff: pom.xml ---
    @@ -1017,6 +1018,21 @@
     
       <profiles>
     
    +    <!--
    +      This profile is enabled automatically by the sbt built. It changes the scope for the guava
    +      dependency, since we don't shade it in the artifacts generated by the sbt build.
    +    -->
    +    <profile>
    +      <id>sbt</id>
    +      <dependencies>
    +        <dependency>
    +          <groupId>com.google.guava</groupId>
    +          <artifactId>guava</artifactId>
    +          <scope>compile</scope>
    --- End diff --
    
    See previous discussion I had with Sean in this PR.
    
    There's really no good scope in maven for what we want to do. "compile" means that guava will show up in the user's compilation classpath, and since we're shading guava in the assembly, it may lead to runtime errors since guava won't be there. "provided" means that guava is not exposed to the user at all; so if the user wants to use guava, he'll need to add an explicit dependency and package it himself.
    
    I like "provided" better because it means things fail for the user at build time, not runtime, if the user is using guava but hasn't declared the dependency. And that's why I had all those hacks in the sbt build that you asked me to remove.
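
    As a hedged illustration of the trade-off described above (this code is not part of the patch; the class and values are made up):

        // Hypothetical downstream application that uses Guava directly without declaring it.
        import com.google.common.collect.ImmutableList

        object GuavaUser {
          def main(args: Array[String]): Unit = {
            // With guava at compile scope in Spark, this compiles because Guava leaks in transitively,
            // but it can throw NoClassDefFoundError when run against a shaded assembly, which only
            // contains the relocated org.spark-project.guava classes (plus Optional).
            // With guava marked "provided", the missing dependency is caught at build time instead,
            // unless the application declares its own Guava dependency.
            println(ImmutableList.of(1, 2, 3))
          }
        }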




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52553944
  
    Hi Patrick,
    
    Regarding having the compile-scoped dependency in sbt only, the only way I can think of doing it is to create a profile that is always enabled from the sbt build, since the dependencies always come from the poms. 
    
    I agree that the sbt build is a little hacky with this patch, but at least it generates the same artifacts as the maven one. So it would avoid "this works with maven but not with sbt" situations, even though they might be rare.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15978313
  
    --- Diff: project/Relocator.scala ---
    @@ -0,0 +1,156 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +import java.io._
    +
    +import scala.util.matching.Regex
    +
    +import org.objectweb.asm._
    +import org.objectweb.asm.commons._
    +import sbtassembly.Plugin._
    +
    +/**
    + * Relocates classes that match the configuration to a new location. Tries to match the options
    + * available in the maven-shade-plugin.
    + *
    + * @param prefix Prefix that classes to be relocated must match.
    + * @param shaded New prefix for classes that match.
    + * @param includes Regexes for classes to include inside the matching package (empty = all).
    + * @param excludes Regexes for classes to exclude from the matching package (empty = none).
    + */
    +class Relocator(prefix: String, shaded: String, includes: Seq[Regex], excludes: Seq[Regex]) {
    --- End diff --
    
    This might be useful as a standalone sbt plug-in, I'm guessing a lot of other projects would use this - did you think at all about doing that? (of course that's totally up to you, just had the thought).
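
    For context, here is a minimal sketch of the core mechanism such a relocator relies on, assuming an ASM version that provides org.objectweb.asm.commons.ClassRemapper (older releases expose the same functionality as RemappingClassAdapter). This is an illustration, not the code in project/Relocator.scala:

        import org.objectweb.asm.{ClassReader, ClassWriter}
        import org.objectweb.asm.commons.{ClassRemapper, Remapper}

        // Rewrites internal class names (slash-separated) that start with `from` so they start with `to`,
        // e.g. "com/google/common/collect/Lists" -> "org/spark-project/guava/common/collect/Lists".
        class PrefixRemapper(from: String, to: String) extends Remapper {
          override def map(internalName: String): String =
            if (internalName.startsWith(from)) to + internalName.substring(from.length) else internalName
        }

        // Returns a copy of the class bytes with every reference to the `from` prefix relocated.
        def relocate(classBytes: Array[Byte], from: String, to: String): Array[Byte] = {
          val reader = new ClassReader(classBytes)
          val writer = new ClassWriter(0)
          reader.accept(new ClassRemapper(writer, new PrefixRemapper(from, to)), 0)
          writer.toByteArray
        }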




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51633944
  
    Lots of comments, let me try to write a coherent response to all of them. :-)
    
    ### Shaded jars and maven central
    
    Shouldn't be a problem. You're publishing your project's artifact to maven central, which just happens to include the shaded classes. This is similar to how assembly jars are published (http://search.maven.org/#browse|-1363484021).
    
    ### The Optional class
    
    It should be possible to use Guava's original Optional, without having to fork it. The only reason I forked it is to allow changing the version of Guava without affecting the Spark API. But that can be done on-demand if needed. I'd still package it in the spark-core jar, otherwise the guava dependency would need to have "compile" scope (see below).
    
    ### Compile vs. Provided
    
    My main argument for using "provided" is to avoid leaking guava into the user's *compilation classpath*. Users depend on spark-core (for example), and if spark-core has a compile dependency on guava, guava will be available in the user's compilation classpath (regardless of whether spark-core is set up as compile or provided). If that dependency is provided (and thus not transitive), then it never shows up; if the user needs guava, he needs to explicitly depend on it.
    
    This does have effects on packaging, though: for people using an existing spark assembly to run their apps, nothing changes. But if the user is creating his own uber-jar with spark and everything else in it, and running it in some custom environment, he'll need to explicitly package guava (even if his app doesn't use it). My belief is that the former case is the overwhelming majority, and the latter case means the user has to care about too many things already (e.g. choosing compatible versions of everything between his app and spark and hadoop et al) for this little thing to be a real concern.
    
    ### Shading at compile time
    
    @srowen let me know if I misunderstood, but do you mean creating a spark-core jar with the shaded guava classes in it? I actually started with that alternative, but it seems messy. Either you have all downstream projects reference the shaded classes (something that would be needed if going with the separate shaded jar approach), or you get into a weird place, where spark-core is referencing the shaded classes but all downstream projects are not. The assembly would fix everything (by shading again), but someone not using the assembly could get really weird errors.
    
    Also, having duplicate class names in the classpath really confuses some IDEs that automatically add imports. More manual configuration for people to add...
    
    ### sbt shading plugin
    
    I think the code I wrote can be added to the sbt-assembly plugin (with some modifications). It probably is the best place for that, anyway, since I created it as a merge strategy for that plugin. But that's for later.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52571483
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18792/consoleFull) for   PR 1813 at commit [`9bdffb0`](https://github.com/apache/spark/commit/9bdffb01d13647fbd8e07eed800a210b9ce28a85).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15909585
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    Also note that if you leave it as a compile-time dependency, and people are using Guava in their app, and not packaging Guava, *their app will fail to run*. Because the uber jar in the cluster won't have those classes.
    
    So saying "provided" here moves that particular error to the build.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16493141
  
    --- Diff: pom.xml ---
    @@ -1017,6 +1018,21 @@
     
       <profiles>
     
    +    <!--
    +      This profile is enabled automatically by the sbt built. It changes the scope for the guava
    +      dependency, since we don't shade it in the artifacts generated by the sbt build.
    +    -->
    +    <profile>
    +      <id>sbt</id>
    +      <dependencies>
    +        <dependency>
    +          <groupId>com.google.guava</groupId>
    +          <artifactId>guava</artifactId>
    +          <scope>compile</scope>
    --- End diff --
    
    sorry - I see - the issue is that elsewhere this is not in compile scope. However, why not just have it be compile scope by default in the root pom? Is the issue that we need to make sure it does not exist at compile scope in our published root pom?




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51415792
  
    @pwendell I think you may have suggested in jira at some point to create a separate deliverable for the shaded guava jar, and have the spark build depend on that. Is that what you had in mind?
    
    If that's the preferred approach, it would require code changes (change imports) and I'd need pointers for how to create this separate deliverable (I don't think it would live inside the spark build? how is it done for hive?).




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15907528
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    If you leave it as compile, it will pollute the compilation classpath of people using Spark in their projects (Guava will be pulled as a compile-scope dependency too).
    
    http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope
    
    Adding it as compile-scope in the assembly project is then needed because maven-shade-plugin will only include those in the final jar (pretty sure there are settings to include other scopes, but I didn't want to play with those).




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52817833
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18965/consoleFull) for   PR 1813 at commit [`9bdffb0`](https://github.com/apache/spark/commit/9bdffb01d13647fbd8e07eed800a210b9ce28a85).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52566045
  
    Hi @pwendell, I reverted the sbt changes as we talked about offline; required a tiny change in the pom so that everything works as intended in both builds.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15907274
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    I'll probably learn something here, but why is it necessary to mark it provided and then compile scope again in the assembly? What does it matter if it's compile scope everywhere? I realize Hive on Spark may need it to be provided, but it can change the nature of this dependency downstream if needed.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16377010
  
    --- Diff: project/SparkBuild.scala ---
    @@ -276,6 +289,41 @@ object Assembly {
     
     }
     
    +/**
    + * Settings for the spark-core artifact. We don't want to expose Guava as a compile-time dependency,
    + * but at the same time the Java API exposes a Guava type (Optional). So we package it with the
    + * spark-core jar using the assembly plugin, and use the assembly deliverable as the main artifact
    + * for that project, disabling the non-assembly jar.
    + */
    +object CoreAssembly {
    --- End diff --
    
    Does this approach no longer create the `spark-core` jar? I don't think that is doable, because many people will do things like `sbt/sbt publish-local` and then link applications against their local install of spark. Could we just name the jar identically to the normal spark-core package jar?


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16377042
  
    --- Diff: project/SparkBuild.scala ---
    @@ -276,6 +289,41 @@ object Assembly {
     
     }
     
    +/**
    + * Settings for the spark-core artifact. We don't want to expose Guava as a compile-time dependency,
    + * but at the same time the Java API exposes a Guava type (Optional). So we package it with the
    + * spark-core jar using the assembly plugin, and use the assembly deliverable as the main artifact
    + * for that project, disabling the non-assembly jar.
    + */
    +object CoreAssembly {
    --- End diff --
    
    Also, we would need to make sure it's published when someone does `publish-local`.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16376702
  
    --- Diff: project/SparkBuild.scala ---
    @@ -259,12 +265,19 @@ object Assembly {
       import sbtassembly.Plugin._
       import AssemblyKeys._
     
    +  private val relocators = List(
    +    new Relocator("com.google", "org.spark-project.guava", Seq("com\\.google\\.common\\..*".r),
    +      Seq("com\\.google\\.common\\.base\\.Optional.*".r)))
    +  private val shade = new ShadeStrategy(relocators)
    +
       lazy val settings = assemblySettings ++ Seq(
         test in assembly := {},
         jarName in assembly <<= (version, moduleName) map { (v, mName) => mName + "-"+v + "-hadoop" +
           Option(System.getProperty("hadoop.version")).getOrElse("1.0.4") + ".jar" },
         mergeStrategy in assembly := {
           case PathList("org", "datanucleus", xs @ _*)             => MergeStrategy.discard
    +      case PathList("org", "objectweb", "asm", xs @ _*)        => MergeStrategy.discard
    --- End diff --
    
    Is this related to SPARK-2848?


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15907395
  
    --- Diff: project/Relocator.scala ---
    @@ -0,0 +1,156 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    --- End diff --
    
    I wrote it.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15907200
  
    --- Diff: project/Relocator.scala ---
    @@ -0,0 +1,156 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    --- End diff --
    
    Marcelo, was this imported from somewhere, or is it something you wrote? Just checking whether there is any LICENSE or NOTICE implication.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16376478
  
    --- Diff: core/pom.xml ---
    @@ -68,9 +68,15 @@
           <groupId>org.eclipse.jetty</groupId>
           <artifactId>jetty-server</artifactId>
         </dependency>
    +    <!--
    +      Promote Guava to "compile" so that maven-shade-plugin picks it up (for packaging the Optional
    +      class exposed in the Java API). The plugin will then remove this dependency from the published
    --- End diff --
    
    Is there logic in here that specifically tells maven to exclude this from our published pom? Does this automatically happen by virtue of including the `artifactSet` below?
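    For reference, the usual mechanism here is maven-shade-plugin's dependency-reduced POM: with an artifactSet along these lines, the shaded artifact is folded into the jar and dropped from the installed POM. This is a sketch assuming the plugin's default behavior, not the exact configuration in the patch:
    
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <configuration>
            <!-- true is the default: the installed/deployed POM no longer lists the shaded dependency -->
            <createDependencyReducedPom>true</createDependencyReducedPom>
            <artifactSet>
              <includes>
                <include>com.google.guava:guava</include>
              </includes>
            </artifactSet>
          </configuration>
        </plugin>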


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51402372
  
    I used a tool I wrote to check the references in the final jars (check it out: https://gist.github.com/vanzin/bd9057fadf4a296220b7). Verified that only references to Optional point at the old "com.google.common" package.
    
    Currently running unit tests (I don't think they'd really be affected) and will double-check the sbt-generated jar to make sure the asm dependencies are not changed from before.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15907168
  
    --- Diff: core/src/main/java/com/google/common/base/Optional.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Copyright (C) 2011 The Guava Authors
    --- End diff --
    
    PS I checked and there is no additional change we need to make to LICENSE or NOTICE as a result of including this. Guava is already AL2 and has no NOTICE that pertains to this file.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15909412
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    Well, this is kind of a problem with the scopes in Maven and the way things are generally distributed...
    
    For libraries, I believe all dependencies should be "provided". So commons-math and all the others should, in my view, also be provided. Of course I won't make that change here.
    
    When you package libraries into an app (which is the case of the uber-jars, meant to be distributed in a cluster), you need to package those dependencies. In Maven that generally means making them "compile" scope.
    
    For a pure library, you wouldn't have this packaging step; that would be done by the users of the library.
    
    Leaving the library with a compile-scope dependency means you'll run into all the issues I pointed out in the bug:
    - people will depend on the transitive dependency
    - at some point they might want to update the version
    - and at that point things will fail if they forget to package the new dependency
    
    So this change solves that problem for Guava. If you use it, you have to explicitly declare and package it. For others, I'll leave it for the time when people decide to tackle other dependencies (I explicitly created a sub-task for Guava to avoid having to deal with other dependencies).
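    In practice that means an application which uses Guava directly now has to declare it in its own pom, for example (a minimal sketch; the version is whatever the application actually needs):
    
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
          <!-- Spark no longer pulls a Guava version in transitively, so pick one explicitly. -->
          <version>14.0.1</version>
        </dependency>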


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52520416
  
    So, any more feedback on this patch? Thanks!


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51964832
  
    QA tests have started for PR 1813. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18386/consoleFull


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15908499
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    Yeah, in that scenario someone's trying to run Spark outside of a cluster providing Spark, or something. But then this fails since Guava is said to be provided but is not actually provided by anything! Unless, I suppose, the app also depends on Guava.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51964451
  
    Jenkins, retest this please.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r15908910
  
    --- Diff: pom.xml ---
    @@ -249,6 +249,7 @@
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
             <version>14.0.1</version>
    +        <scope>provided</scope>
    --- End diff --
    
    Well, but that's what provided means, right? It's provided. Somehow. That "how" is not the concern of the library being pulled.
    
    Normally it's provided by the Spark uber-jar which contains the dependency. But if someone is not using those, it becomes their job to provide it (and make sure it's a compatible version).


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52827706
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18965/consoleFull) for PR 1813 at commit [`9bdffb0`](https://github.com/apache/spark/commit/9bdffb01d13647fbd8e07eed800a210b9ce28a85).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51647806
  
    @vanzin Yes there's no problem with publishing artifacts containing copies of other artifacts as far as Maven Central is concerned. It's just icky for downstream consumers. I think the non-assembly artifacts should be the 'pristine' raw ingredients and assemblies can be crazy remixes.
    
    I can see the argument for provided. It has the pros and cons you mention. It's a moot point for people that use Spark as `provided`, which is what they're supposed to do. So, meh. Just drop a comment in the POM to explain the scope.
    
    I had understood that the goal was to alter the Spark assembly in some way, such that, with some footwork in the Hive on Spark build, the Hive on Spark assembly would work with the Spark assembly as deployed on people's clusters. That's mostly about getting Guava out of the way.
    
    I don't think this entails making anyone depend on a shaded anything in Maven. I think that's avoidable and undesirable. So I think we are all basically agreeing on that? So to be clear, Spark continues to express a dependency on Guava in its POM. It must. The shading affects deployed assemblies.
    
    The only delta with what @pwendell suggests seems to be shading in Guava's Optional rather than keeping a source copy. But that seems fine.
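    For the common case mentioned above, where Spark itself is `provided`, a downstream application's pom would look something like this (coordinates and version are illustrative):
    
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.10</artifactId>
          <version>1.1.0</version>
          <!-- The cluster supplies Spark (and its shaded Guava) at runtime. -->
          <scope>provided</scope>
        </dependency>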


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51543154
  
    @vanzin yeah, so the model I'd like to have is that we include scripts in the Spark source code that can generate these "from scratch" - i.e. take a look at the script that @avati wrote. These can then be contributed to Spark directly; we can put them in `/dev/shaded-deps/`.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51543650
  
    @pwendell I looked at that script, but it seems overly complicated, at least for Guava. A simple pom that invokes the shade plugin is much, much simpler. Would it be OK to have the script just run the pom?
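    The pom being described would be little more than a shade-plugin invocation with a single relocation rule, roughly as below (a sketch; the actual vanzin/spark-shaded pom may differ in the details):
    
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
              <configuration>
                <relocations>
                  <!-- Move Guava out of the way of the Hadoop-provided or user-provided copy. -->
                  <relocation>
                    <pattern>com.google.common</pattern>
                    <shadedPattern>org.spark-project.guava</shadedPattern>
                  </relocation>
                </relocations>
              </configuration>
            </execution>
          </executions>
        </plugin>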


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52971556
  
    @pwendell any idea why this PR wasn't automatically closed after the merge?


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16067467
  
    --- Diff: core/src/main/java/com/google/common/base/Optional.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Copyright (C) 2011 The Guava Authors
    --- End diff --
    
    @vanzin You're referring to just not including this source file and instead getting it in binary form from the Guava artifact, right? Because yes, if it stays in source form it certainly must keep its copyright statement.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51964431
  
    Updated to remove forked code. I'm not too proud of the code to package things in the sbt build, but it's how I got things to work without having to expose Guava in the "compile" scope.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16204110
  
    --- Diff: project/SparkBuild.scala ---
    @@ -276,6 +289,35 @@ object Assembly {
     
     }
     
    +object CoreAssembly {
    +  import sbtassembly.Plugin._
    +  import AssemblyKeys._
    +
    +  lazy val settings = assemblySettings ++ Seq(
    +    publishArtifact in (Compile, packageBin) := false,
    +    test in assembly := {},
    +    jarName in assembly <<= (name, scalaVersion, version) map {
    +      _ + "_" + _ + "-" + _ + "-assembly.jar"
    +    },
    +    excludedJars in assembly := {
    +      val cp = (fullClasspath in assembly).value
    +      cp filter {!_.data.getName.startsWith("guava")}
    +    },
    +    mergeStrategy in assembly := {
    +      case PathList("com", "google", "common", xs @ _*) =>
    +        if (xs.size == 2 && xs(0) == "base" && xs(1).startsWith("Optional")) {
    --- End diff --
    
    Can't you pattern-match on "base" above as well?


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16065213
  
    --- Diff: core/src/main/java/com/google/common/base/Optional.java ---
    @@ -0,0 +1,243 @@
    +/*
    + * Copyright (C) 2011 The Guava Authors
    --- End diff --
    
    BTW if we're comfortable with not locking this file to the 14.0.1 version (or remembering to do this when we change guava versions), I can remove this fork (and clean up some code elsewhere in this patch too).


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51971257
  
    QA results for PR 1813:
    - This patch PASSES unit tests.
    - This patch merges cleanly.
    - This patch adds the following public classes (experimental):
      class Relocator(prefix: String, shaded: String, includes: Seq[Regex], excludes: Seq[Regex]) {
      class RelocatorRemapper(relocators: List[Relocator]) extends Remapper {
      class ShadeStrategy(relocators: List[Relocator]) extends MergeStrategy {
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18386/consoleFull


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51493279
  
    Here's a repo with a pom file to shade guava:
    https://github.com/vanzin/spark-shaded


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52817201
  
    Jenkins, test this please.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1813#discussion_r16204157
  
    --- Diff: examples/pom.xml ---
    @@ -210,6 +216,12 @@
               </artifactSet>
               <filters>
                 <filter>
    +              <artifact>com.google.guava:guava</artifact>
    --- End diff --
    
    Ditto here: add a comment on why we keep Optional.
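    Something along these lines would do; the include pattern is inferred from the discussion rather than copied from the hunk:
    
        <filter>
          <artifact>com.google.guava:guava</artifact>
          <!-- Keep only Optional: it is exposed through Spark's public Java API, so it must
               remain available under its original com.google.common.base package. -->
          <includes>
            <include>com/google/common/base/Optional*</include>
          </includes>
        </filter>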


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51485545
  
    @pwendell creating the shaded deliverable for Guava should be trivial, but where do I create a pull request for it? I can work on the pom file to enable that and publish it to my GitHub in the meantime.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by avati <gi...@git.apache.org>.
Github user avati commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51549053
  
    The shade plug-in works in many cases, but not in all. From what I understand, if the dependency is direct, the shade plugin works fine. In the case of protobuf it was an indirect dependency - akka-2.3 specifically needs protobuf 2.5 and hadoop1 specifically needs protobuf 2.4. So a custom rebuilt package of either one was inevitable.
    
    I haven't looked deeply into the Guava situation, just provided some context.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-52817182
  
    Just a minor question but LGTM - we should merge this one into master only.


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-70468806
  
    @mfawzymkh As you can see this was merged into Spark a long time ago. What are you waiting on?


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51411555
  
    QA results for PR 1813:
    - This patch PASSES unit tests.
    - This patch merges cleanly.
    - This patch adds the following public classes (experimental):
      public abstract class Optional<T> implements Serializable {
      class Relocator(prefix: String, shaded: String, includes: Seq[Regex], excludes: Seq[Regex]) {
      class RelocatorRemapper(relocators: List[Relocator]) extends Remapper {
      class ShadeStrategy(relocators: List[Relocator], preferLocal: List[Regex]) extends MergeStrategy {
    
    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18069/consoleFull


[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1813#issuecomment-51414756
  
    After a side-bar -- I still think `compile` scope is correct for Guava, but that it can be marked `optional` as an extra safeguard against people accidentally linking to Spark's Guava. This seems to be the motivation for `provided` scope, since it achieves this as a secondary effect. But in the exceptional case where Spark is being used outside an environment where Spark is provided, this has the undesirable primary effect of making Spark not execute for lack of Guava, unless the packager also includes Guava for it. Either way can be made to work, and the two are equivalent for the common case, where Spark is `provided`.
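    In pom terms, the alternative being suggested is roughly the following (a sketch of the suggestion, not of what was merged):
    
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
          <version>14.0.1</version>
          <!-- compile so Spark still runs when nothing else supplies Guava; optional so the
               dependency is not handed to downstream projects transitively. -->
          <scope>compile</scope>
          <optional>true</optional>
        </dependency>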

