Posted to reviews@spark.apache.org by CodingCat <gi...@git.apache.org> on 2014/04/21 21:55:47 UTC

[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

GitHub user CodingCat opened a pull request:

    https://github.com/apache/spark/pull/468

    SPARK-1556: bump jets3t version to 0.9.0

    Hadoop 2.2.x and newer introduce jets3t 0.9.0, which defines S3ServiceException/ServiceException; however, Spark still relies on jets3t 0.7.x, which has no definition of these classes.

    What I ran into:
    
    ```
    14/04/21 19:30:53 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    14/04/21 19:30:53 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    14/04/21 19:30:53 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    14/04/21 19:30:53 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    14/04/21 19:30:53 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:891)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:741)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:692)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:574)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:900)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
    at $iwC$$iwC$$iwC.<init>(<console>:20)
    at $iwC$$iwC.<init>(<console>:22)
    at $iwC.<init>(<console>:24)
    at <init>(<console>:26)
    at .<init>(<console>:30)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:793)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:838)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:750)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:598)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:605)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:608)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:931)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:881)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:973)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    Caused by: java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 63 more
    ```
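
    A minimal sketch of the kind of change being proposed (assuming the dependency is declared directly in project/SparkBuild.scala, as discussed later in this thread; the Maven pom.xml would need the matching change):
    
    ```
    // Sketch only: bump the jets3t dependency that Spark declares itself.
    libraryDependencies ++= Seq(
      // was: "net.java.dev.jets3t" % "jets3t" % "0.7.1"
      "net.java.dev.jets3t" % "jets3t" % "0.9.0"
    )
    ```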

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-1556

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #468
    
----
commit 398dbabd08ca46c0a5acfb035f25b637ecff63cd
Author: Nan Zhu <co...@users.noreply.github.com>
Date:   2014-04-21T19:54:43Z

    SPARK-1556: bump jets3t version to 0.9.0

----



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41734924
  
    I think the possible way to do that is to compile a jets3t-0.9.0-enabled Spark version yourself,
    
    then compile your application against that version... I think that to access an HDFS-compatible fs, we eventually call the code in the application jar.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by LuqmanSahaf <gi...@git.apache.org>.
Github user LuqmanSahaf commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-96522017
  
    @darose I am facing the VerifyError you mentioned in one of the comments. Can you tell me how you solved that error?





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41770038
  
    @darose this can be patched downstream, but that would not fix it for any other distro. Ideally, the dependency is set to 0.9.0 when Spark is built against Hadoop 2.3.0+. As we've seen in other cases, it's possible to manage this with more profiles in the build -- a PITA, but certainly possible. (I don't know whether this helps the SBT build, but presumably similar logic based on the Hadoop version is possible there.)
    
    The funny thing is that this dependency is only needed at runtime. (It really should be declared with <scope>runtime</scope>.) I am still not sure why hadoop-client doesn't package it. However, I wonder whether, in the context of a Hadoop cluster, it's going to be on the classpath anyway, in which case it would be the right version in all cases. What if you change the scope, so that jets3t is not even in the assembly?
    
    I actually bet that works, and is simple. However, I think it means that S3 would no longer work when running Spark by itself, so it's probably a non-starter.
    
    So, a new hadoop-2.3.0 profile that we try to trigger based on well-known hadoop.version values?
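
    In SBT terms, a rough sketch of the scope idea (illustration only, assuming the usual sbt-assembly behavior; not the actual change):
    
    ```
    // "runtime" keeps jets3t off the compile classpath but still in the assembly;
    // "provided" would also keep it out of the assembly, relying on the Hadoop
    // installation to supply the matching jets3t at run time.
    libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0" % "runtime"
    // libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0" % "provided"
    ```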



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41748394
  
    @CodingCat the problem is that on worker nodes there will be the wrong jets3t in the Spark JAR.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41079532
  
    Hi @mateiz @srowen, if Spark built with Hadoop 1.0.4/2.x (x < 3) and jets3t 0.9.0 can access S3 smoothly, does that also mean that bumping to 0.9.0 is safe?
     
    I'm going to run a manual test tonight or tomorrow.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41639344
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41764125
  
    So @srowen, I think @mateiz is right: the CDH5 spark-core package (on Ubuntu, it's version 0.9.0+cdh5.0.0+31-1.cdh5.0.0.p0.31~precise-cdh5.0.0) won't function correctly due to this issue, and so would need to be rebuilt against jets3t 0.9.0.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41640658
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14551/



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41073254
  
    Great, so there's no easy way to set it based on profiles and support all Hadoop versions :). Maybe for Hadoop 2.3+ users, we can just tell them to add a new version of jets3t to their own project's build? We can certainly have our pre-built binaries include the right one too.
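
    As a sketch, that per-application workaround would be a single line in the user's own build (a hypothetical build.sbt, not Spark's build; as noted elsewhere in this thread, the Spark assembly may still shadow it on the classpath):
    
    ```
    // Hypothetical downstream workaround: pull in the jets3t that matches a Hadoop 2.3+ cluster.
    libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0"
    ```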



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40972317
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat closed the pull request at:

    https://github.com/apache/spark/pull/468



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41797308
  
    What I can confirm is that trying to remove the jets3t 0.7 jars from the CDH spark-core package and replace them with 0.9 jars doesn't fix the issue.  (I'm guessing because Spark was built against the 0.7 jars.)  It results in a verify error:
    
      Location:
        org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @38: invokespecial
      Reason:
        Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'
    
    So what options do I have to get Spark working on Hadoop 2.3 until SPARK-1556 gets fixed (and deployed to an update of CDH)?  I'm guessing my only recourse is to build Spark from source?  (After tweaking the project/SparkBuild.scala file to update it to "net.java.dev.jets3t"      % "jets3t"           % "0.9.0")



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-96643172
  
    Agreed, but that doesn't exist in `master` anyway. Now the SBT build drives off the Maven build.





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41804304
  
    @darose what about removing the library from the assembly entirely, so there is no copy in your app or in the deployed Spark jars? It may not be a viable solution in general, but it may well work for you if it picks up the jar from the Hadoop installation. Worth a shot?



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41747427
  
    You can try adding jets3t 0.9 as a Maven dependency in your application, but unfortunately I think that goes after the Spark assembly JAR when running an app. In 1.0 there will be a setting to put the user's classpath first.
    
    It sounds like the Spark bundle for CDH needs to be updated with this; CCing @srowen.
    
    For this patch, we probably want to create a new Maven profile to use a new Jets3t when that's enabled.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41064211
  
    @mateiz It looks like it went to 0.8.1 in Hadoop 1.3.0 (https://issues.apache.org/jira/browse/HADOOP-8136) and 0.9.0 in 2.3.0 (https://issues.apache.org/jira/browse/HADOOP-9623)



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40976204
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41748362
  
    BTW the right way to do it would be to make hadoop-client have a Maven dependency on the right version of Jets3t. Then Spark would just build with the right version out of the box when it linked to the right Hadoop version.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41639362
  
    I restored the build files and updated the documentation to describe this situation for users.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40970391
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42109781
  
    @witgo Hm, is there an example that comes up repeatedly? Is it ever intentional, or just some accident of someone's legacy deployment?  I don't know of a case of this, and it wouldn't come up with a distro or any semi-recent release of Hadoop, but maybe someone will say this comes up with the 1.x / 0.23.x lines somehow?



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41059208
  
    In that case let's see exactly which Hadoop 2.x version bumped up the dependency, because I don't think 2.0 and 2.1 did it (could be wrong though).



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41014340
  
    @mateiz I thought the same thing, that `hadoop-client` pulls this in, but it does not. Only things like `hadoop-hdfs`.
    
    I agree with updating the dependency, but to match the Hadoop version. So the 0.9.0 version belongs in the Hadoop 2 profiles.
    
    (Also it should be a runtime scope dependency in Maven.)



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40972304
  
     Merged build triggered. 



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42027309
  
    Man oh man, I cannot get this to work no way no how.  I tried rebuilding spark using the jets3t 0.9 jar, then tried rebuilding shark doing the same.  I keep getting a verify error - presumably because something in the call stack isn't compatible with the new jets3t version.  Anyone have any ideas/suggestions?  I'm at my wits' end on this.  Spent days, and still unable to get a working version of spark/shark running with CDH5.  Output below.
    
    ```
    14/05/02 06:34:14 WARN scheduler.TaskSetManager: Loss was due to java.lang.VerifyError
    java.lang.VerifyError: Bad type on operand stack
    Exception Details:
      Location:
        org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @38: invokespecial
      Reason:
        Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'
      Current Frame:
        bci: @38
        flags: { }
        locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/net/URI', 'org/apache/hadoop/conf/Configuration', 'org/apache/hadoop/fs/s3/S3Credentials', 'org/jets3t/service/security/AWSCredentials' }
        stack: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', uninitialized 32, uninitialized 32, 'org/jets3t/service/security/AWSCredentials' }
      Bytecode:
        0000000: bb00 0259 b700 034e 2d2b 2cb6 0004 bb00
        0000010: 0559 2db6 0006 2db6 0007 b700 083a 042a
        0000020: bb00 0959 1904 b700 0ab5 000b a700 0b3a
        0000030: 042a 1904 b700 0d2a 2c12 0e03 b600 0fb5
        0000040: 0010 2a2c 1211 1400 12b6 0014 1400 15b8
        0000050: 0017 b500 182a 2c12 1914 0015 b600 1414
        0000060: 0015 b800 17b5 001a 2abb 001b 592b b600
        0000070: 1cb7 001d b500 1eb1                    
      Exception Handler Table:
        bci [14, 44] => handler: 47
      Stackmap Table:
        full_frame(@47,{Object[#176],Object[#177],Object[#178],Object[#179]},{Object[#180]})
        same_frame(@55)
    
            at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
            at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
            at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
            at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:107)
            at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
            at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
            at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
            at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
            at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
            at org.apache.spark.scheduler.Task.run(Task.scala:53)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
            at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
            at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
            at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:744)
    
    ```



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40970412
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-96611185
  
    @mag- if you're talking about what I think you are, it was a temporary thing that's long since gone: https://github.com/apache/spark/pull/629/files





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41841626
  
    FYI - I think I might have figured out why deleting the jets3t jar didn't fix the issue.  It looks like the Spark build process bundles the jets3t classes into the Spark assembly jar.  So I'm guessing that whacking the stand-alone jar file wouldn't fix the issue if there are still 0.7 classes bundled in another jar.
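
    A quick way to check what actually ends up in the assembly (a sketch for the Scala REPL; the jar path is a placeholder):
    
    ```
    import java.util.jar.JarFile
    import scala.collection.JavaConverters._
    
    // List the jets3t classes bundled inside the assembly jar.
    val jar = new JarFile("/path/to/spark-assembly.jar")  // placeholder path
    jar.entries().asScala
      .map(_.getName)
      .filter(_.startsWith("org/jets3t"))
      .take(10)
      .foreach(println)
    ```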



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41804694
  
    Definitely worth a shot!  Will give that a try and report back.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42113320
  
    @srowen YARN version does need to be separate from hadoop version. Downstream consumers of our build sometimes do this. For instance, if they want to build against a custom HDFS distro (e.g. pivotal, IBM, or something) but want to link against the upstream apache YARN repo. It's not something we do in binaries we distribute but it would be good to support it.
    
    Think it's fine to remove hadoop.major.version - it seems unused.
    
    Adding fancy profile activation would also be nice, but I think that it's not necessary as an immediate fix. We can just say in the build doc that "you need special profiles for the following hadoop versions" and give a small table or list explaining which profiles to activate.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42253192
  
    fixed in #629 



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40976205
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14299/



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42109604
  
    @srowen Not everyone uses the same version of HDFS and YARN.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42096201
  
    @srowen if you'd like to take a crack at this by the way, please do. I'll probably look at it on Sunday if no one else has.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41732946
  
    Is there any way to apply this fix without a rebuild of spark?  E.g., to just replace jets3t-0.7.1.jar with jets3t-0.9.0.jar in a deployed spark package?  I'm running into this issue on a machine where I have the CDH5 hadoop and spark packages installed.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41009471
  
    Unfortunately this will not work in older Hadoop versions as far as I know. Can you still build Spark against Hadoop 1.0.4 and run it with this change?
    
    It might be better to receive jets3t from Hadoop instead of depending on it ourselves. I'm not sure if hadoop-client depends on it...



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40974248
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14297/



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42110042
  
    @srowen Related discussion in [PR 502](https://github.com/apache/spark/pull/502).
    @berngp Can you explain the reason for not using the same version of HDFS and YARN?



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by berngp <gi...@git.apache.org>.
Github user berngp commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42112284
  
    I think in general it's an edge case, but there are folks still using HDFS
    1.0.x with a different version of YARN; that said, it is not my case.
    
    I like what you suggested in another PR, where you reused the value of
    hadoop.version to specify yarn.version, e.g.
    
    <yarn.version>${hadoop.version}</yarn.version>
    
    Let me know if I should associate the small commits with specific PRs. Thanks
    again for following up on those commits.
    



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41808390
  
    Sigh.  Was a promising idea, but no dice.  Even with the 0.7 jars out of the way, I'm still getting java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
            at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
            at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
    ...
            at shark.SharkCliDriver.main(SharkCliDriver.scala)




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41747796
  
    @mateiz, regarding @darose's question, how about compiling the application against a customized Spark jar (with a newer jets3t)? I think in that case he would not need to restart the cluster?



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42070116
  
    I think I'm going to have to give up on getting Shark working on my existing CDH5 cluster right now.  I've tried everything I can think of (various binary releases, building both spark and shark myself against jets3t 0.9, various config tweaks, etc.) but I'm stuck at either the class not found error in https://issues.apache.org/jira/browse/SPARK-1556, or the verify error above.  I'll have to either wait until there's a new binary release, or look for an alternative.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41079837
  
    Sure, that would work. Please try it. Unfortunately I remember it having problems, but I could be wrong.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41639352
  
    Merged build started. 



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42096004
  
    @srowen I'd prefer not to remove it from the dependency graph if possible because it will break local builds. The best solution I see is to add a profile for Hadoop 2.3 and 2.4. For now I'd be fine to just require users to manually trigger it and document this in `building-with-maven`. In SBT we can actually just insert logic in the build based on the Hadoop profile. I'm guessing we'll have to get into the habit of doing this, since it seems like Spark is good at finding bugs in Hadoop's dependency graph. We should probably start testing Spark against Hadoop RC's if they publish them to maven so we can give feedback.
    
    I don't quite understand why the hadoop-client library doesn't advertise jets3t specifically... if I write a Java application that opens an S3 FileSystem and reads and writes data, don't I need jets3t to do that (i.e. if this is outside a MapReduce job)? Is this just a bug in Hadoop's dependencies?



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mag- <gi...@git.apache.org>.
Github user mag- commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-96642739
  
    Well:
    `val jets3tVersion = if ("^2\\.[3-9]+".r.findFirstIn(hadoopVersion).isDefined) "0.9.0" else "0.7.1"`
    It probably should be the other way round: if the hadoop version is lower than 2.3, we use 0.7.1.
    Also, someone needs to test it with hadoop 2.6/2.7, where S3 support was split out into hadoop-aws.
    (I'm thinking that the mvn profile approach was maybe cleaner than this if/else...)
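
    One way to avoid the regexp fragility (a sketch only, not what the build actually does) is to compare the major/minor components numerically, so a future Hadoop 3.0.0 is not silently mapped back to 0.7.1; the 2.3 cutoff is taken from the discussion above, and the hadoop-aws split in 2.6+ would still need separate handling:
    
    ```
    // Sketch: pick the jets3t version from numeric major/minor parts.
    def jets3tVersion(hadoopVersion: String): String = {
      val Array(major, minor) =
        hadoopVersion.split("\\.").take(2).map(_.takeWhile(_.isDigit).toInt)
      if (major > 2 || (major == 2 && minor >= 3)) "0.9.0" else "0.7.1"
    }
    // e.g. jets3tVersion("2.2.0") == "0.7.1", jets3tVersion("3.0.0") == "0.9.0"
    ```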





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-42102935
  
    @pwendell Before I begin, can I propose a refactoring of profiles that will make this and similar issues easier to deal with? It's probably a matter for a different PR, but it will make changes like this one easy.
    
    We need profiles to deal with this. Profiles can be triggered explicitly (e.g. `-Phadoop-2.3`) or by property values (`-Dhadoop.version=2.3.0`). It's necessary to have things like `hadoop.version` be customizable, so it would be nice to also trigger the needed profiles from them. However, Maven lacks the ability to trigger on a range of property values; you can trigger on a particular value like "2.3.0" but not on "2.3.*" or "[2.3.0,2.4.0)" syntax.
    
    So it seems necessary to use a series of named profiles. Those profiles can set default version values, and those versions can be overridden. For example, it's nice to have a `hadoop-2.3` profile set `hadoop.version=2.3.0` for you, even though that can still be overridden.
    
    (The SBT build can shadow these changes.)
    
    After reading over the build and docs, I propose the following:
    
    - Introduce a `hadoop-2.3` profile, similar to `hadoop-0.23`, to encompass 2.3+-specific build changes, and one for `hadoop-2.2` as well (see later)
    - `hadoop.major.version` appears to be unused -- remove it?
    - I believe `yarn.version` can be removed; use `hadoop.version` in its place. Ideally these are always synced, no? All doc examples show `yarn.version` matching `hadoop.version` and the distribution script uses `SPARK_HADOOP_VERSION` for `yarn.version`. Now, the default Hadoop version is 1.0.4 and there is no such YARN version. But the `yarn-alpha` profile sets `hadoop.version=0.23.7` to match the default `yarn.version=0.23.7` anyway. It seems like Hadoop 1.x + YARN is not intended anyway, which seems corroborated by the build documentation. 
    - So, YARN-related profiles should not set `hadoop.version`, and in fact only serve to add the `yarn` child module
    
    ... and then the fix for this issue is trivial.
    
    All of the build permutations listed in the documentation work under this new arrangement. Anyone want to see a PR or have objections?
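
    On the SBT side, a rough shadow of the same arrangement might look like this (sketch only; the property and environment variable names are the ones mentioned above, not necessarily what the build files end up using):
    
    ```
    // hadoop.version drives everything; yarn.version just tracks it unless overridden.
    val hadoopVersion = sys.props.get("hadoop.version")
      .orElse(sys.env.get("SPARK_HADOOP_VERSION"))
      .getOrElse("1.0.4")  // current default per the discussion above
    val yarnVersion = sys.props.getOrElse("yarn.version", hadoopVersion)
    ```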




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41804895
  
    Hi @srowen, do you want to take over the patch? I'm concerned I cannot fix it in the next few days, considering my schedule and my level of knowledge of mvn and sbt.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-40974246
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41639298
  
    @mateiz you are right, I received the exception `java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V` in both
    




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41831614
  
    @CodingCat I can make a patch, but it will mean introducing a new profile like "hadoop230" that one has to enable when building for Hadoop 2.3.0. I always hate to add that complexity and hope someone has a better idea. But I'll propose the PR if a committer nods and says it's worth changing. 
    
    I imagine it won't be the last time the dependencies have to be fudged by Hadoop version -- isn't this already an existing issue with Avro anyway?



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by darose <gi...@git.apache.org>.
Github user darose commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-96883761
  
    On 04/27/2015 07:11 AM, Sean Owen wrote:
    > @mag- if you're talking about what I think you are, it was a temporary thing that's long since gone already https://github.com/apache/spark/pull/629/files
    
    I think @srowen is correct.  A while back I upgraded to use a newer 
    version of Spark (and built it using the correct -Dhadoop.version= and 
    -Phadoop- flags) and the problem went away.
    
    DR
    






[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by mag- <gi...@git.apache.org>.
Github user mag- commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-96587264
  
    Are you aware that all these regexp hacks will break when hadoop changes its version to 3.0.0?






[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/468#issuecomment-41640655
  
    Merged build finished. All automated tests passed.

