Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2014/10/02 12:43:33 UTC

[jira] [Commented] (SPARK-3764) Invalid dependencies of artifacts in Maven Central Repository.

    [ https://issues.apache.org/jira/browse/SPARK-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156311#comment-14156311 ] 

Sean Owen commented on SPARK-3764:
----------------------------------

This is correct and as intended. Without any additional flags, yes, the version of Hadoop referenced by Spark would be 1.0.4, but you should not rely on this. If your app uses Spark but not the Hadoop APIs, it is not relevant, since you are not packaging the Spark or Hadoop dependencies in your app. If you use both the Spark and Hadoop APIs, you need to explicitly depend on the version of Hadoop you run on your cluster (but still not bundle it with your app).
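For illustration, a minimal {{build.sbt}} sketch along these lines (the artifact names and versions below are only examples, not taken from this issue; use the Hadoop version your cluster actually runs):

{code}
// Hypothetical build.sbt for an app that uses both Spark and Hadoop APIs.
// Both dependencies are marked "provided" so neither is bundled into the
// application jar; the cluster supplies them at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.1.0" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.4.0" % "provided"  // 2.4.0 is a placeholder
)
{code}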

> Invalid dependencies of artifacts in Maven Central Repository.
> --------------------------------------------------------------
>
>                 Key: SPARK-3764
>                 URL: https://issues.apache.org/jira/browse/SPARK-3764
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 1.1.0
>            Reporter: Takuya Ueshin
>
> While testing my spark applications locally using spark artifacts downloaded from Maven Central, the following exception was thrown:
> {quote}
> ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-2,5,main]
> java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
> 	at org.apache.spark.sql.parquet.AppendingParquetOutputFormat.getDefaultWorkFile(ParquetTableOperations.scala:334)
> 	at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:251)
> 	at org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:300)
> 	at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
> 	at org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:318)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:54)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {quote}
> This is because the Hadoop class {{TaskAttemptContext}} is incompatible between hadoop-1 and hadoop-2.
> I guess the Spark artifacts in Maven Central were built against hadoop-2 with Maven, but the Hadoop version they depend on in {{pom.xml}} remains 1.0.4, so the Hadoop version mismatch happens.
> FYI:
> sbt seems to publish an 'effective pom'-like POM file, so the dependencies are resolved correctly.


