Posted to issues@spark.apache.org by "Marc Reichman (JIRA)" <ji...@apache.org> on 2015/05/04 15:34:06 UTC

[jira] [Comment Edited] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

    [ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526627#comment-14526627 ] 

Marc Reichman edited comment on SPARK-1867 at 5/4/15 1:33 PM:
--------------------------------------------------------------

I'm running into the same problem and have yet to see it resolved. My driver and workers are running 1.7.0_71, and I'm building with 1.7.0_71 as well (albeit on a Windows machine, but I hope that shouldn't matter!). I'm using the Spark-provided 1.3.1-hadoop2.6 bundle for the runtime and the Maven spark-core artifact to build. Both sets of Spark components appear to be built with 1.6.0_30.
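For reference, a minimal sketch of how one can check which JDK a Spark jar was built with, assuming the jar's manifest records a Build-Jdk attribute (the path below is only an example):

import java.util.jar.JarFile

// Example path to the assembly from the 1.3.1-hadoop2.6 bundle
val jarPath = "/opt/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar"
val jar = new JarFile(jarPath)
// Build-Jdk is written by the Maven archiver; it may be absent in some builds
val buildJdk = Option(jar.getManifest)
  .flatMap(m => Option(m.getMainAttributes.getValue("Build-Jdk")))
println("Build-Jdk: " + buildJdk.getOrElse("not recorded"))
jar.close()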

I'm using AccumuloInputFormat via the newAPIHadoopRDD method. My Accumulo Key.class and Value.class are on the KryoSerializer registration list.
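Roughly, the relevant setup is along these lines (a simplified sketch, not my exact code; the Accumulo connector and scan configuration that normally goes into the Hadoop Configuration is omitted):

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setAppName("accumulo-scan")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Accumulo's Key and Value are on the Kryo registration list
  .registerKryoClasses(Array(classOf[Key], classOf[Value]))
val sc = new SparkContext(sparkConf)

// Carries the (omitted) AccumuloInputFormat connector and scan settings
val hadoopConf = new Configuration()

val rdd = sc.newAPIHadoopRDD(
  hadoopConf,
  classOf[AccumuloInputFormat],
  classOf[Key],
  classOf[Value])
println(rdd.count())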

I do NOT run into this issue with local execution, but I do run into it when submitting with spark-submit to YARN or to the Spark master. The trace is similar to the previous comment, with the ObjectInputStream steps. It smells like an issue serializing either the driver class or a closure inside it. I'm currently double-checking that all my versions are lined up as they should be. I'm holding off on building everything by hand with my 1.7.0_71 JDK, but I will probably try that later if I can't resolve it otherwise.
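The kind of capture I'm auditing for is the usual one, sketched below with made-up names: a closure that touches a field of the enclosing driver-side object drags the whole object into the serialized task, while copying the value into a local val first keeps it out.

import org.apache.spark.rdd.RDD

// Hypothetical driver-side class, only to illustrate the capture problem
class SearchJob(threshold: Int) {
  // Referencing the constructor parameter here compiles to this.threshold,
  // so the closure captures the whole (non-serializable) SearchJob instance
  // and the job fails when the task is serialized.
  def badCount(rdd: RDD[Int]): Long =
    rdd.filter(x => x > threshold).count()

  // Copying the value into a local val keeps the closure free of any
  // reference to the enclosing instance.
  def goodCount(rdd: RDD[Int]): Long = {
    val t = threshold
    rdd.filter(x => x > t).count()
  }
}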

No CDH is involved in any way; my Hadoop build is the binary 2.6.0 from Apache.


> Spark Documentation Error causes java.lang.IllegalStateException: unread block data
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-1867
>                 URL: https://issues.apache.org/jira/browse/SPARK-1867
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: sam
>
> I've employed two System Administrators on a contract basis (for quite a bit of money), and both contractors have independently hit the following exception. What we are doing is:
> 1. Installing Spark 0.9.1 according to the documentation on the website, along with the CDH4 (and, on another cluster, CDH5) distros of Hadoop/HDFS.
> 2. Building a fat jar of a Spark app with sbt, then trying to run it on the cluster.
> I've also included code snippets and sbt deps at the bottom.
> When I've Googled this, there seem to be two somewhat vague responses:
> a) Mismatching spark versions on nodes/user code
> b) Need to add more jars to the SparkConf
> Now I know that (b) is not the problem, having successfully run the same code on other clusters while including only one jar (it's a fat jar).
> But I have no idea how to check for (a) - it appears Spark doesn't have any version checks - it would be nice if it checked versions and threw a "mismatching version exception: you have user code using version X and node Y has version Z" (a rough manual check along these lines is sketched after the dependency list below).
> I would be very grateful for advice on this.
> The exception:
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]
> My code snippet:
> val conf = new SparkConf()
>                .setMaster(clusterMaster)
>                .setAppName(appName)
>                .setSparkHome(sparkHome)
>                .setJars(SparkContext.jarOfClass(this.getClass))
> println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
> My SBT dependencies:
> // relevant
> "org.apache.spark" % "spark-core_2.10" % "0.9.1",
> "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
> // standard, probably unrelated
> "com.github.seratch" %% "awscala" % "[0.2,)",
> "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
> "org.specs2" %% "specs2" % "1.14" % "test",
> "org.scala-lang" % "scala-reflect" % "2.10.3",
> "org.scalaz" %% "scalaz-core" % "7.0.5",
> "net.minidev" % "json-smart" % "1.2"
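For what it's worth, a rough manual check along those lines is possible on later Spark releases (1.x), where sc.version reports the driver-side version and org.apache.spark.SPARK_VERSION can be read from inside a task on each executor; this is only a sketch, assuming those APIs are available (they are not in 0.9.1):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("version-check"))
// Version of the Spark classes the driver is running against
println("driver Spark version: " + sc.version)
// Version of the Spark classes loaded on each executor; more than one
// distinct value here means mismatched installs across the cluster
val executorVersions = sc.parallelize(1 to 100, 100)
  .map(_ => org.apache.spark.SPARK_VERSION)
  .distinct()
  .collect()
println("executor Spark version(s): " + executorVersions.mkString(", "))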



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
