Posted to users@zeppelin.apache.org by Albert Yoon <yo...@kanizsalab.com> on 2015/07/23 02:58:39 UTC

Local class incompatible?

Hi, I have a test Zeppelin instance connected to my Spark cluster, and the cluster is set up like this:
 
Spark: spark-1.4.1-bin-hadoop2.6
Hadoop: hadoop-2.7.1
 
Zeppelin installed as: mvn clean install -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
 
and ran a test code like this:
 
val textFile = sc.textFile("hdfs://analytics-master:9000/user/kanizsa.lab/gutenberg")
print textFile.count()
 
val textFile = sc.textFile("README.md")
print textFile.count()
 
This code works well with the Scala interpreter, but I encountered an error when executing similar code with %pyspark.
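 
The %pyspark paragraph was roughly the following (a sketch of the equivalent code, not the exact paragraph I ran):
 
%pyspark
# hypothetical equivalent of the Scala paragraph above
textFile = sc.textFile("hdfs://analytics-master:9000/user/kanizsa.lab/gutenberg")
print(textFile.count())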
I also ran the same code in the pyspark command-line shell, where it works like a charm; there is no error like the one below.
 
How can I resolve this error? It seems something inside Zeppelin is not compatible with my Spark cluster...
 
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5.0 (TID 52, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.\n', JavaObject id=o93), <traceback object at 0x1bde290>)

Re: Local class incompatible?

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Could you try building Zeppelin with -Dspark.version=1.4.1?

mvn clean install -Pspark-1.4 -Dspark.version=1.4.1 -Ppyspark -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
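
The -Pspark-1.4 profile builds Zeppelin against its own default Spark version (likely 1.4.0), so the PythonRDD class it ships can differ from the 1.4.1 class on your executors; pinning spark.version aligns the two. To confirm which version the interpreter actually embeds, a quick check (a sketch) is:

%pyspark
# sanity check: the Spark version embedded by Zeppelin's interpreter
print(sc.version)  # should print 1.4.1, matching spark-1.4.1-bin-hadoop2.6 on the cluster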

Thanks,
moon


On Fri, Jul 24, 2015 at 1:45 PM Albert Yoon <yo...@kanizsalab.com> wrote:

> Hi,
>
> Although SPARK_HOME is correctly set and I built Z with -Ppyspark, I still
> experience the local class incompatible error when I run any pyspark in
> Z. The job is correctly sent to and processed on the cluster, but the
> error seems to be thrown after the final stage (Python RDD
> deserialization?). I followed the instructions from Jongyoul Lee but they
> are still not effective at all. Any workarounds?

Re: Local class incompatible?

Posted by Albert Yoon <yo...@kanizsalab.com>.
Hi,
Although SPARK_HOME is correctly set and I built Z with -Ppyspark, I still experience the local class incompatible error when I run any pyspark in Z. The job is correctly sent to and processed on the cluster, but the error seems to be thrown after the final stage (Python RDD deserialization?). I followed the instructions from Jongyoul Lee but they are still not effective at all. Any workarounds?

-----Original Message-----
From: "Albert Yoon" <yoonjs2@kanizsalab.com>
To: "Jongyoul Lee" <jongyoul@gmail.com>
Date: 2015.07.23 02:52:13 PM
Subject: Re: Local class incompatible?



 Hi,
 
I built Z with pyspark as below after a fresh git clone:
 
mvn clean install -Pspark-1.4 -Ppyspark -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
mvn clean install -Pspark-1.4 -Ppyspark -Dhadoop.version=2.7.0 -Phadoop-2.6 -DskipTests 
 
and then set up the environment values:
 
master	spark://analytics-master:7077
 
spark.home	/home/kanizsa.lab/spark/spark-1.4.1-bin-hadoop2.6
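 
SPARK_HOME is also exported in conf/zeppelin-env.sh, roughly like this (a sketch mirroring the interpreter settings above, not the exact file):
 
# conf/zeppelin-env.sh (hypothetical contents)
export SPARK_HOME=/home/kanizsa.lab/spark/spark-1.4.1-bin-hadoop2.6
export MASTER=spark://analytics-master:7077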
 
But all attempts give me the same error when using pyspark:
 
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437 
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745) 
 
The exception in the Z interpreter log file looks like this:
 
 INFO [2015-07-23 14:43:15,127] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.0 in stage 6.0 (TID 606, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,128] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.0 in stage 6.0 (TID 607, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,154] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Added broadcast_7_piece0 in memory on 10.32.10.97:39254 (size: 3.5 KB, free: 265.1 MB)
 INFO [2015-07-23 14:43:15,155] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Added broadcast_7_piece0 in memory on 10.32.10.97:46622 (size: 3.5 KB, free: 265.1 MB)
 INFO [2015-07-23 14:43:15,173] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.0 in stage 6.0 (TID 608, 10.32.10.97, ANY, 1567 bytes)
 WARN [2015-07-23 14:43:15,178] ({task-result-getter-2} Logging.scala[logWarning]:71) - Lost task 1.0 in stage 6.0 (TID 607, 10.32.10.97): java.io.InvalidClassException: org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
 
 INFO [2015-07-23 14:43:15,179] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.1 in stage 6.0 (TID 609, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,180] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 0.0 in stage 6.0 (TID 606) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 1]
 INFO [2015-07-23 14:43:15,181] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.1 in stage 6.0 (TID 610, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,181] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 2.0 in stage 6.0 (TID 608) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 2]
 INFO [2015-07-23 14:43:15,182] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.1 in stage 6.0 (TID 611, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,183] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 1.1 in stage 6.0 (TID 609) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 3]
 INFO [2015-07-23 14:43:15,185] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.2 in stage 6.0 (TID 612, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,185] ({task-result-getter-2} Logging.scala[logInfo]:59) - Lost task 0.1 in stage 6.0 (TID 610) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 4]
 INFO [2015-07-23 14:43:15,186] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.2 in stage 6.0 (TID 613, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,187] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 2.1 in stage 6.0 (TID 611) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 5]
 INFO [2015-07-23 14:43:15,189] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.2 in stage 6.0 (TID 614, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,189] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 1.2 in stage 6.0 (TID 612) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 6]
 INFO [2015-07-23 14:43:15,190] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 1.3 in stage 6.0 (TID 615, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,190] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 0.2 in stage 6.0 (TID 613) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 7]
 INFO [2015-07-23 14:43:15,192] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 0.3 in stage 6.0 (TID 616, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,192] ({task-result-getter-2} Logging.scala[logInfo]:59) - Lost task 2.2 in stage 6.0 (TID 614) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 8]
 INFO [2015-07-23 14:43:15,193] ({sparkDriver-akka.actor.default-dispatcher-19} Logging.scala[logInfo]:59) - Starting task 2.3 in stage 6.0 (TID 617, 10.32.10.97, ANY, 1567 bytes)
 INFO [2015-07-23 14:43:15,193] ({task-result-getter-1} Logging.scala[logInfo]:59) - Lost task 1.3 in stage 6.0 (TID 615) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 9]
ERROR [2015-07-23 14:43:15,194] ({task-result-getter-1} Logging.scala[logError]:75) - Task 1 in stage 6.0 failed 4 times; aborting job
 INFO [2015-07-23 14:43:15,197] ({task-result-getter-3} Logging.scala[logInfo]:59) - Lost task 0.3 in stage 6.0 (TID 616) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 10]
 INFO [2015-07-23 14:43:15,202] ({task-result-getter-0} Logging.scala[logInfo]:59) - Lost task 2.3 in stage 6.0 (TID 617) on executor 10.32.10.97: java.io.InvalidClassException (org.apache.spark.api.python.PythonRDD; local class incompatible: stream classdesc serialVersionUID = 1521627685947625661, local class serialVersionUID = -2629548733386970437) [duplicate 11]
 INFO [2015-07-23 14:43:15,202] ({task-result-getter-0} Logging.scala[logInfo]:59) - Removed TaskSet 6.0, whose tasks have all completed, from pool default
 INFO [2015-07-23 14:43:15,203] ({dag-scheduler-event-loop} Logging.scala[logInfo]:59) - Cancelling stage 6
 INFO [2015-07-23 14:43:15,205] ({dag-scheduler-event-loop} Logging.scala[logInfo]:59) - ResultStage 6 (count at <string>:2) failed in 0.078 s
 INFO [2015-07-23 14:43:15,206] ({Thread-58} Logging.scala[logInfo]:59) - Job 3 failed: count at <string>:2, took 0.093380 s
 INFO [2015-07-23 14:43:15,216] ({pool-2-thread-4} SchedulerFactory.java[jobFinished]:138) - Job remoteInterpretJob_1437630194249 finished by scheduler interpreter_1373012994
 
 
-----Original Message-----
From: "Jongyoul Lee" <jongyoul@gmail.com>
To: "users@zeppelin.incubator.apache.org" <users@zeppelin.incubator.apache.org>; "Albert Yoon" <yoonjs2@kanizsalab.com>
Cc:
Sent: 2015-07-23 (Thu) 10:39:03
Subject: Re: Local class incompatible?
 
Hi,

You should add -Ppyspark when you want to use Z with pyspark.

Regards,
JL




-- 
이종열, Jongyoul Lee, 李宗烈 
http://madeng.net 






Re: Local class incompatible?

Posted by Jongyoul Lee <jo...@gmail.com>.
Hi,

You should add -Ppyspark when you want to use Z with pyspark.
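
For example, a full build command with the profile added might look like this (a sketch based on your original build options, not verified):

mvn clean install -Pspark-1.4 -Ppyspark -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests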

Regards,
JL



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net