Posted to user@spark.apache.org by Oleg Ruchovets <or...@gmail.com> on 2014/09/04 18:25:51 UTC

Two Python installations cause a PySpark on YARN problem

Hi,
   I am evaluating PySpark.
I have Hortonworks HDP installed with Python 2.6.6 (I can't remove it,
since Hortonworks depends on it). I can successfully execute PySpark on
YARN.

We need to use Anaconda packages, so I installed Anaconda. Anaconda
ships with Python 2.7.7, which was added to the PATH. After installing
Anaconda, the Pi example I had been using to test PySpark on YARN stops
working.
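One common mitigation (an assumption on my part; nothing in this thread confirms it for this HDP setup) is to point every Spark process at one explicit interpreter via the `PYSPARK_PYTHON` environment variable, so the driver and the YARN executors stop depending on whichever `python` happens to be first on the PATH. A minimal sketch; the Anaconda path is an example and must match the actual install location on every node:

```python
import os

# Hypothetical: pin the interpreter PySpark launches for its workers.
# In practice this variable must be present in the environment before
# spark-submit runs (e.g. exported in the shell or in spark-env.sh).
os.environ["PYSPARK_PYTHON"] = "/opt/anaconda/bin/python"
print("PySpark workers will launch: " + os.environ["PYSPARK_PYTHON"])
```

With that exported on every node, `./bin/spark-submit --master yarn examples/src/main/python/pi.py 1000` should start the same interpreter everywhere.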

Question:
   How can PySpark be used when two Python versions are installed on the
same machine? On every machine, Python 2.7.7 is first on the PATH.

How can I check which Python version PySpark actually uses at runtime?
Is it 2.7.7?
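As a quick runtime check (a minimal sketch, not from the original thread), the interpreter can report its own version and location; running these lines through `bin/pyspark` shows the driver side, and shipping the same function inside a `map` over an RDD shows the executor side:

```python
import sys

def interpreter_info():
    """Report the version and executable path of the interpreter running this code."""
    major, minor, micro = sys.version_info[:3]
    return "%d.%d.%d" % (major, minor, micro), sys.executable

version, path = interpreter_info()
print("Python %s at %s" % (version, path))
```

For the executors, a hypothetical snippet such as `sc.parallelize(range(4), 4).map(lambda _: interpreter_info()).distinct().collect()` does the same check remotely; a mismatch between driver and executor output confirms two interpreters are in play.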

The exceptions I get are the same as in my previous emails:

[root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
./bin/spark-submit --master yarn  --num-executors 3  --driver-memory
4g --executor-memory 2g --executor-cores 1
examples/src/main/python/pi.py   1000
/usr/jdk64/jdk1.7.0_45/bin/java
::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-
563.jar:/etc/hadoop/conf
-XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
14/09/04 12:53:11 INFO spark.SecurityManager: Changing view acls to: root
14/09/04 12:53:11 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(root)
14/09/04 12:53:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/09/04 12:53:12 INFO Remoting: Starting remoting
14/09/04 12:53:12 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://spark@HDOP-B.AGT:45747]
14/09/04 12:53:12 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@HDOP-B.AGT:45747]
14/09/04 12:53:12 INFO spark.SparkEnv: Registering MapOutputTracker
14/09/04 12:53:12 INFO spark.SparkEnv: Registering BlockManagerMaster
14/09/04 12:53:12 INFO storage.DiskBlockManager: Created local
directory at /tmp/spark-local-20140904125312-c7ea
14/09/04 12:53:12 INFO storage.MemoryStore: MemoryStore started with
capacity 2.3 GB.
14/09/04 12:53:12 INFO network.ConnectionManager: Bound socket to port
37363 with id = ConnectionManagerId(HDOP-B.AGT,37363)
14/09/04 12:53:12 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/09/04 12:53:12 INFO storage.BlockManagerInfo: Registering block
manager HDOP-B.AGT:37363 with 2.3 GB RAM
14/09/04 12:53:12 INFO storage.BlockManagerMaster: Registered BlockManager
14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/04 12:53:12 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:33547
14/09/04 12:53:12 INFO broadcast.HttpBroadcast: Broadcast server
started at http://10.193.1.76:33547
14/09/04 12:53:12 INFO spark.HttpFileServer: HTTP File server
directory is /tmp/spark-054f4eda-b93b-47d3-87d5-c40e81fc1fe8
14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/04 12:53:12 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:54594
14/09/04 12:53:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/04 12:53:13 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
14/09/04 12:53:13 INFO ui.SparkUI: Started SparkUI at
http://HDOP-B.AGT:4040
14/09/04 12:53:13 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
--args is deprecated. Use --arg instead.
14/09/04 12:53:14 INFO client.RMProxy: Connecting to ResourceManager
at HDOP-N1.AGT/10.193.1.72:8050
14/09/04 12:53:14 INFO yarn.Client: Got Cluster metric info from
ApplicationsManager (ASM), number of NodeManagers: 6
14/09/04 12:53:14 INFO yarn.Client: Queue info ... queueName: default,
queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
      queueApplicationCount = 0, queueChildQueueCount = 0
14/09/04 12:53:14 INFO yarn.Client: Max mem capabililty of a single
resource in this cluster 13824
14/09/04 12:53:14 INFO yarn.Client: Preparing Local resources
14/09/04 12:53:15 INFO yarn.Client: Uploading
file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
14/09/04 12:53:17 INFO yarn.Client: Uploading
file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/pi.py
14/09/04 12:53:17 INFO yarn.Client: Setting up the launch environment
14/09/04 12:53:17 INFO yarn.Client: Setting up container launch context
14/09/04 12:53:17 INFO yarn.Client: Command for starting the Spark
ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
-Djava.io.tmpdir=$PWD/tmp,
-Dspark.tachyonStore.folderName=\"spark-2b59c845-3de2-4c3d-a352-1379ecade281\",
-Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
-Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
-Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
-Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
-Dspark.fileserver.uri=\"http://10.193.1.76:54594\",
-Dspark.master=\"yarn-client\", -Dspark.driver.port=\"45747\",
-Dspark.executor.cores=\"1\",
-Dspark.httpBroadcast.uri=\"http://10.193.1.76:33547\",
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar
, null,  --args  'HDOP-B.AGT:45747' , --executor-memory, 2048,
--executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
<LOG_DIR>/stderr)
14/09/04 12:53:17 INFO yarn.Client: Submitting application to ASM
14/09/04 12:53:17 INFO impl.YarnClientImpl: Submitted application
application_1409805761292_0005
14/09/04 12:53:17 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
	 appMasterRpcPort: -1
	 appStartTime: 1409806397305
	 yarnAppState: ACCEPTED

14/09/04 12:53:18 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
	 appMasterRpcPort: -1
	 appStartTime: 1409806397305
	 yarnAppState: ACCEPTED

14/09/04 12:53:19 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
	 appMasterRpcPort: -1
	 appStartTime: 1409806397305
	 yarnAppState: ACCEPTED

14/09/04 12:53:20 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
	 appMasterRpcPort: -1
	 appStartTime: 1409806397305
	 yarnAppState: ACCEPTED

14/09/04 12:53:21 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
	 appMasterRpcPort: 0
	 appStartTime: 1409806397305
	 yarnAppState: RUNNING

14/09/04 12:53:23 INFO cluster.YarnClientClusterScheduler:
YarnClientClusterScheduler.postStartHook done
14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
executor: Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:40024/user/Executor#2065794895]
with ID 1
14/09/04 12:53:26 INFO storage.BlockManagerInfo: Registering block
manager HDOP-N1.AGT:34857 with 1178.1 MB RAM
14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
executor: Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT:49234/user/Executor#820272849]
with ID 3
14/09/04 12:53:27 INFO cluster.YarnClientSchedulerBackend: Registered
executor: Actor[akka.tcp://sparkExecutor@HDOP-M.AGT:38124/user/Executor#715249825]
with ID 2
14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block
manager HDOP-N4.AGT:43365 with 1178.1 MB RAM
14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block
manager HDOP-M.AGT:45711 with 1178.1 MB RAM
14/09/04 12:53:55 INFO spark.SparkContext: Starting job: reduce at
/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
14/09/04 12:53:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at
/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
with 1000 output partitions (allowLocal=false)
14/09/04 12:53:55 INFO scheduler.DAGScheduler: Final stage: Stage
0(reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
14/09/04 12:53:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/09/04 12:53:55 INFO scheduler.DAGScheduler: Missing parents: List()
14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting Stage 0
(PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing
parents
14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting 1000 missing
tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
14/09/04 12:53:55 INFO cluster.YarnClientClusterScheduler: Adding task
set 0.0 with 1000 tasks
14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:0
as TID 0 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:0
as 369810 bytes in 5 ms
14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:1
as TID 1 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:1
as 506275 bytes in 2 ms
14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:2
as TID 2 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:2
as 501135 bytes in 2 ms
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
as TID 3 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3
as 506275 bytes in 5 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode

	at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
	at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
as TID 4 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1
as 506275 bytes in 5 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 2 (task 0.0:2)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 1] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
as TID 5 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2
as 501135 bytes in 5 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:3)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 2] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
as TID 6 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3
as 506275 bytes in 5 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 3] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0
as TID 7 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0
as 369810 bytes in 4 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:2)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 4] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
as TID 8 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2
as 501135 bytes in 3 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:1)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 5] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
as TID 9 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1
as 506275 bytes in 4 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:3)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 6] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
as TID 10 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3
as 506275 bytes in 3 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:0)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 7] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0
as TID 11 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0
as 369810 bytes in 3 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:2)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 8] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
as TID 12 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2
as 501135 bytes in 4 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 10 (task 0.0:3)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 9] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
as TID 13 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3
as 506275 bytes in 3 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 10] (identical traceback omitted)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
as TID 14 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1
as 506275 bytes in 4 ms
14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:0)
14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 11] (identical traceback omitted)
14/09/04 12:53:57 INFO scheduler.TaskSetManager: Starting task 0.0:0
as TID 15 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
14/09/04 12:53:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0
as 369810 bytes in 4 ms
14/09/04 12:53:57 WARN scheduler.TaskSetManager: Lost TID 12 (task 0.0:2)
14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 12] (identical traceback omitted)
14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
times; aborting job
14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: SystemError: unknown
opcode [duplicate 13] (identical traceback omitted)
14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at
/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
Traceback (most recent call last):
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 38, in <module>
    count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 619, in reduce
    vals = self.mapPartitions(func).collect()
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 583, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
line 537, in __call__
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
py4j.protocol.Py4JJavaError14/09/04 12:53:57 INFO
scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent
call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 14]
14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to
org.apache.spark.TaskKilledException
org.apache.spark.TaskKilledException
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
: An error occurred while calling o24.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0.0:2 failed 4 times, most recent failure: Exception failure in
TID 12 on host HDOP-M.AGT:
org.apache.spark.api.python.PythonException: Traceback (most recent
call last):
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode

        org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
        org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
        org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed
TaskSet 0.0, whose tasks have all completed, from pool


Thanks,
Oleg.

Re: 2 python installations cause PySpark on Yarn problem

Posted by Andrew Or <an...@databricks.com>.
Since you're using YARN, you may also need to set SPARK_YARN_USER_ENV to
"PYSPARK_PYTHON=/your/desired/python/on/slave/nodes".
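
Andrew's suggestion can be sketched as follows. This is a minimal sketch, not the canonical setup: the Anaconda path `/opt/anaconda/bin/python` is an assumption (adjust it for your cluster), and these variables are normally exported in the shell or `spark-env.sh` before running `spark-submit`, since they must be visible when the JVM and executors launch.

```python
import os

# Hypothetical Anaconda location; adjust for your cluster.
ANACONDA_PYTHON = "/opt/anaconda/bin/python"

# Interpreter used to launch Python workers on the driver side.
os.environ["PYSPARK_PYTHON"] = ANACONDA_PYTHON

# Propagate the same choice into the YARN containers (Andrew's tip).
os.environ["SPARK_YARN_USER_ENV"] = "PYSPARK_PYTHON=" + ANACONDA_PYTHON

print(os.environ["SPARK_YARN_USER_ENV"])
```

The same two `export` lines in the shell, before invoking `./bin/spark-submit --master yarn ...`, achieve the equivalent effect.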


2014-09-04 9:59 GMT-07:00 Davies Liu <da...@databricks.com>:

> Hey Oleg,
>
> In PySpark, you MUST have the same version of Python on all the
> machines of the cluster: when you run `python` on each machine, they
> should all report the same version (2.6 or 2.7).
>
> With PYSPARK_PYTHON, you can run PySpark with a specified version of
> Python. You should also install that version on all the machines, in
> the same location.
>
> Davies
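
Davies' point suggests a quick runtime check. A minimal sketch, using `sys.executable` as a stand-in for whatever `PYSPARK_PYTHON` points at on a given node; inside a PySpark job you could instead run `sc.parallelize(range(1), 1).map(lambda _: sys.version).collect()` to ask the executors directly (that one-liner is an assumption, untested here).

```python
import subprocess
import sys

# Version of the interpreter running this (driver-side) script.
driver_version = "%d.%d" % sys.version_info[:2]

# Version reported by the interpreter the workers would launch.
# On a real node, replace sys.executable with the PYSPARK_PYTHON path.
worker_version = subprocess.check_output(
    [sys.executable, "-c",
     "import sys; print('%d.%d' % sys.version_info[:2])"]
).decode().strip()

# The two must match on every machine, or workers choke on
# driver-compiled bytecode ("SystemError: unknown opcode").
print(driver_version, worker_version, driver_version == worker_version)
```

A mismatch here on any node reproduces exactly the `SystemError: unknown opcode` failure in the log above.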
>
> On Thu, Sep 4, 2014 at 9:25 AM, Oleg Ruchovets <or...@gmail.com>
> wrote:
> > Hi,
> >    I am evaluating PySpark.
> > I have HDP (Hortonworks) installed with Python 2.6.6 (I can't remove
> > it, since Hortonworks depends on it). I can successfully execute
> > PySpark on YARN.
> >
> > We need to use Anaconda packages, so I installed Anaconda. Anaconda
> > installs Python 2.7.7, which is added to the PATH. After installing
> > Anaconda, the Pi example I use for testing PySpark on YARN stops
> > working.
> >
> > Question:
> >    How can PySpark be used when 2 Python versions are present on one
> > machine? Every machine has 2.7.7 on the PATH.
> >
> > How can I check which Python version is used at runtime when
> > executing PySpark - is it 2.7.7?
> >
> > The exceptions I get are the same as in previous emails:
> >
> > [quoted log snipped; identical to the spark-submit output shown
> > earlier in this thread]
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 123, in dump_stream
> >     for obj in iterator:
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 180, in _batched
> >     for item in iterator:
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 612, in func
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 36, in f
> > SystemError: unknown opcode
> >  [duplicate 9]
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
> TID
> > 13 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
> > 506275 bytes in 3 ms
> > 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
> > org.apache.spark.api.python.PythonException: Traceback (most recent call
> > last):
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> > line 77, in main
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 123, in dump_stream
> >     for obj in iterator:
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 180, in _batched
> >     for item in iterator:
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 612, in func
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 36, in f
> > SystemError: unknown opcode
> >  [duplicate 10]
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
> TID
> > 14 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> > 506275 bytes in 4 ms
> > 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:0)
> > 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
> > org.apache.spark.api.python.PythonException: Traceback (most recent call
> > last):
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> > line 77, in main
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 123, in dump_stream
> >     for obj in iterator:
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 180, in _batched
> >     for item in iterator:
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 612, in func
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 36, in f
> > SystemError: unknown opcode
> >  [duplicate 11]
> > 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
> TID
> > 15 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
> > 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
> > 369810 bytes in 4 ms
> > 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Lost TID 12 (task 0.0:2)
> > 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
> > org.apache.spark.api.python.PythonException: Traceback (most recent call
> > last):
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> > line 77, in main
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 123, in dump_stream
> >     for obj in iterator:
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 180, in _batched
> >     for item in iterator:
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 612, in func
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 36, in f
> > SystemError: unknown opcode
> >  [duplicate 12]
> > 14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
> times;
> > aborting job
> > 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
> > org.apache.spark.api.python.PythonException: Traceback (most recent call
> > last):
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> > line 77, in main
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 123, in dump_stream
> >     for obj in iterator:
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 180, in _batched
> >     for item in iterator:
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 612, in func
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 36, in f
> > SystemError: unknown opcode
> >  [duplicate 13]
> > 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling
> stage
> > 0
> > 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was
> > cancelled
> > 14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at
> >
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
> > Traceback (most recent call last):
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 38, in <module>
> >     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 619, in reduce
> >     vals = self.mapPartitions(func).collect()
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 583, in collect
> >     bytesInJava = self._jrdd.collect().iterator()
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
> > line 537, in __call__
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
> > line 300, in get_return_value
> > py4j.protocol.Py4JJavaError14/09/04 12:53:57 INFO
> scheduler.TaskSetManager:
> > Loss was due to org.apache.spark.api.python.PythonException: Traceback
> (most
> > recent call last):
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> > line 77, in main
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 123, in dump_stream
> >     for obj in iterator:
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 180, in _batched
> >     for item in iterator:
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 612, in func
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 36, in f
> > SystemError: unknown opcode
> >  [duplicate 14]
> > 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to
> > org.apache.spark.TaskKilledException
> > org.apache.spark.TaskKilledException
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> > : An error occurred while calling o24.collect.
> > : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> > 0.0:2 failed 4 times, most recent failure: Exception failure in TID 12 on
> > host HDOP-M.AGT: org.apache.spark.api.python.PythonException: Traceback
> > (most recent call last):
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> > line 77, in main
> >     serializer.dump_stream(func(split_index, iterator), outfile)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 191, in dump_stream
> >     self.serializer.dump_stream(self._batched(iterator), stream)
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 123, in dump_stream
> >     for obj in iterator:
> >   File
> >
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> > line 180, in _batched
> >     for item in iterator:
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> > line 612, in func
> >   File
> >
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> > line 36, in f
> > SystemError: unknown opcode
> >
> >
> > org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
> >
> > org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
> >         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
> >         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> >         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> >
>  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> >         org.apache.spark.scheduler.Task.run(Task.scala:51)
> >
> > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         java.lang.Thread.run(Thread.java:744)
> > Driver stacktrace:
> > at
> > org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> > at
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> > at
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
> > at
> >
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> > at
> >
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> > at
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> > at
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> > at scala.Option.foreach(Option.scala:236)
> > at
> >
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> > at
> >
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
> > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> > at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> > at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> > at
> >
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > at
> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> > at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > at
> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >
> > 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed
> TaskSet
> > 0.0, whose tasks have all completed, from pool
> >
> >
> > thanks
> > Oleg.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Re: 2 python installations cause PySpark on Yarn problem

Posted by Davies Liu <da...@databricks.com>.
Hey Oleg,

In PySpark, you MUST have the same version of Python on all the
machines of the cluster: when you run `python` on each of these
machines, it should report the same version (2.6 or 2.7).
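A quick way to verify this (a sketch, not from the original thread; `version_tag` and `check_consistent` are illustrative names of my own) is to compare the driver's major.minor version against what each worker reports. On a live cluster you would collect the worker tags with something like `sc.parallelize(range(16), 16).map(lambda _: version_tag()).distinct().collect()`; the comparison itself is plain Python:

```python
import sys

def version_tag():
    # major.minor is what must match across the cluster, e.g. "2.6" vs "2.7"
    return "%d.%d" % (sys.version_info[0], sys.version_info[1])

def check_consistent(driver_tag, worker_tags):
    # True only if every worker runs the driver's major.minor version
    return all(tag == driver_tag for tag in worker_tags)

# A healthy cluster: driver and all workers agree
print(check_consistent("2.7", ["2.7", "2.7", "2.7"]))   # True
# The situation above: driver on 2.7, some workers still launch 2.6
print(check_consistent("2.7", ["2.7", "2.6", "2.6"]))   # False
```

A mismatch here is the usual cause of `SystemError: unknown opcode`: the driver serializes functions as 2.7 bytecode, which a 2.6 worker interpreter cannot execute.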

With PYSPARK_PYTHON, you can run PySpark with a specific version of
Python. That version must also be installed on all the machines, in the
same location.
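As a sketch of how that override can look in the driver program (the `/opt/anaconda/bin/python` path is a hypothetical example, not from the thread; the variable must be set before the SparkContext is created):

```python
import os

# Point the PySpark workers at one specific interpreter. This path is a
# hypothetical example; it must exist at the same location on every node.
os.environ["PYSPARK_PYTHON"] = "/opt/anaconda/bin/python"

# from pyspark import SparkContext
# sc = SparkContext(appName="Pi")  # workers now launch the interpreter above
print(os.environ["PYSPARK_PYTHON"])
```

Exporting PYSPARK_PYTHON in `spark-env.sh` on every machine achieves the same thing without touching the job code.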

Davies

On Thu, Sep 4, 2014 at 9:25 AM, Oleg Ruchovets <or...@gmail.com> wrote:
> Hi,
>    I am evaluating PySpark.
> I have HDP (Hortonworks) installed with Python 2.6.6 (I can't remove it
> since it is used by Hortonworks). I can successfully execute PySpark on YARN.
>
> We need to use Anaconda packages, so I installed Anaconda. Anaconda comes
> with Python 2.7.7, and it is added to the PATH. After installing Anaconda,
> the Pi example stops working - I used it for testing PySpark on YARN.
>
> Question:
>    How can PySpark be used when there are 2 Python versions on one machine?
> On the PATH I have 2.7.7 on every machine.
>
> How can I check which Python version PySpark uses at runtime - is it 2.7.7?
>
> Exception I get are the same as in previous emails:
>
> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
> ./bin/spark-submit --master yarn  --num-executors 3  --driver-memory 4g
> --executor-memory 2g --executor-cores 1   examples/src/main/python/pi.py
> 1000
> /usr/jdk64/jdk1.7.0_45/bin/java
> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-
> 563.jar:/etc/hadoop/conf
> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
> 14/09/04 12:53:11 INFO spark.SecurityManager: Changing view acls to: root
> 14/09/04 12:53:11 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(root)
> 14/09/04 12:53:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 14/09/04 12:53:12 INFO Remoting: Starting remoting
> 14/09/04 12:53:12 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://spark@HDOP-B.AGT:45747]
> 14/09/04 12:53:12 INFO Remoting: Remoting now listens on addresses:
> [akka.tcp://spark@HDOP-B.AGT:45747]
> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering MapOutputTracker
> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering BlockManagerMaster
> 14/09/04 12:53:12 INFO storage.DiskBlockManager: Created local directory at
> /tmp/spark-local-20140904125312-c7ea
> 14/09/04 12:53:12 INFO storage.MemoryStore: MemoryStore started with
> capacity 2.3 GB.
> 14/09/04 12:53:12 INFO network.ConnectionManager: Bound socket to port 37363
> with id = ConnectionManagerId(HDOP-B.AGT,37363)
> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Trying to register
> BlockManager
> 14/09/04 12:53:12 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-B.AGT:37363 with 2.3 GB RAM
> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Registered BlockManager
> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:33547
> 14/09/04 12:53:12 INFO broadcast.HttpBroadcast: Broadcast server started at
> http://10.193.1.76:33547
> 14/09/04 12:53:12 INFO spark.HttpFileServer: HTTP File server directory is
> /tmp/spark-054f4eda-b93b-47d3-87d5-c40e81fc1fe8
> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:54594
> 14/09/04 12:53:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/09/04 12:53:13 INFO server.AbstractConnector: Started
> SelectChannelConnector@0.0.0.0:4040
> 14/09/04 12:53:13 INFO ui.SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
> 14/09/04 12:53:13 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> --args is deprecated. Use --arg instead.
> 14/09/04 12:53:14 INFO client.RMProxy: Connecting to ResourceManager at
> HDOP-N1.AGT/10.193.1.72:8050
> 14/09/04 12:53:14 INFO yarn.Client: Got Cluster metric info from
> ApplicationsManager (ASM), number of NodeManagers: 6
> 14/09/04 12:53:14 INFO yarn.Client: Queue info ... queueName: default,
> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>       queueApplicationCount = 0, queueChildQueueCount = 0
> 14/09/04 12:53:14 INFO yarn.Client: Max mem capabililty of a single resource
> in this cluster 13824
> 14/09/04 12:53:14 INFO yarn.Client: Preparing Local resources
> 14/09/04 12:53:15 INFO yarn.Client: Uploading
> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
> to
> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
> 14/09/04 12:53:17 INFO yarn.Client: Uploading
> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
> to
> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/pi.py
> 14/09/04 12:53:17 INFO yarn.Client: Setting up the launch environment
> 14/09/04 12:53:17 INFO yarn.Client: Setting up container launch context
> 14/09/04 12:53:17 INFO yarn.Client: Command for starting the Spark
> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
> -Djava.io.tmpdir=$PWD/tmp,
> -Dspark.tachyonStore.folderName=\"spark-2b59c845-3de2-4c3d-a352-1379ecade281\",
> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
> -Dspark.fileserver.uri=\"http://10.193.1.76:54594\",
> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"45747\",
> -Dspark.executor.cores=\"1\",
> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:33547\",
> -Dlog4j.configuration=log4j-spark-container.properties,
> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar ,
> null,  --args  'HDOP-B.AGT:45747' , --executor-memory, 2048,
> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
> <LOG_DIR>/stderr)
> 14/09/04 12:53:17 INFO yarn.Client: Submitting application to ASM
> 14/09/04 12:53:17 INFO impl.YarnClientImpl: Submitted application
> application_1409805761292_0005
> 14/09/04 12:53:17 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
> appMasterRpcPort: -1
> appStartTime: 1409806397305
> yarnAppState: ACCEPTED
>
> 14/09/04 12:53:18 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
> appMasterRpcPort: -1
> appStartTime: 1409806397305
> yarnAppState: ACCEPTED
>
> 14/09/04 12:53:19 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
> appMasterRpcPort: -1
> appStartTime: 1409806397305
> yarnAppState: ACCEPTED
>
> 14/09/04 12:53:20 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
> appMasterRpcPort: -1
> appStartTime: 1409806397305
> yarnAppState: ACCEPTED
>
> 14/09/04 12:53:21 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
> appMasterRpcPort: 0
> appStartTime: 1409806397305
> yarnAppState: RUNNING
>
> 14/09/04 12:53:23 INFO cluster.YarnClientClusterScheduler:
> YarnClientClusterScheduler.postStartHook done
> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
> executor:
> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:40024/user/Executor#2065794895]
> with ID 1
> 14/09/04 12:53:26 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-N1.AGT:34857 with 1178.1 MB RAM
> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
> executor:
> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT:49234/user/Executor#820272849]
> with ID 3
> 14/09/04 12:53:27 INFO cluster.YarnClientSchedulerBackend: Registered
> executor:
> Actor[akka.tcp://sparkExecutor@HDOP-M.AGT:38124/user/Executor#715249825]
> with ID 2
> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-N4.AGT:43365 with 1178.1 MB RAM
> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-M.AGT:45711 with 1178.1 MB RAM
> 14/09/04 12:53:55 INFO spark.SparkContext: Starting job: reduce at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
> with 1000 output partitions (allowLocal=false)
> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce
> at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Parents of final stage:
> List()
> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Missing parents: List()
> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting Stage 0
> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting 1000 missing tasks
> from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
> 14/09/04 12:53:55 INFO cluster.YarnClientClusterScheduler: Adding task set
> 0.0 with 1000 tasks
> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID
> 0 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
> 369810 bytes in 5 ms
> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID
> 1 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> 506275 bytes in 2 ms
> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID
> 2 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
> 501135 bytes in 2 ms
> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID
> 3 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
> 506275 bytes in 5 ms
> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>
> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
> at
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID
> 4 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> 506275 bytes in 5 ms
> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 2 (task 0.0:2)
> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 1]
> [... snip: the same "SystemError: unknown opcode" traceback was logged
> for every retry (TIDs 5-12, resubmitted across executors HDOP-N1.AGT,
> HDOP-M.AGT and HDOP-N4.AGT), marked [duplicate 2] through
> [duplicate 12] ...]
> 14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4 times;
> aborting job
> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException:
> [... snip: same "SystemError: unknown opcode" traceback, duplicate 13 ...]
> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling stage
> 0
> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was
> cancelled
> 14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
> Traceback (most recent call last):
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 38, in <module>
>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 619, in reduce
>     vals = self.mapPartitions(func).collect()
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 583, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
> line 537, in __call__
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
> line 300, in get_return_value
> py4j.protocol.Py4JJavaError
> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent
> call last):
> [... snip: same "SystemError: unknown opcode" traceback, duplicate 14 ...]
> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to
> org.apache.spark.TaskKilledException
> org.apache.spark.TaskKilledException
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> : An error occurred while calling o24.collect.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 0.0:2 failed 4 times, most recent failure: Exception failure in TID 12 on
> host HDOP-M.AGT: org.apache.spark.api.python.PythonException: Traceback
> (most recent call last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>
>
> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:744)
> Driver stacktrace:
> at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed TaskSet
> 0.0, whose tasks have all completed, from pool
>
>
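Since every executor fails the same way, a quick way to confirm a driver/worker interpreter mismatch is to compare versions directly. This is a minimal sketch, not Spark-specific: by default a PySpark worker launches whatever `python` is first on the node's PATH (unless `PYSPARK_PYTHON` points elsewhere), so on a cluster node you would run plain `"python"` in place of `sys.executable` below.

```python
import subprocess
import sys

# Version of the interpreter running this script (the driver side).
driver_ver = "%d.%d" % sys.version_info[:2]
print("driver python:", driver_ver)

# Ask a child interpreter for its version. Locally we use sys.executable
# so the sketch is self-contained; on a worker node, replace it with
# plain "python" to see what a PySpark worker would actually launch.
out = subprocess.check_output(
    [sys.executable, "-c",
     "import sys; sys.stdout.write('%d.%d' % sys.version_info[:2])"]
)
child_ver = out.decode()
print("child python:", child_ver)
```

If the two versions differ, the usual fix is to export the same `PYSPARK_PYTHON` on every node before `spark-submit`; "SystemError: unknown opcode" is what a CPython 2.6 worker raises when it executes bytecode produced by a 2.7 driver.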
> thanks
> Oleg.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org