Posted to issues@spark.apache.org by "John Omernik (JIRA)" <ji...@apache.org> on 2015/07/14 16:30:05 UTC
[jira] [Created] (SPARK-9035) Spark on Mesos Thread Context Class Loader issues
John Omernik created SPARK-9035:
-----------------------------------
Summary: Spark on Mesos Thread Context Class Loader issues
Key: SPARK-9035
URL: https://issues.apache.org/jira/browse/SPARK-9035
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.4.0, 1.3.1, 1.3.0, 1.2.2
Environment: Mesos on MapRFS.
Reporter: John Omernik
Priority: Critical
There is an issue running Spark on Mesos (using MapRFS). I am able to run the same workload on YARN (using Myriad on Mesos) on the same cluster, just not directly on Mesos. I've corresponded with MapR, and the issue appears to be the thread context class loader being NULL. They will look at addressing it in their code as well, but the issue exists here too, since the desired behavior shouldn't be to pass NULL (see https://issues.apache.org/jira/browse/SPARK-1403). Note: I tried to reopen SPARK-1403, and Patrick Wendell asked me to open a new issue (this JIRA).
Environment:
MapR 4.1.0 (using MapRFS)
Mesos 0.22.1
Spark 1.4 (The issue occurs on Spark 1.3.1, 1.3.0, 1.2.2 but not 1.2.0)
Some comments from Kannan at MapR (he is no longer with MapR; these comments were made before he left):
Here is the corresponding ShimLoader code. cl.getParent is hitting NPE.
If you look at Spark code base, you can see that the setContextClassLoader is invoked in a few places, but not necessarily in the context of this stack trace.
private static ClassLoader getRootClassLoader() {
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    trace("getRootClassLoader: thread classLoader is '%s'",
        cl.getClass().getCanonicalName());
    while (cl.getParent() != null) {
        cl = cl.getParent();
    }
    trace("getRootClassLoader: root classLoader is '%s'",
        cl.getClass().getCanonicalName());
    return cl;
}
MapR cannot handle NULL in this case. Basically, the code is trying to get a root classloader to use for loading a number of classes. It takes the thread's context class loader (TCCL) and keeps walking up the parent chain. We could fall back to using the current class's classloader whenever the TCCL is NULL. I need to check with some folks what the impact would be; I don't know the specific reason for choosing the TCCL here.
I have raised an internal bug to fall back to using the current class's classloader if the TCCL is not set. Let us also figure out whether there is a way for Spark to address this, if it really is a change in behavior on their side. I think we should still fix our code to not make this assumption, but since this is a core change, it may not get out soon.
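The fallback described above can be sketched as follows. This is a hypothetical illustration (the class name and structure are mine, not MapR's actual ShimLoader): when the TCCL is NULL, use the defining class's own loader before walking the parent chain, which avoids the NPE at getRootClassLoader.

```java
// Hypothetical sketch of the proposed fallback (not MapR's actual code):
// if the thread context class loader (TCCL) is null, fall back to the
// loader that defined this class before walking up the parent chain.
public class ShimLoaderSketch {
    static ClassLoader getRootClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fallback discussed above: avoid the NPE when the TCCL is unset.
            cl = ShimLoaderSketch.class.getClassLoader();
        }
        // Walk up to the topmost non-null loader (a null parent means
        // the bootstrap loader is next, so stop here).
        while (cl != null && cl.getParent() != null) {
            cl = cl.getParent();
        }
        return cl;
    }

    public static void main(String[] args) {
        // Simulate the failing condition seen on the Mesos executor thread:
        // a thread whose context class loader was explicitly set to null.
        Thread.currentThread().setContextClassLoader(null);
        System.out.println(getRootClassLoader() != null);
    }
}
```

With a guard like this, a thread whose TCCL is NULL (as on the Mesos executor thread in the traces below) would still resolve a usable root loader instead of throwing a NullPointerException.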
Command Attempted in bin/pyspark
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, Row, HiveContext
sparkhc = HiveContext(sc)
test = sparkhc.sql("show tables")
for r in test.collect():
    print r
Stack Trace from CLI:
15/07/14 09:16:40 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@hadoopvm5.mydomain.com:58221] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
15/07/14 09:16:48 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@hadoopmapr3.mydomain.com:53763] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:48 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
15/07/14 09:16:53 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@hadoopvm5.mydomain.com:52102] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:16:53 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, hadoopvm5.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S3 lost)
15/07/14 09:17:01 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@hadoopmapr3.mydomain.com:58600] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/14 09:17:01 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
15/07/14 09:17:01 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/pyspark/sql/dataframe.py", line 314, in collect
port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, hadoopmapr3.mydomain.com): ExecutorLostFailure (executor 20150630-193234-1644210368-5050-10591-S2 lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Stack Trace from STDERR on Failed Mesos Task:
I0714 09:16:31.665690 21429 fetcher.cpp:214] Fetching URI '/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503.tgz'
I0714 09:16:31.665841 21429 fetcher.cpp:194] Copying resource from '/mapr/brewpot/mesos/spark/spark-1.4.0-bin-2.5.1-mapr-1503.tgz' to '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f'
I0714 09:16:35.624750 21429 fetcher.cpp:78] Extracted resource '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f/spark-1.4.0-bin-2.5.1-mapr-1503.tgz' into '/tmp/mesos/slaves/20150630-193234-1644210368-5050-10591-S3/frameworks/20150630-193234-1644210368-5050-10591-0001/executors/20150630-193234-1644210368-5050-10591-S3/runs/bd6305f4-6034-4b2e-9b77-2aff5f03579f'
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/14 09:16:39 INFO MesosExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
I0714 09:16:39.139713 21504 exec.cpp:132] Version: 0.22.1
I0714 09:16:39.147428 21525 exec.cpp:206] Executor registered on slave 20150630-193234-1644210368-5050-10591-S3
15/07/14 09:16:39 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20150630-193234-1644210368-5050-10591-S3 with 1 cpus
15/07/14 09:16:39 INFO SecurityManager: Changing view acls to: darkness
15/07/14 09:16:39 INFO SecurityManager: Changing modify acls to: darkness
15/07/14 09:16:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(darkness); users with modify permissions: Set(darkness)
15/07/14 09:16:39 INFO Slf4jLogger: Slf4jLogger started
15/07/14 09:16:39 INFO Remoting: Starting remoting
15/07/14 09:16:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@hadoopvm5.mydomain.com:58221]
15/07/14 09:16:39 INFO Utils: Successfully started service 'sparkExecutor' on port 58221.
15/07/14 09:16:39 INFO DiskBlockManager: Created local directory at /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42/blockmgr-54a629d3-5583-4c05-b478-dfee1ad5a113
15/07/14 09:16:39 INFO MemoryStore: MemoryStore started with capacity 1060.0 MB
java.lang.NullPointerException
at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
java.lang.RuntimeException: Failure loading MapRClient.
at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:296)
at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
Caused by: java.lang.NullPointerException
at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
... 21 more
java.lang.ExceptionInInitializerError
at com.mapr.fs.ShimLoader.load(ShimLoader.java:227)
at org.apache.hadoop.conf.CoreDefaultProperties.<clinit>(CoreDefaultProperties.java:59)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1857)
at org.apache.hadoop.conf.Configuration.getProperties(Configuration.java:2072)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2282)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2234)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2151)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1002)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:974)
at org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:104)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:49)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:353)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:95)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:171)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
at org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:70)
Caused by: java.lang.RuntimeException: Failure loading MapRClient.
at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:296)
at com.mapr.fs.ShimLoader.load(ShimLoader.java:207)
... 20 more
Caused by: java.lang.NullPointerException
at com.mapr.fs.ShimLoader.getRootClassLoader(ShimLoader.java:109)
at com.mapr.fs.ShimLoader.injectNativeLoader(ShimLoader.java:245)
... 21 more
Exception in thread "Thread-2" I0714 09:16:40.007040 21525 exec.cpp:413] Deactivating the executor libprocess
15/07/14 09:16:40 INFO DiskBlockManager: Shutdown hook called
15/07/14 09:16:40 INFO Utils: path = /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42/blockmgr-54a629d3-5583-4c05-b478-dfee1ad5a113, already present as root for deletion.
15/07/14 09:16:40 INFO Utils: Shutdown hook called
15/07/14 09:16:40 INFO Utils: Deleting directory /tmp/spark-23d6412f-be9e-4351-9daa-bbba22758c42