Posted to user@spark.apache.org by Vajiha Begum S A <va...@maestrowiz.com> on 2022/11/30 10:13:24 UTC

Error - using Spark with GPU

 spark-submit /home/mwadmin/Documents/test.py
22/11/30 14:59:32 WARN Utils: Your hostname, mwadmin-HP-Z440-Workstation
resolves to a loopback address: 127.0.1.1; using ***.***.**.** instead (on
interface eno1)
22/11/30 14:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
22/11/30 14:59:32 INFO SparkContext: Running Spark version 3.2.2
22/11/30 14:59:32 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
22/11/30 14:59:33 INFO ResourceUtils:
==============================================================
22/11/30 14:59:33 INFO ResourceUtils: No custom resources configured for
spark.driver.
22/11/30 14:59:33 INFO ResourceUtils:
==============================================================
22/11/30 14:59:33 INFO SparkContext: Submitted application: Spark.com
22/11/30 14:59:33 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor:
, memory -> name: memory, amount: 1024, script: , vendor: , offHeap ->
name: offHeap, amount: 0, script: , vendor: , gpu -> name: gpu, amount: 1,
script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0,
gpu -> name: gpu, amount: 0.5)
22/11/30 14:59:33 INFO ResourceProfile: Limiting resource is cpus at 1
tasks per executor
22/11/30 14:59:33 WARN ResourceUtils: The configuration of resource: gpu
(exec = 1, task = 0.5/2, runnable tasks = 2) will result in wasted
resources due to resource cpus limiting the number of runnable tasks per
executor to: 1. Please adjust your configuration.
22/11/30 14:59:33 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/11/30 14:59:33 INFO SecurityManager: Changing view acls to: mwadmin
22/11/30 14:59:33 INFO SecurityManager: Changing modify acls to: mwadmin
22/11/30 14:59:33 INFO SecurityManager: Changing view acls groups to:
22/11/30 14:59:33 INFO SecurityManager: Changing modify acls groups to:
22/11/30 14:59:33 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: Set(mwadmin);
groups with view permissions: Set(); users  with modify permissions:
Set(mwadmin); groups with modify permissions: Set()
22/11/30 14:59:33 INFO Utils: Successfully started service 'sparkDriver' on
port 45883.
22/11/30 14:59:33 INFO SparkEnv: Registering MapOutputTracker
22/11/30 14:59:33 INFO SparkEnv: Registering BlockManagerMaster
22/11/30 14:59:33 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
22/11/30 14:59:33 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
22/11/30 14:59:33 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/11/30 14:59:33 INFO DiskBlockManager: Created local directory at
/tmp/blockmgr-647d2c2a-72e4-402d-aeff-d7460726eb6d
22/11/30 14:59:33 INFO MemoryStore: MemoryStore started with capacity 366.3
MiB
22/11/30 14:59:33 INFO SparkEnv: Registering OutputCommitCoordinator
22/11/30 14:59:33 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
22/11/30 14:59:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://localhost:4040
22/11/30 14:59:33 INFO ShimLoader: Loading shim for Spark version: 3.2.2
22/11/30 14:59:33 INFO ShimLoader: Complete Spark build info: 3.2.2,
https://github.com/apache/spark, HEAD,
78a5825fe266c0884d2dd18cbca9625fa258d7f7, 2022-07-11T15:44:21Z
22/11/30 14:59:33 INFO ShimLoader: findURLClassLoader found a
URLClassLoader org.apache.spark.util.MutableURLClassLoader@1530c739
22/11/30 14:59:33 INFO ShimLoader: Updating spark classloader
org.apache.spark.util.MutableURLClassLoader@1530c739 with the URLs:
jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark3xx-common/,
jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark322/
22/11/30 14:59:33 INFO ShimLoader: Spark classLoader
org.apache.spark.util.MutableURLClassLoader@1530c739 updated successfully
22/11/30 14:59:33 INFO ShimLoader: Updating spark classloader
org.apache.spark.util.MutableURLClassLoader@1530c739 with the URLs:
jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark3xx-common/,
jar:file:/home/mwadmin/spark-3.2.2-bin-hadoop3.2/jars/rapids-4-spark_2.12-22.10.0.jar!/spark322/
22/11/30 14:59:33 INFO ShimLoader: Spark classLoader
org.apache.spark.util.MutableURLClassLoader@1530c739 updated successfully
22/11/30 14:59:33 INFO RapidsPluginUtils: RAPIDS Accelerator build:
{version=22.10.0, user=, url=https://github.com/NVIDIA/spark-rapids.git,
date=2022-10-17T11:25:41Z,
revision=c75a2eafc9ce9fb3e6ab75c6677d97bf681bff50, cudf_version=22.10.0,
branch=HEAD}
22/11/30 14:59:33 INFO RapidsPluginUtils: RAPIDS Accelerator JNI build:
{version=22.10.0, user=, url=https://github.com/NVIDIA/spark-rapids-jni.git,
date=2022-10-14T05:19:41Z,
revision=b2c02b61afe1747f3741d6c5e2064edb8da51b32, branch=HEAD}
22/11/30 14:59:33 INFO RapidsPluginUtils: cudf build: {version=22.10.0,
user=, date=2022-10-14T01:51:22Z,
revision=8ffe375d85f8fd0f98e0052f36ccd820a669d0ab, branch=HEAD}
22/11/30 14:59:33 WARN RapidsPluginUtils: RAPIDS Accelerator 22.10.0 using
cudf 22.10.0.
22/11/30 14:59:33 WARN RapidsPluginUtils:
spark.rapids.sql.multiThreadedRead.numThreads is set to 20.
22/11/30 14:59:33 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to
disable GPU support set `spark.rapids.sql.enabled` to false.
22/11/30 14:59:33 WARN RapidsPluginUtils: spark.rapids.sql.explain is set
to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about
the query placement on the GPU.
22/11/30 14:59:33 INFO DriverPluginContainer: Initialized driver component
for plugin com.nvidia.spark.SQLPlugin.
22/11/30 14:59:33 WARN ResourceUtils: The configuration of resource: gpu
(exec = 1, task = 0.5/2, runnable tasks = 2) will result in wasted
resources due to resource cpus limiting the number of runnable tasks per
executor to: 1. Please adjust your configuration.
22/11/30 14:59:34 INFO Executor: Starting executor ID driver on host
***.***.**.**
22/11/30 14:59:34 INFO RapidsExecutorPlugin: RAPIDS Accelerator build:
{version=22.10.0, user=, url=https://github.com/NVIDIA/spark-rapids.git,
date=2022-10-17T11:25:41Z,
revision=c75a2eafc9ce9fb3e6ab75c6677d97bf681bff50, cudf_version=22.10.0,
branch=HEAD}
22/11/30 14:59:34 INFO RapidsExecutorPlugin: cudf build: {version=22.10.0,
user=, date=2022-10-14T01:51:22Z,
revision=8ffe375d85f8fd0f98e0052f36ccd820a669d0ab, branch=HEAD}
22/11/30 14:59:34 INFO RapidsExecutorPlugin: Initializing memory from
Executor Plugin
22/11/30 14:59:47 INFO Executor: Told to re-register on heartbeat
22/11/30 14:59:47 INFO BlockManager: BlockManager null re-registering with
master
22/11/30 14:59:48 INFO BlockManagerMaster: Registering BlockManager null
22/11/30 14:59:48 ERROR Inbox: Ignoring error
java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMasterEndpoint.org
$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:534)
at
org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:117)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org
$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
22/11/30 14:59:48 WARN Executor: Issue communicating with driver in
heartbeater
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
at
org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:78)
at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:626)
at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1009)
at
org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:212)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2048)
at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMasterEndpoint.org
$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:534)
at
org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:117)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org
$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
22/11/30 14:59:52 INFO GpuDeviceManager: Initializing RMM ASYNC pool size =
3137.0625 MB on gpuId 0
22/11/30 14:59:52 INFO GpuDeviceManager: Using per-thread default stream
22/11/30 14:59:52 ERROR RapidsExecutorPlugin: Exception in the executor
plugin, shutting down!
*ai.rapids.cudf.CudfException: RMM failure at:
/home/jenkins/agent/workspace/jenkins-cudf-release-39-cuda11/cpp/build/_deps/rmm-src/include/rmm/mr/device/cuda_async_memory_resource.hpp:90:
cudaMallocAsync not supported with this CUDA driver/runtime version*
at ai.rapids.cudf.Rmm.initializeInternal(Native Method)
at ai.rapids.cudf.Rmm.initialize(Rmm.java:119)
at
com.nvidia.spark.rapids.GpuDeviceManager$.initializeRmm(GpuDeviceManager.scala:296)
at
com.nvidia.spark.rapids.GpuDeviceManager$.initializeMemory(GpuDeviceManager.scala:328)
at
com.nvidia.spark.rapids.GpuDeviceManager$.initializeGpuAndMemory(GpuDeviceManager.scala:137)
at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:258)
at
org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$executorPlugins$1(PluginContainer.scala:125)
at
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at
org.apache.spark.internal.plugin.ExecutorPluginContainer.<init>(PluginContainer.scala:113)
at
org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:211)
at
org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:199)
at org.apache.spark.executor.Executor.$anonfun$plugins$1(Executor.scala:253)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
at org.apache.spark.executor.Executor.<init>(Executor.scala:253)
at
org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at
org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
22/11/30 14:59:52 INFO DiskBlockManager: Shutdown hook called
22/11/30 14:59:52 INFO ShutdownHookManager: Shutdown hook called
22/11/30 14:59:52 INFO ShutdownHookManager: Deleting directory
/tmp/spark-58488513-7d53-42f2-8bc4-cdcb34b5cf49
22/11/30 14:59:52 INFO ShutdownHookManager: Deleting directory
/tmp/spark-24b8e0ea-43d4-430a-9756-b1e84ceaa1ff/userFiles-5ce7f28f-16db-48fd-94bd-e9ef563c01f1
22/11/30 14:59:52 INFO ShutdownHookManager: Deleting directory
/tmp/spark-24b8e0ea-43d4-430a-9756-b1e84ceaa1ff
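
Two issues stand out in the log above. First, the fatal error ("cudaMallocAsync not supported with this CUDA driver/runtime version") indicates the installed NVIDIA driver does not support cudaMallocAsync, which requires a CUDA 11.2+ driver; this run initialized the RMM ASYNC pool (see the GpuDeviceManager line), which depends on that call. Second, the repeated ResourceUtils warning says one executor CPU core caps execution at 1 task while task.resource.gpu.amount=0.5 expects 2 concurrent tasks per GPU. A hedged sketch of a re-submission that works around both — switching the RAPIDS pool from ASYNC to ARENA and giving the executor 2 cores — follows; the exact values are assumptions to adjust for the actual cluster, not a confirmed fix:

```shell
# Sketch only. spark.rapids.memory.gpu.pool=ARENA avoids the RMM ASYNC
# allocator, which needs a CUDA 11.2+ driver for cudaMallocAsync.
# 2 executor cores with task.cpus=1 allows 2 concurrent tasks, matching
# task.resource.gpu.amount=0.5 and resolving the wasted-resources warning.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.rapids.memory.gpu.pool=ARENA \
  --conf spark.executor.cores=2 \
  --conf spark.task.cpus=1 \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.5 \
  /home/mwadmin/Documents/test.py
```

Alternatively, upgrading the NVIDIA driver to one that supports CUDA 11.2 or later should let the default ASYNC pool initialize without changing the pool setting.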