Posted to user@spark.apache.org by Hrishikesh Mishra <sd...@gmail.com> on 2020/05/07 11:12:37 UTC

java.lang.OutOfMemoryError Spark Worker

Hi

I am getting an out of memory error in the worker log for streaming jobs every
couple of hours, and after this the worker dies. There is no shuffle, no
aggregation, no caching in the job; it's just a transformation.
I'm not able to identify where the problem is, driver or executor. And why
does the worker die after the OOM? The streaming job should be the one that
dies. Am I missing something?

Driver Memory:  2g
Executor memory: 4g

Spark Version:  2.4
Kafka Direct Stream
Spark Standalone Cluster.
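
For context, the job is essentially of the following shape (a minimal sketch
in Scala, not the actual code; the broker address, topic, group id, batch
interval and the transformation itself are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object Main {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("streaming-job"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",          // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "streaming-job-group",  // placeholder
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("input-topic"), kafkaParams))

    // The only work: a per-record transformation, then write the result out.
    // No shuffle, no aggregation, no caching.
    stream.map(record => record.value().toUpperCase)
      .foreachRDD(rdd => rdd.foreachPartition(_.foreach(println)))

    ssc.start()
    ssc.awaitTermination()
  }
}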


20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: Set(root); groups
with view permissions: Set(); users  with modify permissions: Set(root);
groups with modify permissions: Set()

20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]

java.lang.OutOfMemoryError: Java heap space

at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)

at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)

at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)

at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
Source)

at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
Source)

at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
Source)

at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)

at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)

at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)

at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)

at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)

at
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)

at
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)

at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)

at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)

at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)

at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)

at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)

at
org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)

at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)

at org.apache.spark.deploy.worker.ExecutorRunner.org
$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)

at
org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)

20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver
driver-20200505181719-1187

20/05/06 12:53:38 INFO DriverRunner: Killing driver process!




Regards
Hrishi

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Russell Spitzer <ru...@gmail.com>.
The error is in the Spark Standalone Worker. It's hitting an OOM while
launching/running an executor process. Specifically, it's running out of
memory while parsing the Hadoop configuration to figure out the
env/command line to run:

https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala#L142-L149

Now, this is usually something I wouldn't expect to happen, since a Spark
Worker is generally a very lightweight process. Unless it was accumulating a
lot of state it should stay relatively small, and it is very unlikely that
generating a command-line string would cause this error unless the
application configuration was gigantic. So while it's possible you just have
very large Hadoop XML files, it is probably not this specific action that is
OOMing; rather, this is the straw that broke the camel's back and the worker
simply has too much other state.

This may not be pathological; it may just be that you are running a lot of
executors, or that the worker is keeping track of lots of started and
shut-down executor metadata, or something like that, and it's not a big deal.
You could address it by limiting the amount of metadata preserved after jobs
are run (see the spark.deploy.* options for retaining apps and the Spark
worker cleanup options), or by increasing the Spark Worker's heap
(SPARK_DAEMON_MEMORY).

If I hit this, I would start by bumping the daemon memory.
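
For example, something along these lines (a sketch only; the property names
are the standard standalone-mode settings and the values are purely
illustrative, not tuned recommendations):

# conf/spark-env.sh on the worker hosts: give the Worker daemon itself a bigger heap
export SPARK_DAEMON_MEMORY=2g

# keep less finished-executor/driver metadata in the Worker and clean up old app work dirs
export SPARK_WORKER_OPTS="-Dspark.worker.ui.retainedExecutors=100 -Dspark.worker.ui.retainedDrivers=100 -Dspark.worker.cleanup.enabled=true"

# conf/spark-env.sh on the master: retain fewer completed applications/drivers
export SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=50 -Dspark.deploy.retainedDrivers=50"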

On Fri, May 8, 2020 at 11:59 AM Hrishikesh Mishra <sd...@gmail.com>
wrote:

> We submit the Spark job through the spark-submit command, like the one below.
>
>
> sudo /var/lib/pf-spark/bin/spark-submit \
> --total-executor-cores 30 \
> --driver-cores 2 \
> --class com.hrishikesh.mishra.Main \
> --master spark://XX.XX.XXX.19:6066  \
> --deploy-mode cluster  \
> --supervise
> http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar
>
>
>
>
> We have a Python HTTP server where we host all the jars.
>
> The user killed the driver driver-20200508153502-1291 and it is visible in
> the log as well, but that is not the problem. The OOM is separate from this.
>
> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
> driver-20200508153502-1291
>
> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>
> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
> app-20200508153654-11776 removed, cleanupLocalDirs = true
>
> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
> killed by user
>
> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
> stacktrace] was thrown by a user handler's exceptionCaught() method while
> handling the following exception:*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[dispatcher-event-loop-6,5,main]*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
> stacktrace] was thrown by a user handler's exceptionCaught() method while
> handling the following exception:*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>
> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>
>
> On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> It's been a while since I worked with Spark Standalone, but I'd check the
>> logs of the workers. How do you spark-submit the app?
>>
>> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> <https://twitter.com/jaceklaskowski>
>>
>>
>> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd...@gmail.com>
>> wrote:
>>
>>> Thanks Jacek for the quick response.
>>> Due to our system constraints, we can't move to Structured Streaming
>>> now. But YARN can definitely be tried out.
>>>
>>> But my problem is that I'm not able to figure out where the issue is:
>>> Driver, Executor, or Worker. Even the exceptions are clueless. Please see
>>> the exception below; I'm unable to spot the cause of the OOM.
>>>
>>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>>> driver-20200508153502-1291
>>>
>>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>>
>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>>
>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>>
>>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>>
>>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
>>> killed by user
>>>
>>> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>>> handling the following exception:*
>>>
>>> *java.lang.OutOfMemoryError: Java heap space*
>>>
>>> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>>> exception in thread Thread[dispatcher-event-loop-6,5,main]*
>>>
>>> *java.lang.OutOfMemoryError: Java heap space*
>>>
>>> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>>> handling the following exception:*
>>>
>>> *java.lang.OutOfMemoryError: Java heap space*
>>>
>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>
>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>
>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>
>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>>
>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>>
>>>
>>>
>>>
>>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>>> something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
>>>> Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
>>>> use Spark Structured Streaming at the very least and/or use YARN as the
>>>> cluster manager".
>>>>
>>>> Another thought was that the user code (your code) could be leaking
>>>> resources so Spark eventually reports heap-related errors that may not
>>>> necessarily be Spark's.
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://about.me/JacekLaskowski
>>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>>> Follow me on https://twitter.com/jaceklaskowski
>>>>
>>>> <https://twitter.com/jaceklaskowski>
>>>>
>>>>
>>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am getting an out of memory error in the worker log for streaming jobs
>>>>> every couple of hours, and after this the worker dies. There is no
>>>>> shuffle, no aggregation, no caching in the job; it's just a transformation.
>>>>> I'm not able to identify where the problem is, driver or executor. And
>>>>> why does the worker die after the OOM? The streaming job should be the
>>>>> one that dies. Am I missing something?
>>>>>
>>>>> Driver Memory:  2g
>>>>> Executor memory: 4g
>>>>>
>>>>> Spark Version:  2.4
>>>>> Kafka Direct Stream
>>>>> Spark Standalone Cluster.
>>>>>
>>>>>
>>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager:
>>>>> authentication disabled; ui acls disabled; users  with view permissions:
>>>>> Set(root); groups with view permissions: Set(); users  with modify
>>>>> permissions: Set(root); groups with modify permissions: Set()
>>>>>
>>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>>>> exception in thread Thread[ExecutorRunner for
>>>>> app-20200506124717-10226/0,5,main]
>>>>>
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>
>>>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>>
>>>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>>
>>>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>>
>>>>> at
>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>>>> Source)
>>>>>
>>>>> at
>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>>>> Source)
>>>>>
>>>>> at
>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>>>> Source)
>>>>>
>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>
>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>
>>>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>>
>>>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>>
>>>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>>
>>>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>>
>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>>
>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>>
>>>>> at
>>>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>>
>>>>> at
>>>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>>
>>>>> at
>>>>> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>>
>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>>
>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>>
>>>>> at
>>>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>>
>>>>> at
>>>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>>
>>>>> at
>>>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>>
>>>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>>
>>>>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>>>>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>>
>>>>> at
>>>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>>>
>>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>>>> driver driver-20200505181719-1187
>>>>>
>>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>> Hrishi
>>>>>
>>>>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Hrishikesh Mishra <sd...@gmail.com>.
Configuration:

Driver memory we tried: 2GB / 4GB / 5GB
Executor memory we tried: 4GB / 5GB
We even reduced spark.memory.fraction to 0.2 (we are not using cache)
VM memory: 32 GB and 8 cores
SPARK_WORKER_MEMORY we tried: 30GB / 24GB
SPARK_WORKER_CORES: 32 (because jobs are not CPU bound)
SPARK_WORKER_INSTANCES: 1


What we feel is that there is not enough space for user classes / objects, or
that cleanup of these is not happening frequently enough.
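
(Expressed as configuration, one of the combinations we tried looks roughly
like this; a sketch reconstructed from the values above, not the exact files:)

# conf/spark-env.sh on each worker host
export SPARK_WORKER_MEMORY=24g
export SPARK_WORKER_CORES=32
export SPARK_WORKER_INSTANCES=1

# passed to spark-submit
--driver-memory 4g \
--executor-memory 4g \
--conf spark.memory.fraction=0.2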





On Sat, May 9, 2020 at 12:30 AM Amit Sharma <re...@gmail.com> wrote:

> What memory are you assigning per executor? What is the driver memory
> configuration?
>
>
> Thanks
> Amit
>
> On Fri, May 8, 2020 at 12:59 PM Hrishikesh Mishra <sd...@gmail.com>
> wrote:
>
>> We submit the Spark job through the spark-submit command, like the one below.
>>
>>
>> sudo /var/lib/pf-spark/bin/spark-submit \
>> --total-executor-cores 30 \
>> --driver-cores 2 \
>> --class com.hrishikesh.mishra.Main \
>> --master spark://XX.XX.XXX.19:6066  \
>> --deploy-mode cluster  \
>> --supervise
>> http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar
>>
>>
>>
>>
>> We have a Python HTTP server where we host all the jars.
>>
>> The user killed the driver driver-20200508153502-1291 and it is visible in
>> the log as well, but that is not the problem. The OOM is separate from this.
>>
>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>> driver-20200508153502-1291
>>
>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>
>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
>> killed by user
>>
>> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>> exception in thread Thread[dispatcher-event-loop-6,5,main]*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>
>>
>> On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> It's been a while since I worked with Spark Standalone, but I'd check
>>> the logs of the workers. How do you spark-submit the app?
>>>
>>> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> <https://twitter.com/jaceklaskowski>
>>>
>>>
>>> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Jacek for the quick response.
>>>> Due to our system constraints, we can't move to Structured Streaming
>>>> now. But YARN can definitely be tried out.
>>>>
>>>> But my problem is that I'm not able to figure out where the issue is:
>>>> Driver, Executor, or Worker. Even the exceptions are clueless. Please see
>>>> the exception below; I'm unable to spot the cause of the OOM.
>>>>
>>>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>>>> driver-20200508153502-1291
>>>>
>>>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>>>
>>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>>>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>>>
>>>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>>>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>>>
>>>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>>>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>>>
>>>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
>>>> killed by user
>>>>
>>>> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>>>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>>>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>>>> handling the following exception:*
>>>>
>>>> *java.lang.OutOfMemoryError: Java heap space*
>>>>
>>>> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>>>> exception in thread Thread[dispatcher-event-loop-6,5,main]*
>>>>
>>>> *java.lang.OutOfMemoryError: Java heap space*
>>>>
>>>> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>>>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>>>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>>>> handling the following exception:*
>>>>
>>>> *java.lang.OutOfMemoryError: Java heap space*
>>>>
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>>
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>>
>>>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>>>
>>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>>>
>>>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>>>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>>>> something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
>>>>> Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
>>>>> use Spark Structured Streaming at the very least and/or use YARN as the
>>>>> cluster manager".
>>>>>
>>>>> Another thought was that the user code (your code) could be leaking
>>>>> resources so Spark eventually reports heap-related errors that may not
>>>>> necessarily be Spark's.
>>>>>
>>>>> Pozdrawiam,
>>>>> Jacek Laskowski
>>>>> ----
>>>>> https://about.me/JacekLaskowski
>>>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>>>> Follow me on https://twitter.com/jaceklaskowski
>>>>>
>>>>> <https://twitter.com/jaceklaskowski>
>>>>>
>>>>>
>>>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I am getting an out of memory error in the worker log for streaming
>>>>>> jobs every couple of hours, and after this the worker dies. There is no
>>>>>> shuffle, no aggregation, no caching in the job; it's just a transformation.
>>>>>> I'm not able to identify where the problem is, driver or executor.
>>>>>> And why does the worker die after the OOM? The streaming job should be
>>>>>> the one that dies. Am I missing something?
>>>>>>
>>>>>> Driver Memory:  2g
>>>>>> Executor memory: 4g
>>>>>>
>>>>>> Spark Version:  2.4
>>>>>> Kafka Direct Stream
>>>>>> Spark Standalone Cluster.
>>>>>>
>>>>>>
>>>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager:
>>>>>> authentication disabled; ui acls disabled; users  with view permissions:
>>>>>> Set(root); groups with view permissions: Set(); users  with modify
>>>>>> permissions: Set(root); groups with modify permissions: Set()
>>>>>>
>>>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>>>>> exception in thread Thread[ExecutorRunner for
>>>>>> app-20200506124717-10226/0,5,main]
>>>>>>
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>
>>>>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>>>
>>>>>> at
>>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>>>>> Source)
>>>>>>
>>>>>> at
>>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>>>>> Source)
>>>>>>
>>>>>> at
>>>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>>>>> Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>>>
>>>>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>>>
>>>>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>>>
>>>>>> at
>>>>>> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>>>
>>>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>>>
>>>>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>>>
>>>>>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>>>>>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>>>>
>>>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>>>>> driver driver-20200505181719-1187
>>>>>>
>>>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Hrishi
>>>>>>
>>>>>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Hrishikesh Mishra <sd...@gmail.com>.
We submit the Spark job through the spark-submit command, like the one below.


sudo /var/lib/pf-spark/bin/spark-submit \
--total-executor-cores 30 \
--driver-cores 2 \
--class com.hrishikesh.mishra.Main \
--master spark://XX.XX.XXX.19:6066  \
--deploy-mode cluster  \
--supervise http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar




We have a Python HTTP server where we host all the jars.
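(For example, a server as simple as the following is enough; the directory
and port here are illustrative:)

cd /path/to/jars && python3 -m http.server 90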

The user killed the driver driver-20200508153502-1291 and it is visible in
the log as well, but that is not the problem. The OOM is separate from this.

20/05/08 15:36:55 INFO Worker: Asked to kill driver
driver-20200508153502-1291

20/05/08 15:36:55 INFO DriverRunner: Killing driver process!

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed

20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
app-20200508153654-11776 removed, cleanupLocalDirs = true

20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
killed by user

*20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:*

*java.lang.OutOfMemoryError: Java heap space*

*20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[dispatcher-event-loop-6,5,main]*

*java.lang.OutOfMemoryError: Java heap space*

*20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:*

*java.lang.OutOfMemoryError: Java heap space*

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called

20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
/grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922


On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> It's been a while since I worked with Spark Standalone, but I'd check the
> logs of the workers. How do you spark-submit the app?
>
> Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> <https://twitter.com/jaceklaskowski>
>
>
> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd...@gmail.com>
> wrote:
>
>> Thanks Jacek for the quick response.
>> Due to our system constraints, we can't move to Structured Streaming now.
>> But YARN can definitely be tried out.
>>
>> But my problem is that I'm not able to figure out where the issue is:
>> Driver, Executor, or Worker. Even the exceptions are clueless. Please see
>> the exception below; I'm unable to spot the cause of the OOM.
>>
>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>> driver-20200508153502-1291
>>
>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>
>> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
>> killed by user
>>
>> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>> exception in thread Thread[dispatcher-event-loop-6,5,main]*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>
>>
>>
>>
>> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>>> something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
>>> Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
>>> use Spark Structured Streaming at the very least and/or use YARN as the
>>> cluster manager".
>>>
>>> Another thought was that the user code (your code) could be leaking
>>> resources so Spark eventually reports heap-related errors that may not
>>> necessarily be Spark's.
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books <https://books.japila.pl/>
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> <https://twitter.com/jaceklaskowski>
>>>
>>>
>>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I am getting an out of memory error in the worker log for streaming jobs
>>>> every couple of hours, and after this the worker dies. There is no
>>>> shuffle, no aggregation, no caching in the job; it's just a transformation.
>>>> I'm not able to identify where the problem is, driver or executor. And
>>>> why does the worker die after the OOM? The streaming job should be the
>>>> one that dies. Am I missing something?
>>>>
>>>> Driver Memory:  2g
>>>> Executor memory: 4g
>>>>
>>>> Spark Version:  2.4
>>>> Kafka Direct Stream
>>>> Spark Standalone Cluster.
>>>>
>>>>
>>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>>>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>>>> with view permissions: Set(); users  with modify permissions: Set(root);
>>>> groups with modify permissions: Set()
>>>>
>>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>>> exception in thread Thread[ExecutorRunner for
>>>> app-20200506124717-10226/0,5,main]
>>>>
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>
>>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>>
>>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>>
>>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>>
>>>> at
>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>>> Source)
>>>>
>>>> at
>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>>> Source)
>>>>
>>>> at
>>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>>> Source)
>>>>
>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>>
>>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>>
>>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>>
>>>> at
>>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>>
>>>> at
>>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>>
>>>> at
>>>> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>>
>>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>>
>>>> at
>>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>>
>>>> at
>>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>>
>>>> at
>>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>>
>>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>>
>>>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>>>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>>
>>>> at
>>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>>
>>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>>> driver driver-20200505181719-1187
>>>>
>>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>>
>>>>
>>>>
>>>>
>>>> Regards
>>>> Hrishi
>>>>
>>>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

It's been a while since I worked with Spark Standalone, but I'd check the
logs of the workers. How do you spark-submit the app?

Did you check the /grid/1/spark/work/driver-20200508153502-1291 directory?

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

<https://twitter.com/jaceklaskowski>


On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra <sd...@gmail.com>
wrote:

> Thanks Jacek for the quick response.
> Due to our system constraints, we can't move to Structured Streaming now.
> But YARN can definitely be tried out.
>
> But my problem is that I'm not able to figure out where the issue is:
> Driver, Executor, or Worker. Even the exceptions are clueless. Please see
> the exception below; I'm unable to spot the cause of the OOM.
>
> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
> driver-20200508153502-1291
>
> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>
> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>
> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
> app-20200508153654-11776 removed, cleanupLocalDirs = true
>
> 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
> killed by user
>
> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
> stacktrace] was thrown by a user handler's exceptionCaught() method while
> handling the following exception:*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[dispatcher-event-loop-6,5,main]*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
> stacktrace] was thrown by a user handler's exceptionCaught() method while
> handling the following exception:*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>
> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>
> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>
>
>
>
> On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> Sorry for being perhaps too harsh, but when you asked "Am I missing
>> something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
>> Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
>> use Spark Structured Streaming at the very least and/or use YARN as the
>> cluster manager".
>>
>> Another thought was that the user code (your code) could be leaking
>> resources so Spark eventually reports heap-related errors that may not
>> necessarily be Spark's.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>> <https://twitter.com/jaceklaskowski>
>>
>>
>> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> I am getting an out of memory error in the worker log for streaming jobs
>>> every couple of hours, and after this the worker dies. There is no
>>> shuffle, no aggregation, no caching in the job; it's just a transformation.
>>> I'm not able to identify where the problem is, driver or executor. And
>>> why does the worker die after the OOM? The streaming job should be the
>>> one that dies. Am I missing something?
>>>
>>> Driver Memory:  2g
>>> Executor memory: 4g
>>>
>>> Spark Version:  2.4
>>> Kafka Direct Stream
>>> Spark Standalone Cluster.
>>>
>>>
>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>>> with view permissions: Set(); users  with modify permissions: Set(root);
>>> groups with modify permissions: Set()
>>>
>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>> exception in thread Thread[ExecutorRunner for
>>> app-20200506124717-10226/0,5,main]
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>
>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>
>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>> Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>> Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>> Source)
>>>
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>
>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>
>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>
>>> at
>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>
>>> at
>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>
>>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>
>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>
>>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>
>>> at
>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>
>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>> driver driver-20200505181719-1187
>>>
>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>
>>>
>>>
>>>
>>> Regards
>>> Hrishi
>>>
>>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Hrishikesh Mishra <sd...@gmail.com>.
Thanks Jacek for the quick response.
Due to our system constraints, we can't move to Structured Streaming now.
But YARN can definitely be tried out.

But my problem is that I'm not able to figure out where the issue is:
Driver, Executor, or Worker. Even the exceptions are clueless. Please see
the exception below; I'm unable to spot the cause of the OOM.

20/05/08 15:36:55 INFO Worker: Asked to kill driver
driver-20200508153502-1291

20/05/08 15:36:55 INFO DriverRunner: Killing driver process!

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed

20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
app-20200508153654-11776 removed, cleanupLocalDirs = true

20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed
by user

*20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:*

*java.lang.OutOfMemoryError: Java heap space*

*20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[dispatcher-event-loop-6,5,main]*

*java.lang.OutOfMemoryError: Java heap space*

*20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:*

*java.lang.OutOfMemoryError: Java heap space*

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called

20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
/grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922




On Fri, May 8, 2020 at 5:14 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Sorry for being perhaps too harsh, but when you asked "Am I missing
> something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
> Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
> use Spark Structured Streaming at the very least and/or use YARN as the
> cluster manager".
>
> Another thought was that the user code (your code) could be leaking
> resources so Spark eventually reports heap-related errors that may not
> necessarily be Spark's.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> <https://twitter.com/jaceklaskowski>
>
>
> On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am getting an out of memory error in the worker log for streaming jobs
>> every couple of hours, and after this the worker dies. There is no shuffle,
>> no aggregation, no caching in the job; it's just a transformation.
>> I'm not able to identify where the problem is, driver or executor. And
>> why does the worker die after the OOM? The streaming job should be the one
>> that dies. Am I missing something?
>>
>> Driver Memory:  2g
>> Executor memory: 4g
>>
>> Spark Version:  2.4
>> Kafka Direct Stream
>> Spark Standalone Cluster.
>>
>>
>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>> with view permissions: Set(); users  with modify permissions: Set(root);
>> groups with modify permissions: Set()
>>
>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
>> in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>>
>> java.lang.OutOfMemoryError: Java heap space
>>
>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>
>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>
>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>> Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>
>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>
>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>
>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>
>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>
>> at
>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>
>> at
>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>
>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>
>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>
>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>
>> at
>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>
>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver
>> driver-20200505181719-1187
>>
>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>
>>
>>
>>
>> Regards
>> Hrishi
>>
>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

Sorry for being perhaps too harsh, but when you asked "Am I missing
something. " and I noticed this "Kafka Direct Stream" and "Spark Standalone
Cluster. " I immediately thought "Yeah...please upgrade your Spark env to
use Spark Structured Streaming at the very least and/or use YARN as the
cluster manager".

Another thought was that the user code (your code) could be leaking
resources so Spark eventually reports heap-related errors that may not
necessarily be Spark's.
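
A typical example of that kind of leak (a hypothetical, self-contained sketch,
not the poster's code) is allocating a pooled resource per micro-batch and
never releasing it, e.g. a thread pool that is never shut down:

import java.util.concurrent.Executors

object LeakyBatchSink {
  // Imagine this being called once per micro-batch, e.g. from foreachRDD.
  def onBatch(records: Seq[String]): Unit = {
    val pool = Executors.newFixedThreadPool(8)   // a new pool every batch ...
    records.foreach(r => pool.submit(new Runnable { def run(): Unit = println(r) }))
    // ... but pool.shutdown() is never called, so threads pile up batch after
    // batch until the JVM eventually reports java.lang.OutOfMemoryError.
  }
}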

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

<https://twitter.com/jaceklaskowski>


On Thu, May 7, 2020 at 1:12 PM Hrishikesh Mishra <sd...@gmail.com>
wrote:

> Hi
>
> I am getting an out of memory error in the worker log for streaming jobs
> every couple of hours, and after this the worker dies. There is no shuffle,
> no aggregation, no caching in the job; it's just a transformation.
> I'm not able to identify where the problem is, driver or executor. And why
> does the worker die after the OOM? The streaming job should be the one that
> dies. Am I missing something?
>
> Driver Memory:  2g
> Executor memory: 4g
>
> Spark Version:  2.4
> Kafka Direct Stream
> Spark Standalone Cluster.
>
>
> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users  with view permissions: Set(root); groups
> with view permissions: Set(); users  with modify permissions: Set(root);
> groups with modify permissions: Set()
>
> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>
> java.lang.OutOfMemoryError: Java heap space
>
> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>
> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>
> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
> Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>
> at
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>
> at
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>
> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>
> at org.apache.spark.deploy.worker.ExecutorRunner.org
> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>
> at
> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>
> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver
> driver-20200505181719-1187
>
> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>
>
>
>
> Regards
> Hrishi
>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Hrishikesh Mishra <sd...@gmail.com>.
These errors are completely clueless; they give no hint of why the OOM
exception is coming.


20/05/08 15:36:55 INFO Worker: Asked to kill driver
driver-20200508153502-1291

20/05/08 15:36:55 INFO DriverRunner: Killing driver process!

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed

20/05/08 15:36:55 INFO CommandUtils: Redirection to
/grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed

20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
app-20200508153654-11776 removed, cleanupLocalDirs = true

20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was killed
by user

*20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:*

*java.lang.OutOfMemoryError: Java heap space*

*20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[dispatcher-event-loop-6,5,main]*

*java.lang.OutOfMemoryError: Java heap space*

*20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
stacktrace] was thrown by a user handler's exceptionCaught() method while
handling the following exception:*

*java.lang.OutOfMemoryError: Java heap space*

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ExecutorRunner: Killing process!

20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called

20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
/grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922


On Thu, May 7, 2020 at 10:16 PM Hrishikesh Mishra <sd...@gmail.com>
wrote:

> It's only happening for the Hadoop config. The exception traces are
> different each time it dies. And jobs run for a couple of hours, then the
> worker dies.
>
> Another Reason:
>
> *20/05/02 02:26:34 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[ExecutorRunner for app-20200501213234-9846/3,5,main]*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> * at org.apache.xerces.xni.XMLString.toString(Unknown Source)*
>
> at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)
>
> at org.apache.xerces.xinclude.XIncludeHandler.characters(Unknown Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown
> Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>
> at
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>
> at
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>
> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>
> at org.apache.spark.deploy.worker.ExecutorRunner.org
> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>
> at
> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>
> *20/05/02 02:26:37 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[dispatcher-event-loop-3,5,main]*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> * at java.lang.Class.newInstance(Class.java:411)*
>
> at
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:403)
>
> at
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at
> sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
>
> at
> sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)
>
> at
> sun.reflect.ReflectionFactory.generateConstructor(ReflectionFactory.java:398)
>
> at
> sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:360)
>
> at
> java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1520)
>
> at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:79)
>
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:507)
>
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:482)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:482)
>
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)
>
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:478)
>
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)
>
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
>
> at
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>
> at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>
> at
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>
> at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>
> at
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
>
> at
> org.apache.spark.rpc.netty.RequestMessage.serialize(NettyRpcEnv.scala:565)
>
> at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:193)
>
> at
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:528)
>
> at org.apache.spark.deploy.worker.Worker.org
> $apache$spark$deploy$worker$Worker$$sendToMaster(Worker.scala:658)
>
> *20/05/02 02:26:34 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[spark-shuffle-directory-cleaner-4-1,5,main]*
>
> *java.lang.OutOfMemoryError: Java heap space*
>
> * at java.io.UnixFileSystem.resolve(UnixFileSystem.java:108)*
>
> * at java.io.File.<init>(File.java:262)*
>
> * at java.io.File.listFiles(File.java:1253)*
>
> at
> org.apache.spark.network.util.JavaUtils.listFilesSafely(JavaUtils.java:177)
>
> at
> org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:140)
>
> at
> org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
>
> at
> org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
>
> at
> org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
>
> at
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.deleteNonShuffleFiles(ExternalShuffleBlockResolver.java:269)
>
> at
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.lambda$executorRemoved$1(ExternalShuffleBlockResolver.java:235)
>
> at
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$$Lambda$19/1657523067.run(Unknown
> Source)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>
> at java.lang.Thread.run(Thread.java:748)
>
> 20/05/02 02:27:03 INFO ExecutorRunner: Killing pro
>
>
>
> Another Reason
>
> 20/05/02 22:15:21 INFO DriverRunner: Copying user jar
> http://XX.XX.XXX.19:90/jar/hc-job-1.0-SNAPSHOT.jar to
> /grid/1/spark/work/driver-20200502221520-1101/hc-job-1.0-SNAPSHOT.jar
> *20/05/02 22:15:50 WARN TransportChannelHandler: Exception in connection
> from /XX.XX.XXX.19:7077*
> *java.lang.OutOfMemoryError: Java heap space*
> * at java.util.Arrays.copyOf(Arrays.java:3332)*
> * at
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)*
> * at
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)*
> at java.lang.StringBuilder.append(StringBuilder.java:136)
> at java.io.ObjectStreamField.getClassSignature(ObjectStreamField.java:322)
> at java.io.ObjectStreamField.<init>(ObjectStreamField.java:140)
> at
> java.io.ObjectStreamClass.getDefaultSerialFields(ObjectStreamClass.java:1789)
> at java.io.ObjectStreamClass.getSerialFields(ObjectStreamClass.java:1705)
> at java.io.ObjectStreamClass.access$800(ObjectStreamClass.java:79)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:496)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:482)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:482)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:669)
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1883)
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2040)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
> at
> org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:271)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at
> org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:320)
> at
> org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:270)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at
> org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:269)
> at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:611)
> at
> org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:662)
> at
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:654)
> at
> org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:275)
> *20/05/02 22:15:50 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[DriverRunner for driver-20200502221520-1100,5,main]*
> *java.lang.OutOfMemoryError: Java heap space*
> * at
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2627)*
> at
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
> at
> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
> at
> org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:160)
> at
> org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
> at
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)
> *20/05/02 22:15:51 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[dispatcher-event-loop-7,5,main]*
> *java.lang.OutOfMemoryError: Java heap space*
> * at org.apache.spark.deploy.worker.Worker.receive(Worker.scala:443)*
> * at
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)*
> * at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)*
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
> at
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 20/05/02 22:16:05 INFO ExecutorRunner: Killing process!
>
>
>
>
> On Thu, May 7, 2020 at 7:48 PM Jeff Evans <je...@gmail.com>
> wrote:
>
>> You might want to double check your Hadoop config files.  From the stack
>> trace it looks like this is happening when simply trying to load
>> configuration (XML files).  Make sure they're well formed.
>>
>> On Thu, May 7, 2020 at 6:12 AM Hrishikesh Mishra <sd...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> I am getting an out-of-memory error in the worker log of my streaming jobs
>>> every couple of hours, and after this the worker dies. There is no shuffle,
>>> no aggregation, and no caching in the job; it is just a transformation.
>>> I'm not able to identify where the problem is, driver or executor, and why
>>> the worker dies after the OOM when the streaming job should be the one to
>>> die. Am I missing something?
>>>
>>> Driver Memory:  2g
>>> Executor memory: 4g
>>>
>>> Spark Version:  2.4
>>> Kafka Direct Stream
>>> Spark Standalone Cluster.
>>>
>>>
>>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>>> with view permissions: Set(); users  with modify permissions: Set(root);
>>> groups with modify permissions: Set()
>>>
>>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught
>>> exception in thread Thread[ExecutorRunner for
>>> app-20200506124717-10226/0,5,main]
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>>
>>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>>
>>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>>> Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>>> Source)
>>>
>>> at
>>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>>> Source)
>>>
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>>
>>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>>
>>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>>
>>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>>
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>>
>>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>>
>>> at
>>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>>
>>> at
>>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>>
>>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>>
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>>
>>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>>
>>> at
>>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>>
>>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>>
>>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>>
>>> at
>>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>>
>>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing
>>> driver driver-20200505181719-1187
>>>
>>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>>
>>>
>>>
>>>
>>> Regards
>>> Hrishi
>>>
>>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Hrishikesh Mishra <sd...@gmail.com>.
It's only happening for the Hadoop config. The exception traces are different
each time the worker dies, and the jobs run for a couple of hours before the
worker dies.
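
To rule out an under-sized worker JVM, the heap the Worker daemon was actually
started with can be checked with the stock JDK tools (a quick sketch;
<worker-pid> is whatever PID jps prints for the Worker class):

# Find the standalone Worker's PID and the heap it was actually started with
jps -lm | grep org.apache.spark.deploy.worker.Worker
jinfo -flags <worker-pid> | grep -E 'InitialHeapSize|MaxHeapSize'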

Another Reason:

*20/05/02 02:26:34 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[ExecutorRunner for app-20200501213234-9846/3,5,main]*

*java.lang.OutOfMemoryError: Java heap space*

* at org.apache.xerces.xni.XMLString.toString(Unknown Source)*

at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)

at org.apache.xerces.xinclude.XIncludeHandler.characters(Unknown Source)

at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown
Source)

at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
Source)

at
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
Source)

at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)

at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)

at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)

at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)

at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)

at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)

at
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)

at
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)

at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)

at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)

at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)

at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)

at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)

at
org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)

at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)

at org.apache.spark.deploy.worker.ExecutorRunner.org
$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)

at
org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)

*20/05/02 02:26:37 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[dispatcher-event-loop-3,5,main]*

*java.lang.OutOfMemoryError: Java heap space*

* at java.lang.Class.newInstance(Class.java:411)*

at
sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:403)

at
sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)

at java.security.AccessController.doPrivileged(Native Method)

at
sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)

at
sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)

at
sun.reflect.ReflectionFactory.generateConstructor(ReflectionFactory.java:398)

at
sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:360)

at
java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1520)

at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:79)

at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:507)

at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:482)

at java.security.AccessController.doPrivileged(Native Method)

at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:482)

at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)

at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:478)

at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)

at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)

at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)

at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)

at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)

at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)

at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)

at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)

at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)

at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)

at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)

at
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)

at
org.apache.spark.rpc.netty.RequestMessage.serialize(NettyRpcEnv.scala:565)

at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:193)

at
org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:528)

at org.apache.spark.deploy.worker.Worker.org
$apache$spark$deploy$worker$Worker$$sendToMaster(Worker.scala:658)

*20/05/02 02:26:34 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[spark-shuffle-directory-cleaner-4-1,5,main]*

*java.lang.OutOfMemoryError: Java heap space*

* at java.io.UnixFileSystem.resolve(UnixFileSystem.java:108)*

* at java.io.File.<init>(File.java:262)*

* at java.io.File.listFiles(File.java:1253)*

at
org.apache.spark.network.util.JavaUtils.listFilesSafely(JavaUtils.java:177)

at
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:140)

at
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)

at
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)

at
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)

at
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.deleteNonShuffleFiles(ExternalShuffleBlockResolver.java:269)

at
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.lambda$executorRemoved$1(ExternalShuffleBlockResolver.java:235)

at
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver$$Lambda$19/1657523067.run(Unknown
Source)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)

at java.lang.Thread.run(Thread.java:748)

20/05/02 02:27:03 INFO ExecutorRunner: Killing pro



Another Reason

20/05/02 22:15:21 INFO DriverRunner: Copying user jar
http://XX.XX.XXX.19:90/jar/hc-job-1.0-SNAPSHOT.jar to
/grid/1/spark/work/driver-20200502221520-1101/hc-job-1.0-SNAPSHOT.jar
*20/05/02 22:15:50 WARN TransportChannelHandler: Exception in connection
from /XX.XX.XXX.19:7077*
*java.lang.OutOfMemoryError: Java heap space*
* at java.util.Arrays.copyOf(Arrays.java:3332)*
* at
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)*
* at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)*
at java.lang.StringBuilder.append(StringBuilder.java:136)
at java.io.ObjectStreamField.getClassSignature(ObjectStreamField.java:322)
at java.io.ObjectStreamField.<init>(ObjectStreamField.java:140)
at
java.io.ObjectStreamClass.getDefaultSerialFields(ObjectStreamClass.java:1789)
at java.io.ObjectStreamClass.getSerialFields(ObjectStreamClass.java:1705)
at java.io.ObjectStreamClass.access$800(ObjectStreamClass.java:79)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:496)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:482)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:482)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:379)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:669)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1883)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2040)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
at
org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:271)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:320)
at
org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:270)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:269)
at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:611)
at
org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:662)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:654)
at
org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:275)
*20/05/02 22:15:50 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[DriverRunner for driver-20200502221520-1100,5,main]*
*java.lang.OutOfMemoryError: Java heap space*
* at
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2627)*
at
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
at
org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
at
org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at
org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
at
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:160)
at
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
at
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)
*20/05/02 22:15:51 ERROR SparkUncaughtExceptionHandler: Uncaught exception
in thread Thread[dispatcher-event-loop-7,5,main]*
*java.lang.OutOfMemoryError: Java heap space*
* at org.apache.spark.deploy.worker.Worker.receive(Worker.scala:443)*
* at
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)*
* at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)*
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/05/02 22:16:05 INFO ExecutorRunner: Killing process!




On Thu, May 7, 2020 at 7:48 PM Jeff Evans <je...@gmail.com>
wrote:

> You might want to double check your Hadoop config files.  From the stack
> trace it looks like this is happening when simply trying to load
> configuration (XML files).  Make sure they're well formed.
>
> On Thu, May 7, 2020 at 6:12 AM Hrishikesh Mishra <sd...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am getting an out-of-memory error in the worker log of my streaming jobs
>> every couple of hours, and after this the worker dies. There is no shuffle,
>> no aggregation, and no caching in the job; it is just a transformation.
>> I'm not able to identify where the problem is, driver or executor, and why
>> the worker dies after the OOM when the streaming job should be the one to
>> die. Am I missing something?
>>
>> Driver Memory:  2g
>> Executor memory: 4g
>>
>> Spark Version:  2.4
>> Kafka Direct Stream
>> Spark Standalone Cluster.
>>
>>
>> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users  with view permissions: Set(root); groups
>> with view permissions: Set(); users  with modify permissions: Set(root);
>> groups with modify permissions: Set()
>>
>> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
>> in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>>
>> java.lang.OutOfMemoryError: Java heap space
>>
>> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>>
>> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>>
>> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
>> Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>> at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>>
>> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>>
>> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>>
>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>>
>> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>>
>> at
>> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>>
>> at
>> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>>
>> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>>
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>>
>> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>>
>> at
>> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>>
>> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>>
>> at org.apache.spark.deploy.worker.ExecutorRunner.org
>> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>>
>> at
>> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>>
>> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver
>> driver-20200505181719-1187
>>
>> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>>
>>
>>
>>
>> Regards
>> Hrishi
>>
>

Re: java.lang.OutOfMemoryError Spark Worker

Posted by Jeff Evans <je...@gmail.com>.
You might want to double check your Hadoop config files.  From the stack
trace it looks like this is happening when simply trying to load
configuration (XML files).  Make sure they're well formed.
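
A quick way to run that check from the shell (a sketch, assuming the files
live under $HADOOP_CONF_DIR and xmllint is available; the original trace dies
inside XMLScanner.scanComment, so an unexpectedly large file or an
unterminated comment would be a prime suspect):

# Sanity-check the Hadoop config XML files the worker is loading
ls -lh "$HADOOP_CONF_DIR"/*.xml                 # anything unexpectedly large?
for f in "$HADOOP_CONF_DIR"/*.xml; do
  xmllint --noout "$f" && echo "OK  $f"         # prints parse errors and exits non-zero if malformed
done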

On Thu, May 7, 2020 at 6:12 AM Hrishikesh Mishra <sd...@gmail.com>
wrote:

> Hi
>
> I am getting an out-of-memory error in the worker log of my streaming jobs
> every couple of hours, and after this the worker dies. There is no shuffle,
> no aggregation, and no caching in the job; it is just a transformation.
> I'm not able to identify where the problem is, driver or executor, and why
> the worker dies after the OOM when the streaming job should be the one to
> die. Am I missing something?
>
> Driver Memory:  2g
> Executor memory: 4g
>
> Spark Version:  2.4
> Kafka Direct Stream
> Spark Standalone Cluster.
>
>
> 20/05/06 12:52:20 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users  with view permissions: Set(root); groups
> with view permissions: Set(); users  with modify permissions: Set(root);
> groups with modify permissions: Set()
>
> 20/05/06 12:53:03 ERROR SparkUncaughtExceptionHandler: Uncaught exception
> in thread Thread[ExecutorRunner for app-20200506124717-10226/0,5,main]
>
> java.lang.OutOfMemoryError: Java heap space
>
> at org.apache.xerces.util.XMLStringBuffer.append(Unknown Source)
>
> at org.apache.xerces.impl.XMLEntityScanner.scanData(Unknown Source)
>
> at org.apache.xerces.impl.XMLScanner.scanComment(Unknown Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanComment(Unknown
> Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
> Source)
>
> at
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
> Source)
>
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
>
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
>
> at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
>
> at
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
>
> at
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
>
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
>
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
>
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:464)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil.newConfiguration(SparkHadoopUtil.scala:114)
>
> at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:114)
>
> at org.apache.spark.deploy.worker.ExecutorRunner.org
> $apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:149)
>
> at
> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
>
> 20/05/06 12:53:38 INFO DriverRunner: Worker shutting down, killing driver
> driver-20200505181719-1187
>
> 20/05/06 12:53:38 INFO DriverRunner: Killing driver process!
>
>
>
>
> Regards
> Hrishi
>