Posted to user@spark.apache.org by Shady Xu <sh...@gmail.com> on 2016/10/13 09:00:18 UTC

OOM when running Spark SQL by PySpark on Java 8

Hi,

I have a problem when running Spark SQL via PySpark on Java 8. Below is the
log.


16/10/13 16:46:40 INFO spark.SparkContext: Starting job: sql at NativeMethodAccessorImpl.java:-2
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: PermGen space
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:857)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1630)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Exception in thread "shuffle-server-2" java.lang.OutOfMemoryError: PermGen space
Exception in thread "shuffle-server-4" java.lang.OutOfMemoryError: PermGen space
Exception in thread "threadDeathWatcher-2-1"
java.lang.OutOfMemoryError: PermGen space


I tried increasing the driver memory, but it didn't help. However, things
are fine when I run the same code after switching to Java 7. I also find
the SparkPi example runs fine on Java 8. So I believe the problem lies
with PySpark rather than Spark core.


I am using Spark 2.0.1 and running the program in YARN cluster mode.
Any ideas would be appreciated.
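
For anyone comparing setups: a minimal sketch, using py4j's internal '_jvm'
handle on an active SparkContext 'sc', to print which Java version the
driver-side JVM actually runs (internal API, so treat it as a debugging aid
only):

# Ask the driver JVM, through the py4j gateway, which Java it is running.
# Assumes 'sc' is an existing SparkContext.
print(sc._jvm.java.lang.System.getProperty("java.version"))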

Re: OOM when running Spark SQL by PySpark on Java 8

Posted by Shady Xu <sh...@gmail.com>.
All nodes of my YARN cluster are running Java 7, but I submit the job
from a Java 8 client.

I realised I run the job in YARN cluster mode, which is why setting
'--driver-java-options' is effective. Now the question is why submitting a
job from a Java 8 client to a Java 7 cluster causes a PermGen OOM.
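
For reference, the effective submit command looks roughly like this (a
sketch; 'my_job.py' is a placeholder, and the sizes are the ones from my
earlier message):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-java-options "-XX:PermSize=80M -XX:MaxPermSize=100m" \
  my_job.py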

2016-10-13 17:30 GMT+08:00 Sean Owen <so...@cloudera.com>:

> You can specify it; it just doesn't do anything on Java 8 except cause a
> warning. A PermGen that tiny won't work in general, so if the job works,
> you're on Java 8 and the setting is being ignored. You should set
> MaxPermSize if anything, not PermSize. However, the error indicates you
> are not using Java 8 everywhere on your cluster, and that's a potentially
> bigger problem.
>
> [snip]

Re: OOM when running Spark SQL by PySpark on Java 8

Posted by Sean Owen <so...@cloudera.com>.
You can specify it; it just doesn't do anything on Java 8 except cause a
warning. A PermGen that tiny won't work in general, so if the job works,
you're on Java 8 and the setting is being ignored. You should set
MaxPermSize if anything, not PermSize. However, the error indicates you are
not using Java 8 everywhere on your cluster, and that's a potentially
bigger problem.
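
If the underlying issue is mixed Java versions, one option is to point the
YARN application master and executors at a specific JDK explicitly (a
sketch; the JAVA_HOME path and 'my_job.py' are placeholders):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.JAVA_HOME=/path/to/jdk1.8.0 \
  --conf spark.executorEnv.JAVA_HOME=/path/to/jdk1.8.0 \
  my_job.py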

On Thu, Oct 13, 2016 at 10:26 AM Shady Xu <sh...@gmail.com> wrote:

> [snip]

Re: OOM when running Spark SQL by PySpark on Java 8

Posted by Shady Xu <sh...@gmail.com>.
Solved the problem by specifying the PermGen size when submitting the job
(even just a few MB works).

It seems Java 8 has removed the Permanent Generation space, so the
corresponding JVM arguments are ignored. But I can still use
--driver-java-options "-XX:PermSize=80M -XX:MaxPermSize=100m" to specify
them when submitting the Spark job, which is weird. I don't know whether it
has anything to do with py4j, as I am not familiar with it.
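
(Side note: on Java 8 the rough replacement for PermGen is Metaspace, so if
a class-metadata cap were ever needed there, it would look something like
the sketch below; the size is a placeholder.)

--driver-java-options "-XX:MaxMetaspaceSize=256m"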

2016-10-13 17:00 GMT+08:00 Shady Xu <sh...@gmail.com>:

> [snip]

Re: OOM when running Spark SQL by PySpark on Java 8

Posted by Sean Owen <so...@cloudera.com>.
The error doesn't say you're out of memory; it says you're out of PermGen.
If you see this, you aren't running Java 8, AFAIK, because Java 8 has no
PermGen. But if you're running Java 7 and you investigate what this error
means, you'll find you need to increase PermGen. This is mentioned in the
Spark docs too: you need to increase it when building Spark on Java 7.
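
For the build case, that usually means raising MAVEN_OPTS along these lines
(a sketch; check the exact values recommended by the docs for your version):

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"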

On Thu, Oct 13, 2016 at 10:00 AM Shady Xu <sh...@gmail.com> wrote:

> [snip]