Posted to user@spark.apache.org by Jaonary Rabarisoa <ja...@gmail.com> on 2014/09/12 17:37:34 UTC

Why do I get java.lang.OutOfMemoryError: Java heap space with join?

Dear all,


I'm facing the following problem and I can't figure out how to solve it.

I need to join 2 RDDs in order to find their intersection. The first RDD
holds images, each encoded as a base64 string and keyed by image id. The
second RDD holds geometric primitives (rectangles), also keyed by image
id. My goal is to draw these primitives on the corresponding images, so
my first attempt is to join images and primitives by image id and then
do the drawing.
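For reference, the shapes involved look roughly like this. It is a local sketch with plain Scala collections standing in for the two pair RDDs; the Rectangle case class, ids, and values are illustrative, not from my actual job:

```scala
// Local stand-in for the two pair RDDs:
// images:     RDD[(String, String)]    -- (imageId, base64-encoded image)
// primitives: RDD[(String, Rectangle)] -- (imageId, rectangle to draw)
case class Rectangle(x: Int, y: Int, w: Int, h: Int)

val images: Seq[(String, String)] = Seq(
  ("img1", "iVBORw0KGgoAAAANSUhEUg") // illustrative base64 payload
)
val primitives: Seq[(String, Rectangle)] = Seq(
  ("img1", Rectangle(10, 20, 30, 40)),
  ("img1", Rectangle(50, 60, 70, 80))
)

// Equivalent of primitives.join(images): an inner join on the key,
// producing (imageId, (primitive, image)) pairs -- one pair per
// matching primitive, each carrying a copy of the image string.
val joined: Seq[(String, (Rectangle, String))] =
  for {
    (pid, rect) <- primitives
    (iid, img)  <- images
    if pid == iid
  } yield (pid, (rect, img))

joined.foreach { case (id, (rect, _)) => println(s"$id -> $rect") }
```

Note that every primitive for an image ends up paired with its own copy of that image's (potentially large) base64 string.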

But when I do

primitives.join(images)

I get the following error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuilder.append(StringBuilder.java:204)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3143)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3051)
    at java.io.ObjectInputStream$BlockDataInputStream.readLongUTF(ObjectInputStream.java:3034)
    at java.io.ObjectInputStream.readString(ObjectInputStream.java:1642)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
    at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
    at org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)

I notice that sometimes, if I change the partitioning of the images RDD
with coalesce, I can get it to work.

What am I doing wrong?

Cheers,

Jaonary