Posted to user@spark.apache.org by "Livni, Dana" <da...@intel.com> on 2014/02/12 08:23:15 UTC

GC issues

Hi,
When running a map task I got the following exception.
This is new: I have run this code many times in the past, and this is the first time it has happened.
Any ideas why? Or how can I monitor when it happens?

Thanks, Dana.

14/02/11 16:15:56 ERROR executor.Executor: Exception in task ID 128
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.StringBuilder.toString(StringBuilder.java:430)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3023)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2819)
        at java.io.ObjectInputStream.readString(ObjectInputStream.java:1598)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
        at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:101)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
        at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
        at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:40)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$3.apply(PairRDDFunctions.scala:103)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$3.apply(PairRDDFunctions.scala:102)
        at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:465)
        at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:465)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
        at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:32)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:29)
---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Re: GC issues

Posted by Andrew Ash <an...@andrewash.com>.
Alternatively, Spark's estimate of how much heap space you're using may be
lower than the true figure, so it runs out of memory when it thinks it
still has breathing room.

If you don't have more physical memory available to raise the Xmx setting,
try lowering spark.storage.memoryFraction from its default (0.6) to
something like 0.5, so Spark is more conservative about how much of the JVM
heap it claims for cached blocks.
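
For example, a minimal sketch of setting this when building the context,
assuming a SparkConf-based setup (the app name and master URL below are
placeholders to adjust for your cluster):

    import org.apache.spark.{SparkConf, SparkContext}

    // Lower the fraction of the JVM heap reserved for cached RDD blocks
    // (default 0.6), leaving more headroom for deserialization and shuffle.
    val conf = new SparkConf()
      .setAppName("my-app")                        // placeholder app name
      .setMaster("spark://master:7077")            // placeholder master URL
      .set("spark.storage.memoryFraction", "0.5")

    val sc = new SparkContext(conf)

The same key can also be set as a JVM system property
(-Dspark.storage.memoryFraction=0.5) before the SparkContext is created,
if you aren't constructing the SparkConf yourself.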


On Wed, Feb 12, 2014 at 12:20 AM, Sean Owen <so...@cloudera.com> wrote:

> This is just Java's way of saying 'out of memory'. Your workers need more
> heap.
>  On Feb 12, 2014 7:23 AM, "Livni, Dana" <da...@intel.com> wrote:
>
>>  Hi,
>>
>> When running a map task I got the following exception.
>>
>> This is new: I have run this code many times in the past, and this is
>> the first time it has happened.
>>
>> Any ideas why? Or how can I monitor when it happens?
>>
>>
>>
>> Thanks Dana.
>

Re: GC issues

Posted by Sean Owen <so...@cloudera.com>.
This is just Java's way of saying 'out of memory'. Your workers need more
heap.
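
For example, a minimal sketch of requesting larger executor heaps through
spark.executor.memory, assuming a SparkConf-based setup (the 4g figure and
the app/master strings are placeholders to adjust for your cluster):

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask for bigger executor JVMs so each task has more heap available
    // when deserializing and combining records.
    val conf = new SparkConf()
      .setAppName("my-app")                   // placeholder app name
      .setMaster("spark://master:7077")       // placeholder master URL
      .set("spark.executor.memory", "4g")     // raise from the small default

    val sc = new SparkContext(conf)

On a standalone cluster the workers themselves must also have enough memory
configured (e.g. SPARK_WORKER_MEMORY in conf/spark-env.sh) to grant what the
application asks for.
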
On Feb 12, 2014 7:23 AM, "Livni, Dana" <da...@intel.com> wrote:

>  Hi,
>
> When running a map task I got the following exception.
>
> This is new: I have run this code many times in the past, and this is
> the first time it has happened.
>
> Any ideas why? Or how can I monitor when it happens?
>
>
>
> Thanks Dana.
>