Posted to user@spark.apache.org by Yifan LI <ia...@gmail.com> on 2015/10/23 12:24:19 UTC

java.lang.NegativeArraySizeException? as iterating a big RDD

Hi,

I have a big sorted RDD, sRdd (~962 million elements), and need to scan its elements in order (using sRdd.toLocalIterator).

But the process failed after around 893 million elements had been scanned, with the following exception:
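The scan looks roughly like this (a sketch; the element type and RDD setup are illustrative, not the actual job):

```scala
// Sketch of the access pattern described above. sRdd stands in for the
// real sorted RDD; its element type here is illustrative.
val sRdd: org.apache.spark.rdd.RDD[(Long, String)] = ??? // the sorted RDD

// toLocalIterator fetches one partition at a time to the driver, so each
// partition is serialized (via Kryo here) as a single task result.
val it = sRdd.toLocalIterator
while (it.hasNext) {
  val elem = it.next()
  // process elem in sorted order
}
```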

Does anyone have an idea? Thanks!


Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 421752.0 failed 128 times, most recent failure: Lost task 0.127 in stage 421752.0 (TID 17304, small15-tap1.common.lip6.fr): java.lang.NegativeArraySizeException
	at com.esotericsoftware.kryo.util.IdentityObjectIntMap.resize(IdentityObjectIntMap.java:409)
	at com.esotericsoftware.kryo.util.IdentityObjectIntMap.putStash(IdentityObjectIntMap.java:227)
	at com.esotericsoftware.kryo.util.IdentityObjectIntMap.push(IdentityObjectIntMap.java:221)
	at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:117)
	at com.esotericsoftware.kryo.util.IdentityObjectIntMap.putStash(IdentityObjectIntMap.java:228)
	at com.esotericsoftware.kryo.util.IdentityObjectIntMap.push(IdentityObjectIntMap.java:221)
	at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:117)
	at com.esotericsoftware.kryo.util.MapReferenceResolver.addWrittenObject(MapReferenceResolver.java:23)
	at com.esotericsoftware.kryo.Kryo.writeReferenceOrNull(Kryo.java:598)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:566)
	at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:36)
	at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:250)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:236)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

Best,
Yifan LI






Re: java.lang.NegativeArraySizeException? as iterating a big RDD

Posted by Todd Nist <ts...@gmail.com>.
Hi Yifan,

You could also try increasing spark.kryoserializer.buffer.max.mb.

spark.kryoserializer.buffer.max.mb (64 MB by default): useful if a single
object you serialize is larger than the default 64 MB buffer.

Per doc:
Maximum allowable size of Kryo serialization buffer. This must be larger
than any object you attempt to serialize. Increase this if you get a
"buffer limit exceeded" exception inside Kryo.
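In SparkConf terms that would be something like the following (the 512 MB value is illustrative, to be sized against your largest partition):

```scala
// Hypothetical sizing: raise the max Kryo buffer well above the largest
// single serialized task result. This property name is the Spark 1.x form;
// later releases use spark.kryoserializer.buffer.max with a size suffix.
val conf = new org.apache.spark.SparkConf()
  .set("spark.kryoserializer.buffer.max.mb", "512") // illustrative value
```

or equivalently on the command line with --conf spark.kryoserializer.buffer.max.mb=512.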

-Todd

On Fri, Oct 23, 2015 at 6:51 AM, Yifan LI <ia...@gmail.com> wrote:

> Thanks for your advice, Jem. :)
>
> I will increase the partitioning and see if it helps.
>
> Best,
> Yifan LI
>
>
>
>
>
> On 23 Oct 2015, at 12:48, Jem Tucker <je...@gmail.com> wrote:
>
> Hi Yifan,
>
> I think this is a result of Kryo trying to serialize something too large.
> Have you tried to increase your partitioning?
>
> Cheers,
>
> Jem
>
> On Fri, Oct 23, 2015 at 11:24 AM Yifan LI <ia...@gmail.com> wrote:
>
>> [original message and stack trace snipped]
>

Re: java.lang.NegativeArraySizeException? as iterating a big RDD

Posted by Yifan LI <ia...@gmail.com>.
Thanks for your advice, Jem. :)

I will increase the partitioning and see if it helps. 

Best,
Yifan LI





> On 23 Oct 2015, at 12:48, Jem Tucker <je...@gmail.com> wrote:
> 
> Hi Yifan, 
> 
> I think this is a result of Kryo trying to serialize something too large. Have you tried to increase your partitioning? 
> 
> Cheers,
> 
> Jem
> 
> On Fri, Oct 23, 2015 at 11:24 AM Yifan LI <iamyifanli@gmail.com <ma...@gmail.com>> wrote:
> [original message and stack trace snipped]


Re: java.lang.NegativeArraySizeException? as iterating a big RDD

Posted by Jem Tucker <je...@gmail.com>.
Hi Yifan,

I think this is a result of Kryo trying to serialize something too large.
Have you tried to increase your partitioning?
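In code that amounts to something like this (the partition count of 4000 is a guess to be tuned, not a recommendation):

```scala
// Smaller partitions mean smaller serialized task results for Kryo.
// Pick N so that ~962M elements / N partitions stays comfortably under
// the Kryo buffer limit; 4000 here is purely illustrative.
val smaller = sRdd.repartition(4000)

// Caveat: repartition shuffles arbitrarily and does NOT preserve sort
// order. Since the scan must be in order, re-sorting with a higher
// partition count is the safer route for a key-value RDD:
// val smaller = sRdd.sortByKey(numPartitions = 4000)

smaller.toLocalIterator.foreach { elem => /* process elem */ }
```

Note that for the in-order scan to stay valid, the re-sort variant (range-partitioned) is the one to use; plain repartition would break the ordering.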

Cheers,

Jem

On Fri, Oct 23, 2015 at 11:24 AM Yifan LI <ia...@gmail.com> wrote:

> [original message and stack trace snipped]