Posted to user@spark.apache.org by Jadhav Shweta <ja...@tcs.com> on 2015/07/15 18:04:58 UTC

Spark Accumulator Issue - java.io.IOException: java.lang.StackOverflowError

Hi,

I am trying a transformation by calling a Scala method
that returns MutableList[AvroObject]:

def processRecords(id: String, list1: Iterable[(String, GenericRecord)]): scala.collection.mutable.MutableList[AvroObject]

Hence, the output of the transformation is RDD[MutableList[AvroObject]].

But I want the output as RDD[AvroObject].
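In other words, I want something equivalent to flattening each list in place (a sketch, untested on my data; rdd_list_avroObj is the RDD produced by the transformation above):

```scala
import org.apache.spark.rdd.RDD
import scala.collection.mutable.MutableList

// Sketch: flatten RDD[MutableList[AvroObject]] into RDD[AvroObject].
// flatMap runs on the executors, so nothing is collected to the driver.
val flattened: RDD[AvroObject] = rdd_list_avroObj.flatMap(identity)
```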

I tried converting RDD[MutableList[AvroObject]] --> RDD[AvroObject] by applying foreach with an accumulable collection:

var uA = sparkContext.accumulableCollection[MutableList[AvroObject], universe](MutableList[AvroObject]())
rdd_list_avroObj.foreach(u => {
  uA ++= u
})
var uRDD = sparkContext.parallelize(uA.value)

It fails on a large dataset with the following error:

java.io.IOException: java.lang.StackOverflowError
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
	at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:45)
	at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:226)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.StackOverflowError
	at java.io.ObjectOutputStream$HandleTable.hash(ObjectOutputStream.java:2359)
	at java.io.ObjectOutputStream$HandleTable.lookup(ObjectOutputStream.java:2292)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1115)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
	at java.util.ArrayList.writeObject(ArrayList.java:742)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)

I have two queries regarding this issue:
Option 1: A replacement for the accumulator
Option 2: In the Scala method, instead of returning List[AvroObject], can I emit multiple AvroObjects, so that I get RDD[AvroObject] directly?
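For Option 2, I believe processRecords could be called inside flatMap so that each returned list is flattened into the result RDD. A sketch (the name `grouped` is my assumption for the RDD of (id, records) pairs that processRecords is currently mapped over):

```scala
import org.apache.spark.rdd.RDD

// Sketch of Option 2: invoke processRecords inside flatMap so each
// MutableList[AvroObject] it returns is flattened into the output RDD,
// entirely on the executors, with no accumulator on the driver.
// `grouped` is assumed to be an RDD[(String, Iterable[(String, GenericRecord)])].
val avroRdd: RDD[AvroObject] = grouped.flatMap { case (id, records) =>
  processRecords(id, records)
}
```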

Note:
I am using Spark 1.3.0
Input data size: 200 GB
Cluster: 3 machines (2 cores, 8 GB each)
Spark running in YARN mode

Thanks & Regards
Shweta Jadhav
Tata Consultancy Services Limited
Cell:- +91-9867515614
Mailto: jadhav.shweta@tcs.com
Website: http://www.tcs.com