Posted to user@spark.apache.org by Sonal Goyal <so...@gmail.com> on 2014/02/13 13:37:38 UTC

Java API - Serialization Issue

Hi,

I am using Spark 0.9 and the Java API. I am configuring Kryo serialization
for my custom classes as follows:

SparkConf conf = new SparkConf();
conf.setMaster(args[0]);
conf.setAppName("Reifier");
conf.setJars(JavaSparkContext.jarOfClass(Reifier.class));
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
LOG.debug("Conf is " + conf.toDebugString());
JavaSparkContext ctx = new JavaSparkContext(conf);

The above prints

Conf is spark.app.name=Reifier
spark.jars=
spark.kryo.registrator=shark.KryoRegistrator
spark.master=local
spark.serializer=org.apache.spark.serializer.KryoSerializer


However, running the code gives me
org.apache.spark.SparkException: Job aborted: Task not serializable:
java.io.NotSerializableException: reifier.myClass

My questions are:

1. Is Kryo indeed being used, as the error above points to Java
serialization? (For comparison, a sketch of a custom registrator follows
the questions below.)
2. I am using some external libraries which I cannot make Serializable.
How can I avoid the above error?
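
For reference, a minimal sketch of what a custom registrator pointed at by
spark.kryo.registrator might look like in the Java API. All names below
(MyKryoRegistrator, MyRecord) are placeholders, not the shark.KryoRegistrator
shown in the config output:

import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.serializer.KryoRegistrator;

// Sketch of a custom registrator; MyRecord is a placeholder standing in for
// application classes such as the reifier.myClass named in the exception.
public class MyKryoRegistrator implements KryoRegistrator {

    // Placeholder custom class to be serialized with Kryo.
    public static class MyRecord {
        public String field;
    }

    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(MyRecord.class); // repeat for each custom class
    }
}

It would then be wired in with conf.set("spark.kryo.registrator",
"MyKryoRegistrator") before the SparkContext is created.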

Thanks a lot in advance for helping out.




Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

Re: Java API - Serialization Issue

Posted by santhoma <sa...@yahoo.com>.
This worked great. Thanks a lot



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Java-API-Serialization-Issue-tp1460p3178.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Java API - Serialization Issue

Posted by Sourav Chandra <so...@livestream.com>.
I can suggest two things:

1. While creating the worker / submitting the task, make sure you are not
keeping any unwanted external class reference that is not used in the
closure and is not serializable (a sketch for this case follows the
example below).
2. If this is ensured and you still get the issue from a 3rd party library,
you can mark that 3rd party variable reference as transient in your code
and define a private void readObject(ObjectInputStream) method to
re-initialize that particular variable after deserialization.

e.g.

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

class MyClass implements Serializable {

  // 3rd party reference which is not serializable. ThirdPartyRef is a
  // placeholder for the actual library type; transient keeps it out of
  // the serialized form.
  private transient ThirdPartyRef ref = initRef();
  ....

  private ThirdPartyRef initRef() {
    ref = ....;
    return ref;
  }

  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    in.defaultReadObject(); // java default serialization for all other fields
    ref = initRef();        // re-create the transient 3rd party reference
  }
}
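
For suggestion 1, a minimal sketch of keeping a non-serializable reference
out of the closure, assuming a hypothetical third-party client
(SomeThirdPartyClient and getPrefix() are placeholders): extract only the
plain values the task needs before building the function, so the client
itself is never captured.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class ClosureExample {

  // Placeholder for a 3rd party class that does not implement Serializable.
  static class SomeThirdPartyClient {
    String getPrefix() { return "prefix-"; }
  }

  // Static method: the anonymous Function below has no enclosing instance,
  // and it only captures the final local String, never the client object.
  public static JavaRDD<String> addPrefix(JavaRDD<String> lines,
                                          SomeThirdPartyClient client) {
    final String prefix = client.getPrefix(); // pull out a serializable value up front
    return lines.map(new Function<String, String>() {
      public String call(String s) {
        return prefix + s;
      }
    });
  }
}

Whether this is enough depends on what the closure actually touches;
anything reachable from the function object still has to be serializable.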

Thanks,
Sourav


On Mon, Mar 24, 2014 at 3:06 PM, santhoma <sa...@yahoo.com> wrote:

> I am also facing the same problem. I have implemented Serializable for my
> code, but the exception is thrown from third party libraries on which I
> have
> no control .
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> Task not serializable: java.io.NotSerializableException: (lib class name
> here)
>         at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>
> Is it mandatory that Serializable must be implemented for dependent jars as
> well?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Java-API-Serialization-Issue-tp1460p3086.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>



-- 

Sourav Chandra

Senior Software Engineer

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

sourav.chandra@livestream.com

o: +91 80 4121 8723

m: +91 988 699 3746

skype: sourav.chandra

Livestream

"Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd
Block, Koramangala Industrial Area,

Bangalore 560034

www.livestream.com

Re: Java API - Serialization Issue

Posted by santhoma <sa...@yahoo.com>.
I am also facing the same problem. I have implemented Serializable for my
code, but the exception is thrown from third party libraries over which I
have no control.

Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task not serializable: java.io.NotSerializableException: (lib class name
here)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

Is it mandatory that Serializable must be implemented for dependent jars as
well?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Java-API-Serialization-Issue-tp1460p3086.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.