Posted to user@spark.apache.org by Daedalus <tu...@gmail.com> on 2014/06/18 07:11:13 UTC

Un-serializable 3rd-party classes (Spark, Java)

I'm trying to use matrix-toolkit-java
<https://github.com/fommil/matrix-toolkits-java/> for an application of
mine, particularly the FlexCompRowMatrix class (used to store sparse
matrices).

I have a class Dataframe, which contains an int array, two double values,
and one FlexCompRowMatrix.

When I try to run a simple Spark .foreach() on a JavaRDD created from a
list of the above-mentioned Dataframes, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task not serializable: java.io.NotSerializableException:
no.uib.cipr.matrix.sparse.FlexCompRowMatrix
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)

The FlexCompRowMatrix class doesn't seem to implement Serializable. It
suits my purpose very well, and I would prefer not to switch away from it.

Other than modifying the class to implement Serializable and then
recompiling the matrix-toolkit-java source, what options do I have?

Is there any workaround for this issue?




Re: Un-serializable 3rd-party classes (Spark, Java)

Posted by Daedalus <tu...@gmail.com>.
Kryo did the job.
Thanks!
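
In case it helps anyone else, the setup was roughly the following. This is
a minimal sketch along the lines of the KryoRegistrator approach in the
tuning guide; Dataframe is my own container class, and whether Kryo handles
FlexCompRowMatrix with its default serializer will depend on what's inside
it:

// MatrixRegistrator.java
import com.esotericsoftware.kryo.Kryo;
import no.uib.cipr.matrix.sparse.FlexCompRowMatrix;
import org.apache.spark.serializer.KryoRegistrator;

// Registers the classes shipped to executors, including the
// non-Serializable matrix class, so Kryo can serialize them.
public class MatrixRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(FlexCompRowMatrix.class);
        kryo.register(Dataframe.class); // my own container class
    }
}

// SparseMatrixApp.java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Driver setup: switch Spark to the Kryo serializer and point it
// at the registrator above.
public class SparseMatrixApp {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("SparseMatrixApp")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryo.registrator", "MatrixRegistrator");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... build the JavaRDD<Dataframe> and run .foreach() as before ...
        sc.stop();
    }
}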



Re: Un-serializable 3rd-party classes (Spark, Java)

Posted by Matei Zaharia <ma...@gmail.com>.
There are a few options:

- Kryo might be able to serialize these objects out of the box, depending on what’s inside them. Try turning it on as described at http://spark.apache.org/docs/latest/tuning.html.

- If that doesn’t work, you can create your own “wrapper” objects that implement Serializable, or even a subclass of FlexCompRowMatrix. No need to change the original library.
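
A rough, untested sketch of such a wrapper, assuming FlexCompRowMatrix
follows MTJ's usual Matrix interface (numRows()/numColumns(), set(), and
iteration over the stored MatrixEntry values), serializing only the
non-zero entries:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import no.uib.cipr.matrix.MatrixEntry;
import no.uib.cipr.matrix.sparse.FlexCompRowMatrix;

public class SerializableFlexMatrix implements Serializable {
    // transient: default Java serialization skips this field, so the
    // writeObject/readObject hooks below must handle it themselves.
    private transient FlexCompRowMatrix matrix;

    public SerializableFlexMatrix(FlexCompRowMatrix matrix) {
        this.matrix = matrix;
    }

    public FlexCompRowMatrix get() {
        return matrix;
    }

    // Write the dimensions followed by (row, column, value) triples
    // for the stored entries only, since the matrix is sparse.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(matrix.numRows());
        out.writeInt(matrix.numColumns());
        int nonZeros = 0;
        for (MatrixEntry e : matrix) {
            nonZeros++;
        }
        out.writeInt(nonZeros);
        for (MatrixEntry e : matrix) {
            out.writeInt(e.row());
            out.writeInt(e.column());
            out.writeDouble(e.get());
        }
    }

    // Rebuild the matrix from the stored entries on deserialization.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        matrix = new FlexCompRowMatrix(in.readInt(), in.readInt());
        int nonZeros = in.readInt();
        for (int i = 0; i < nonZeros; i++) {
            matrix.set(in.readInt(), in.readInt(), in.readDouble());
        }
    }
}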

- If the library has its own serialization functions, you could also use those inside a wrapper object. Take a look at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SerializableWritable.scala for an example where we make Hadoop’s Writables serializable.
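
Sketched generically, that delegation looks like this. LibraryMatrixIO.write
and LibraryMatrixIO.read are hypothetical stand-ins for whatever I/O
routines the library actually provides, so the real calls would go in their
place:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import no.uib.cipr.matrix.sparse.FlexCompRowMatrix;

public class MatrixHolder implements Serializable {
    private transient FlexCompRowMatrix matrix;

    public MatrixHolder(FlexCompRowMatrix matrix) {
        this.matrix = matrix;
    }

    public FlexCompRowMatrix get() {
        return matrix;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Hypothetical: delegate to the library's own writer here.
        LibraryMatrixIO.write(matrix, out);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Hypothetical: delegate to the library's own reader here.
        matrix = LibraryMatrixIO.read(in);
    }
}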

Matei
