Posted to user@spark.apache.org by "vdiwakar.malladi" <vd...@gmail.com> on 2014/11/22 17:32:03 UTC

Getting exception on JavaSchemaRDD; org.apache.spark.SparkException: Task not serializable

Hi

I'm trying to load a Parquet file for querying from my web
application. I am able to load it as a JavaSchemaRDD, but when I call the
map function on the JavaSchemaRDD, I get the following
exception.

The class in which I'm using this code implements the Serializable
interface. Could anyone let me know the cause?
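
A minimal sketch of the loading step being described, using the Spark 1.x
Java API (the file path and variable names here are only illustrative):

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

// Build a SQL context on top of an existing JavaSparkContext "sc".
JavaSQLContext sqlContext = new JavaSQLContext(sc);

// Load the Parquet file; the schema is read from the file itself.
JavaSchemaRDD sdoData = sqlContext.parquetFile("/data/sample.parquet");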


org.apache.spark.SparkException: Task not serializable

Caused by: org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
	at org.apache.spark.rdd.RDD.map(RDD.scala:270)
	at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:75)
	at org.apache.spark.sql.api.java.JavaSchemaRDD.map(JavaSchemaRDD.scala:42)
... 35 more

Caused by: java.io.NotSerializableException: org.apache.catalina.session.StandardSessionFacade
	at java.io.ObjectOutputStream.writeObject0(Unknown Source)
	at java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
	at java.io.ObjectOutputStream.writeSerialData(Unknown Source)
	at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
	at java.io.ObjectOutputStream.writeObject0(Unknown Source)


Thanks in advance.





Re: Getting exception on JavaSchemaRDD; org.apache.spark.SparkException: Task not serializable

Posted by "vdiwakar.malladi" <vd...@gmail.com>.
Thanks.

After rewriting it as a static inner class, that exception no longer occurs.
But now I'm getting a Snappy-related exception. I can see the corresponding
dependency in the Spark assembly jar, yet the exception persists. Any quick
suggestions on this?

Here is the stack trace.

java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
	at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
	at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:320)
	at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
	at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
	at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
	at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
	at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
	at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
	at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
	at org.apache.spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:76)
	at org.apache.spark.sql.parquet.ParquetTableScan.execute(ParquetTableOperations.scala:118)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:409)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:409)
	at org.apache.spark.sql.SchemaRDD.getDependencies(SchemaRDD.scala:120)
	at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:191)
	at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:189)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.dependencies(RDD.scala:189)
	at org.apache.spark.rdd.RDD.firstParent(RDD.scala:1233)
	at org.apache.spark.sql.SchemaRDD.getPartitions(SchemaRDD.scala:117)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
	at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
	at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:774)
	at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:305)
	at org.apache.spark.api.java.JavaRDD.collect(JavaRDD.scala:32)

Thanks in advance.





Re: Getting exception on JavaSchemaRDD; org.apache.spark.SparkException: Task not serializable

Posted by Sean Owen <so...@cloudera.com>.
You are declaring an anonymous inner class here. It holds a reference to the
containing class even if you don't use it, and if the closure cleaner can't
determine that the reference is unused, it will cause everything in the outer
class to be serialized. Try rewriting this as a named static inner class.
On Nov 22, 2014 5:23 PM, "vdiwakar.malladi" <vd...@gmail.com>
wrote:

> Thanks for your prompt response.
>
> I'm not using anything in my map function; please see the code below. For
> sample purposes, I'd like to use 'select * from
> '.
>
> This code worked for me in standalone mode, but when I integrated it with my
> web application, it threw the exception described above.
>
> List<String> sdo = sdoData.map(new Function<Row, String>() {
>
>         public String call(Row row) {
>                 //return row.getString(0);
>                 return null;
>         }
> }).collect();
>
> Thanks in advance.
>
>
>
>

Re: Getting exception on JavaSchemaRDD; org.apache.spark.SparkException: Task not serializable

Posted by "vdiwakar.malladi" <vd...@gmail.com>.
Thanks for your prompt response.

I'm not using anything in my map function; please see the code below. For
sample purposes, I'd like to use 'select * from
'.

This code worked for me in standalone mode, but when I integrated it with my
web application, it threw the exception described above.

List<String> sdo = sdoData.map(new Function<Row, String>() {

	public String call(Row row) {
		//return row.getString(0);
		return null;
	}
}).collect();

Thanks in advance.






Re: Getting exception on JavaSchemaRDD; org.apache.spark.SparkException: Task not serializable

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You could be referencing (and thereby sending) the StandardSessionFacade
inside your map function. You could copy the values you need out of the
session into a local serializable variable to get it fixed quickly.
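
A hedged sketch of that idea (the "userId" attribute and the variable names
are hypothetical): copy what the job needs out of the non-serializable
session object into a plain local variable before building the closure, so
the session facade itself is never captured:

import java.util.List;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.Row;

// Only the local String is captured by the closure, not the
// HttpSession/StandardSessionFacade it came from.
final String userId = (String) request.getSession().getAttribute("userId");

List<String> sdo = sdoData.map(new Function<Row, String>() {
    public String call(Row row) {
        // Uses the local copy; the session object is never serialized.
        return userId + ":" + row.getString(0);
    }
}).collect();

Note that an anonymous class like this still captures its enclosing instance,
so it helps only if that class is itself serializable; the named static inner
class suggested earlier in the thread avoids the capture entirely.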

Thanks
Best Regards

On Sat, Nov 22, 2014 at 10:02 PM, vdiwakar.malladi <
vdiwakar.malladi@gmail.com> wrote:

> Hi
>
> I'm trying to load a Parquet file for querying from my web
> application. I am able to load it as a JavaSchemaRDD, but when I call the
> map function on the JavaSchemaRDD, I get the following
> exception.
>
> The class in which I'm using this code implements the Serializable
> interface. Could anyone let me know the cause?
>
>
> org.apache.spark.SparkException: Task not serializable
>
> Caused by: org.apache.spark.SparkException: Task not serializable
>         at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>         at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>         at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>         at org.apache.spark.rdd.RDD.map(RDD.scala:270)
>         at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:75)
>         at org.apache.spark.sql.api.java.JavaSchemaRDD.map(JavaSchemaRDD.scala:42)
> ... 35 more
>
> Caused by: java.io.NotSerializableException: org.apache.catalina.session.StandardSessionFacade
>         at java.io.ObjectOutputStream.writeObject0(Unknown Source)
>         at java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
>         at java.io.ObjectOutputStream.writeSerialData(Unknown Source)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
>         at java.io.ObjectOutputStream.writeObject0(Unknown Source)
>
>
> Thanks in advance.
>
>
>