Posted to user@spark.apache.org by Bijoy Deb <bi...@gmail.com> on 2014/09/05 08:19:05 UTC

NotSerializableException: org.apache.spark.sql.hive.api.java.JavaHiveContext

Hello All,

I am trying to query a Hive table using Spark SQL from my Java code, but I am
getting the following error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.spark.sql.hive.api.java.JavaHiveContext
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

I am using Spark 1.0.2.

My code snippet is as below:

JavaHiveContext hiveContext = null;
JavaSparkContext jsCtx = ......;
hiveContext = new JavaHiveContext(jsCtx);
hiveContext.hql("select col1,col2 from table1");

The usual advice I have seen is not to pass any non-serializable object into
Spark closure functions (map, reduce, etc.), so that Spark does not try to
serialize it and ship it across multiple machines. But I am not using any
closure functions here, so I am not sure how to handle this issue.
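
For context, my understanding of that advice is that it applies to code like
the sketch below, where the context is referenced inside the closure passed to
map(). The class name, input path, and table/column names here are hypothetical
placeholders for illustration, not my real code:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class ClosureCaptureExample {
    public static void main(String[] args) {
        JavaSparkContext jsCtx = new JavaSparkContext("local[2]", "ClosureCaptureExample");
        final JavaHiveContext hiveContext = new JavaHiveContext(jsCtx);

        // Hypothetical input, just for illustration.
        JavaRDD<String> lines = jsCtx.textFile("/tmp/input.txt");

        // The anonymous Function below captures hiveContext, so Spark tries to
        // serialize it along with the task; JavaHiveContext is not Serializable,
        // which is what normally produces this exception.
        JavaRDD<Integer> lengths = lines.map(new Function<String, Integer>() {
            @Override
            public Integer call(String line) {
                hiveContext.hql("select col1 from table1"); // uses the captured context
                return line.length();
            }
        });

        lengths.count(); // forces the job to run, which is where the failure surfaces
    }
}

In my snippet above there is no such closure, which is why I am confused about
where the serialization is being triggered.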

Can you please advise how to resolve this problem?

Thanks
Bijoy