Posted to user@spark.apache.org by Yadid Ayzenberg <ya...@media.mit.edu> on 2013/10/09 23:15:57 UTC

spark 0.8.0: NullPointerException when accessing MongoDB twice

Hi All,

I have successfully accessed my MongoDB collection from Spark: after creating a
NewHadoopRDD and calling first(), I get the data back from the DB correctly.
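
For context, the RDD is created roughly like this in the spark shell (a
simplified sketch based on the standard mongo-hadoop MongoInputFormat and its
mongo.input.uri setting; exact details may differ):

import org.apache.hadoop.conf.Configuration
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

// point the mongo-hadoop input format at the collection
val config = new Configuration()
config.set("mongo.input.uri", "mongodb://mongo12.mit.edu/local.testCollection")

// keys are document _ids (Object), values are the BSON documents
val a = sc.newAPIHadoopRDD(config, classOf[MongoInputFormat],
  classOf[Object], classOf[BSONObject])

a.first()   // returns a document as expected
a.first()   // a second call triggers the NullPointerException below
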
However, if I call first() a second time (without calling anything else
in between), Spark crashes with the following message:

org.apache.spark.rdd.NewHadoopRDD[java.lang.Object,org.bson.BSONObject] = NewHadoopRDD[1] at NewHadoopRDD at <console>:36

scala> a.first()
13/10/09 16:58:49 INFO spark.SparkContext: Starting job: first at <console>:39
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Got job 1 (first at <console>:39) with 1 output partitions (allowLocal=true)
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Final stage: Stage 1 (first at <console>:39)
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Missing parents: List()
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Computing the requested partition locally
13/10/09 16:58:49 INFO rdd.NewHadoopRDD: Input split: MongoInputSplit{URI=mongodb://mongo12.mit.edu/local.testCollection, keyField=_id, min=null, max=null, query={ }, sort={ }, fields={ }, limit=0, skip=0, notimeout=false}
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Failed to run first at <console>:39
java.lang.NullPointerException
     at com.mongodb.DBApiLayer$Result.hasNext(DBApiLayer.java:416)
     at com.mongodb.DBCursor._hasNext(DBCursor.java:464)
     at com.mongodb.DBCursor.hasNext(DBCursor.java:484)
     at com.mongodb.hadoop.input.MongoRecordReader.nextKeyValue(MongoRecordReader.java:75)
     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:96)
     at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:381)
     at scala.collection.Iterator$class.foreach(Iterator.scala:772)
     at scala.collection.Iterator$$anon$18.foreach(Iterator.scala:379)
     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:102)
     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:250)
     at scala.collection.Iterator$$anon$18.toBuffer(Iterator.scala:379)
     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:237)
     at scala.collection.Iterator$$anon$18.toArray(Iterator.scala:379)
     at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
     at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
     at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
     at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:484)
     at org.apache.spark.scheduler.DAGScheduler$$anon$2.run(DAGScheduler.scala:470)

Any ideas what I'm doing wrong? Is this a mongo driver problem or a
Spark problem?

Best,

Yadid