Posted to user@spark.apache.org by Yadid Ayzenberg <ya...@media.mit.edu> on 2013/10/09 23:15:57 UTC
spark 0.8.0 null pointer exception when accessing mongodb twice
Hi All,
I have successfully accessed my MongoDB collection from Spark: after creating a
NewHadoopRDD and calling first(), I get the data back from the DB correctly.
However, if I call first() a second time (without calling anything else
in between), Spark crashes with the following message:
org.apache.spark.rdd.NewHadoopRDD[java.lang.Object,org.bson.BSONObject] = NewHadoopRDD[1] at NewHadoopRDD at <console>:36

scala> a.first()
13/10/09 16:58:49 INFO spark.SparkContext: Starting job: first at <console>:39
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Got job 1 (first at <console>:39) with 1 output partitions (allowLocal=true)
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Final stage: Stage 1 (first at <console>:39)
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Missing parents: List()
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Computing the requested partition locally
13/10/09 16:58:49 INFO rdd.NewHadoopRDD: Input split: MongoInputSplit{URI=mongodb://mongo12.mit.edu/local.testCollection, keyField=_id, min=null, max=null, query={ }, sort={ }, fields={ }, limit=0, skip=0, notimeout=false}
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Failed to run first at <console>:39
java.lang.NullPointerException
    at com.mongodb.DBApiLayer$Result.hasNext(DBApiLayer.java:416)
    at com.mongodb.DBCursor._hasNext(DBCursor.java:464)
    at com.mongodb.DBCursor.hasNext(DBCursor.java:484)
    at com.mongodb.hadoop.input.MongoRecordReader.nextKeyValue(MongoRecordReader.java:75)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:96)
    at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:381)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.Iterator$$anon$18.foreach(Iterator.scala:379)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:102)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:250)
    at scala.collection.Iterator$$anon$18.toBuffer(Iterator.scala:379)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:237)
    at scala.collection.Iterator$$anon$18.toArray(Iterator.scala:379)
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
    at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:484)
    at org.apache.spark.scheduler.DAGScheduler$$anon$2.run(DAGScheduler.scala:470)
Any ideas what I'm doing wrong? Is this a mongo driver problem or a
Spark problem?
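For reference, roughly what I am running in the spark-shell (a reconstruction from memory, not a verbatim paste: the configuration key and class names below follow the mongo-hadoop connector docs, so treat them as approximate):

```scala
import org.apache.hadoop.conf.Configuration
import com.mongodb.hadoop.MongoInputFormat
import org.bson.BSONObject

// Point the mongo-hadoop input format at the collection shown in the log above.
val config = new Configuration()
config.set("mongo.input.uri", "mongodb://mongo12.mit.edu/local.testCollection")

// Key/value types match the RDD signature in the console output:
// NewHadoopRDD[java.lang.Object, org.bson.BSONObject]
val a = sc.newAPIHadoopRDD(config, classOf[MongoInputFormat],
  classOf[Object], classOf[BSONObject])

a.first()  // returns the first (_id, document) pair as expected
a.first()  // NullPointerException in DBApiLayer$Result.hasNext
```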
Best,
Yadid