Posted to user@spark.apache.org by anny9699 <an...@gmail.com> on 2014/03/17 18:36:55 UTC

java.lang.NullPointerException met when computing new RDD or use .count

Hi,

I ran into this exception when computing a new RDD from an existing RDD, and
also when calling .count on some RDDs. The following is the situation:

val DD1 = D.map(d => {
  (d._1, D.map(x => math.sqrt(x._2 * d._2)).toArray)
})

D is of type RDD[(Int,Double)], and the error message is:

org.apache.spark.SparkException: Job aborted: Task 14.0:8 failed more than 0 times; aborting job java.lang.NullPointerException
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

I also ran into this kind of problem when calling .count() on some RDDs.

Thanks a lot!




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-NullPointerException-met-when-computing-new-RDD-or-use-count-tp2766.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: java.lang.NullPointerException met when computing new RDD or use .count

Posted by Ian O'Connell <ia...@ianoconnell.com>.
I'm guessing the other result was wrong, or was simply never evaluated.
Because RDD transformations are lazy, the nested expression could be defined
without error, but it would fail as soon as it was evaluated. Nested RDDs are
not supported.
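
A minimal sketch of how laziness hides the failure (assuming a spark-shell
session, where sc is predefined; the data is made up for illustration):

// Defining the nested transformation does NOT fail: map is lazy, so the
// closure (which illegally captures the RDD `d` itself) has not run yet.
val d = sc.parallelize(Seq((1, 2.0), (2, 3.0), (3, 4.0)))
val nested = d.map(p => (p._1, d.map(x => math.sqrt(x._2 * p._2)).collect()))

// An action forces the closure onto the executors, where the captured RDD
// is unusable, and the job aborts with java.lang.NullPointerException.
nested.count()

The same applies to the simi snippet below: it can be defined, but any action
on it (count, collect, saveAsTextFile) hits the same exception.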



Re: java.lang.NullPointerException met when computing new RDD or use .count

Posted by anny9699 <an...@gmail.com>.
Hi Andrew,

Thanks for the reply. However I did almost the same thing in another
closure:

val simi = dataByRow.map(point => {
  val corrs = dataByRow.map(x => arrCorr(point._2, x._2))
  (point._1, corrs)
})

Here dataByRow is of type RDD[(Int,Array[Double])], and arrCorr is a function
I wrote to compute the correlation between two Scala arrays.
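
arrCorr itself is never shown in the thread. Purely for illustration, a
minimal sketch of such a function, assuming Pearson correlation over two
equal-length arrays, might look like:

// Hypothetical reconstruction -- the thread never shows arrCorr.
// Pearson correlation between two equal-length arrays.
def arrCorr(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length && a.length > 1)
  val n = a.length
  val (ma, mb) = (a.sum / n, b.sum / n)
  val cov = a.zip(b).map { case (x, y) => (x - ma) * (y - mb) }.sum
  val sd = (v: Array[Double], m: Double) => math.sqrt(v.map(x => (x - m) * (x - m)).sum)
  cov / (sd(a, ma) * sd(b, mb))
}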
 
And that code worked. So I am a little confused about why it worked there and
not in other places.

Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-NullPointerException-met-when-computing-new-RDD-or-use-count-tp2766p2779.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: java.lang.NullPointerException met when computing new RDD or use .count

Posted by Andrew Ash <an...@andrewash.com>.
It looks like you're trying to access an RDD ("D") from inside a closure (the
parameter to the first map), which isn't possible with the current
implementation of Spark. Can you rephrase so you don't access D from inside
the map call?
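
One way to rephrase, as a sketch, assuming D is small enough to collect to
the driver: materialize D's values locally, broadcast them, and reference the
broadcast variable inside the closure instead of the RDD:

// Avoids the nested RDD: collect D's values to the driver (assumes D fits
// in driver memory), broadcast them, and use the plain array in the closure.
val dValues = D.map(_._2).collect()   // Array[Double] on the driver
val dBc = sc.broadcast(dValues)       // shipped once to each executor

val DD1 = D.map(d =>
  (d._1, dBc.value.map(v => math.sqrt(v * d._2)))
)

If D is too large to collect, the usual alternative is a self-join such as
D.cartesian(D), at the cost of a shuffle.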

