Posted to user@spark.apache.org by Blackeye <bl...@iit.demokritos.gr> on 2014/09/09 12:09:43 UTC
Filter function problem
I have the following code, written in Scala, in Spark
(inactiveIDs is an RDD[(Int, Seq[String])], persons is a Broadcast[RDD[(Int,
Seq[Event])]], and Event is a class that I have created):

val test = persons.value.map { tuple =>
  (tuple._1, tuple._2.filter { event =>
    inactiveIDs.filter(event2 => event2._1 == tuple._1).count() != 0
  })
}
and I get the following error:
java.lang.NullPointerException
Any ideas?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Filter-function-problem-tp13787.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Filter function problem
Posted by Daniel Siegmann <da...@velos.io>.
You should not be broadcasting an RDD, and you should not be referencing an
RDD in a lambda passed to another RDD. If you want, you can call RDD.collect
and then broadcast those values (of course, all of those values must fit in
memory).
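The collect-then-broadcast approach can be sketched as follows. This is a minimal, hypothetical example: no SparkContext is assumed here, so the driver-side logic runs on plain Scala collections, with the rough Spark equivalents shown in comments; Event is simplified to String and the names (keepInactive, bcast) are illustrative.

```scala
object BroadcastSetSketch {
  // In Spark, collect the small RDD's keys on the driver first:
  //   val inactiveSet = inactiveIDs.keys.collect().toSet
  //   val bcast = sc.broadcast(inactiveSet)
  // then filter using the broadcast value inside the closure:
  //   persons.filter { case (id, _) => bcast.value.contains(id) }
  // The same membership test, on plain collections:
  def keepInactive(
      persons: Seq[(Int, Seq[String])],
      inactiveSet: Set[Int]): Seq[(Int, Seq[String])] =
    persons.filter { case (id, _) => inactiveSet.contains(id) }

  def main(args: Array[String]): Unit = {
    val inactiveSet = Set(1, 3)
    val persons = Seq((1, Seq("login")), (2, Seq("click")), (3, Seq("logout")))
    println(keepInactive(persons, inactiveSet))
    // List((1,List(login)), (3,List(logout)))
  }
}
```

A Set lookup inside the closure is cheap and serializes fine, unlike the nested RDD operation in the original code.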
On Tue, Sep 9, 2014 at 6:34 AM, Blackeye <bl...@iit.demokritos.gr> wrote:
> In order to help anyone answer, I can say that I checked the
> inactiveIDs.filter operation separately, and I found that it doesn't return
> null in any case. In addition, I don't know how to handle (or check) whether
> an RDD
> is null. I find the debugging too complicated to pinpoint the error. Any
> ideas how to find the null pointer?
--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning
440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegmann@velos.io W: www.velos.io
Re: Filter function problem
Posted by Burak Yavuz <by...@stanford.edu>.
Hi,
val test = persons.value.map { tuple =>
  (tuple._1, tuple._2.filter { event =>
    inactiveIDs.filter(event2 => event2._1 == tuple._1).count() != 0  // <-- the problem
  })
}

Your problem is the marked line. You can't perform an RDD operation inside another RDD operation, because RDDs can't be serialized into the closures shipped to executors.
That is why you are receiving the NullPointerException. Try joining the RDDs on the key and then filtering based on that.
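The join-then-filter approach might look like the sketch below. Again, this is a hedged illustration with plain Scala collections standing in for RDDs (no Spark runtime assumed); the commented lines show the rough Spark form, and the helper name innerJoinOnKey is made up for this example.

```scala
object JoinFilterSketch {
  // Rough Spark form of the fix:
  //   val test = persons.join(inactiveIDs)
  //                     .map { case (id, (events, _)) => (id, events) }
  // An inner join keeps only keys present in both RDDs, replacing the
  // nested inactiveIDs.filter(...).count() call that caused the NPE.
  def innerJoinOnKey(
      persons: Seq[(Int, Seq[String])],
      inactive: Seq[(Int, Seq[String])]): Seq[(Int, Seq[String])] = {
    val inactiveKeys = inactive.map(_._1).toSet
    persons.collect { case (id, events) if inactiveKeys.contains(id) => (id, events) }
  }

  def main(args: Array[String]): Unit = {
    val persons  = Seq((1, Seq("login")), (2, Seq("click")), (3, Seq("logout")))
    val inactive = Seq((1, Seq("a")), (3, Seq("b")))
    println(innerJoinOnKey(persons, inactive))
    // List((1,List(login)), (3,List(logout)))
  }
}
```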
Best,
Burak
----- Original Message -----
From: "Blackeye" <bl...@iit.demokritos.gr>
To: user@spark.incubator.apache.org
Sent: Tuesday, September 9, 2014 3:34:58 AM
Subject: Re: Filter function problem
In order to help anyone answer, I can say that I checked the
inactiveIDs.filter operation separately, and I found that it doesn't return
null in any case. In addition, I don't know how to handle (or check) whether an
RDD is null. I find the debugging too complicated to pinpoint the error. Any
ideas how to find the null pointer?
Re: Filter function problem
Posted by Blackeye <bl...@iit.demokritos.gr>.
In order to help anyone answer, I can say that I checked the
inactiveIDs.filter operation separately, and I found that it doesn't return
null in any case. In addition, I don't know how to handle (or check) whether an
RDD is null. I find the debugging too complicated to pinpoint the error. Any
ideas how to find the null pointer?