Posted to user@spark.apache.org by Blackeye <bl...@iit.demokritos.gr> on 2014/09/09 12:09:43 UTC

Filter function problem

I have the following code written in scala in Spark:

(inactiveIDs is an RDD[(Int, Seq[String])], persons is a Broadcast[RDD[(Int,
Seq[Event])]], and Event is a class that I have created)

val test = persons.value
  .map { tuple =>
    (tuple._1,
     tuple._2.filter { event =>
       inactiveIDs.filter(event2 => event2._1 == tuple._1).count() != 0
     })
  }

It fails with the following error:

java.lang.NullPointerException

Any ideas?






Re: Filter function problem

Posted by Daniel Siegmann <da...@velos.io>.
You should not be broadcasting an RDD, and you should not be referencing one
RDD inside a lambda passed to another RDD. If you want, you can call
RDD.collect and then broadcast those values (of course, you must be able to
fit all of those values in memory).
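
For example, here is a minimal sketch of that approach in Scala (assuming a
SparkContext named `sc`, and that `persons` is kept as a plain
RDD[(Int, Seq[Event])] rather than a broadcast; the variable names are
illustrative, not from the original code):

// Collect the inactive ids to the driver, then broadcast the plain
// Set instead of the RDD itself.
val inactiveIdSet: Set[Int] = inactiveIDs.map(_._1).collect().toSet
val inactiveBc = sc.broadcast(inactiveIdSet)

val test = persons.map { case (id, events) =>
  // Reading the broadcast Set inside the closure is fine: on each
  // executor it is an ordinary local value, not an RDD.
  (id, events.filter(_ => inactiveBc.value.contains(id)))
}

This keeps the original semantics (a person's events survive only if that
person's id appears in inactiveIDs), but the membership test is now a local
Set lookup instead of a nested RDD operation.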

On Tue, Sep 9, 2014 at 6:34 AM, Blackeye <bl...@iit.demokritos.gr> wrote:

> To help anyone answer, I should say that I checked the
> inactiveIDs.filter operation separately, and I found that it doesn't return
> null in any case. In addition, I don't know how to handle (or check) whether
> an RDD is null. I find the debugging too complicated to pinpoint the error.
> Any ideas how to find the null pointer?


-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegmann@velos.io W: www.velos.io

Re: Filter function problem

Posted by Burak Yavuz <by...@stanford.edu>.
Hi,

val test = persons.value
  .map { tuple =>
    (tuple._1,
     tuple._2.filter { event =>
       ***** inactiveIDs.filter(event2 => event2._1 == tuple._1).count() != 0 *****
     })
  }

Your problem is right between the asterisks. You can't perform an RDD operation
inside another RDD operation: the closure is serialized and shipped to the
executors, where the nested RDD is not usable, which is why you are getting the
NullPointerException. Try joining the two RDDs on the id instead, and then
filtering based on that.
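
For instance, a rough sketch of the join approach (again assuming `persons`
is a plain RDD[(Int, Seq[Event])] rather than a broadcast, and that the Int
key of inactiveIDs is the same person id; the names are illustrative):

// Deduplicate the inactive ids so the join does not multiply rows.
val inactiveKeys = inactiveIDs.keys.distinct().map(id => (id, ()))

val test = persons.leftOuterJoin(inactiveKeys).map {
  case (id, (events, Some(_))) => (id, events)            // id is inactive: keep events
  case (id, (events, None))    => (id, Seq.empty[Event])  // not inactive: empty, as in the original
}

The leftOuterJoin keeps every person, matching the original map-then-filter
behaviour, while the membership test runs as a single distributed join
instead of an RDD operation nested inside a closure.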

Best,
Burak

----- Original Message -----
From: "Blackeye" <bl...@iit.demokritos.gr>
To: user@spark.incubator.apache.org
Sent: Tuesday, September 9, 2014 3:34:58 AM
Subject: Re: Filter function problem

To help anyone answer, I should say that I checked the
inactiveIDs.filter operation separately, and I found that it doesn't return
null in any case. In addition, I don't know how to handle (or check) whether an
RDD is null. I find the debugging too complicated to pinpoint the error. Any
ideas how to find the null pointer?





Re: Filter function problem

Posted by Blackeye <bl...@iit.demokritos.gr>.
To help anyone answer, I should say that I checked the
inactiveIDs.filter operation separately, and I found that it doesn't return
null in any case. In addition, I don't know how to handle (or check) whether an
RDD is null. I find the debugging too complicated to pinpoint the error. Any
ideas how to find the null pointer?


