Posted to user@spark.apache.org by swetha <sw...@gmail.com> on 2015/10/07 18:42:38 UTC

Optimal way to avoid processing null returns in Spark Scala

Hi,

I am using the following functions in my Scala job. As you can see, the
getSessionId function sometimes returns null. When it does, the only way I
can avoid processing those records is to filter out the nulls afterwards. I
wanted to avoid a second pass for filtering, so I tried returning None
instead, but that requires the return type to be an Option. What is the
optimal way to skip null records without making Option the return type?
Using Option[] and Some(()) seems to cause type issues in subsequent
function calls.


    val sessions = filteredStream.transform(rdd => getBeaconMap(rdd))

  def getBeaconMap(rdd: RDD[(String, String)]): RDD[(String, (Long, String))] = {
    rdd.map[(String, (Long, String))] { case (x, y) =>
      (getSessionId(y), (getTimeStamp(y).toLong, y))
    }
  }

  def getSessionId(eventRecord: String): String = {
    // This needs to be changed.
    val beaconTestImpl: BeaconTestLoader = new BeaconTestImpl
    val beaconEvent: BeaconEventData = beaconTestImpl.getBeaconEventData(eventRecord)

    if (beaconEvent != null) {
      beaconEvent.getSessionID // This might be in Set Cookie header
    } else {
      null
    }
  }

    val groupedAndSortedSessions =
      sessions.transform(rdd => ExpoJobCommonNew.getGroupedAndSortedSessions(rdd))




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Optimal-way-to-avoid-processing-null-returns-in-Spark-Scala-tp24972.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Optimal way to avoid processing null returns in Spark Scala

Posted by Iulian Dragoș <iu...@typesafe.com>.
On Wed, Oct 7, 2015 at 6:42 PM, swetha <sw...@gmail.com> wrote:

> Hi,
>
> I have the following functions that I am using for my job in Scala. If you
> see the getSessionId function I am returning null sometimes. If I return
> null the only way that I can avoid processing those records is by filtering
> out null records. I wanted to avoid having another pass for filtering so I
> tried returning "None" . But, it seems to be having issues as it demands the
> return type as optional. What is the optimal way to avoid processing null
> records and at the same way avoid having Option as the return type using
> None? The use of Option[] and Some(()) seems to be having type issues in
> subsequent function calls.
>
You should use RDD.flatMap; this way you can map and filter in the same
pass. Something like

rdd.flatMap { case (x, y) =>
  val sessionId = getSessionId(y)
  if (sessionId != null)
    Seq((sessionId, (getTimeStamp(y).toLong, y)))
  else
    Seq.empty
}

I didn’t try to compile that method, but you’ll figure out the types, if
need be.
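
The map-and-filter idea above can also be written with Option kept purely
internal, so the element type of the result stays a plain tuple. The sketch
below uses ordinary Scala collections rather than an RDD so that it runs
without Spark; RDD.flatMap should accept the same function shape, but that is
not verified here. BeaconEvent, parse, toPair and beaconPairs are
hypothetical stand-ins for the thread's BeaconEventData and helper functions,
and the "id,timestamp" record layout is an assumption made only for the
example.

```scala
// Hypothetical stand-in for the thread's BeaconEventData.
final case class BeaconEvent(sessionId: String, timestamp: Long)

object SessionSketch {
  // Stand-in parser (assumed "id,timestamp" layout). Like the original
  // getBeaconEventData, it returns null for records it cannot handle.
  def parse(record: String): BeaconEvent = record.split(",") match {
    case Array(id, ts) if id.nonEmpty => BeaconEvent(id, ts.toLong)
    case _                            => null
  }

  // Option(x) yields None when x is null and Some(x) otherwise, so the
  // null check is confined to this one function.
  def toPair(record: String): Option[(String, (Long, String))] =
    Option(parse(record)).map(e => (e.sessionId, (e.timestamp, record)))

  // flatMap drops each None and unwraps each Some in a single pass,
  // so the result holds plain tuples rather than Options.
  def beaconPairs(records: Seq[String]): Seq[(String, (Long, String))] =
    records.flatMap(r => toPair(r))
}
```

Here callers of beaconPairs never see Option in a signature; only toPair
deals with the possibility of a missing event.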

iulian

