Posted to user@spark.apache.org by Chong Zhang <ch...@gmail.com> on 2016/08/23 12:49:17 UTC

question about Broadcast value NullPointerException

Hello,

I'm using Spark Streaming to process Kafka messages, and I want to use a
properties file as input and broadcast the properties:

import java.io.FileInputStream
import java.util.Properties
import org.apache.spark.streaming.Seconds

val props = new Properties()
props.load(new FileInputStream(args(0)))
val sc = initSparkContext()
val propsBC = sc.broadcast(props)
println(s"propFileBC 1: ${propsBC.value}")    // driver side: prints fine

val lines = createKStream(sc)
val parsedLines = lines.map { l =>
  println(s"propFileBC 2: ${propsBC.value}")  // executor side
  process(l, propsBC.value)
}.filter(...)

val goodLines = lines.window(Seconds(2), Seconds(2))  // window() takes Durations
goodLines.print()
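
The helpers initSparkContext and createKStream aren't shown here. For
context, a minimal sketch of what they might look like, assuming Spark
1.6 with the Kafka 0.8 direct-stream API; the app name, broker list,
topic, and batch interval are placeholders, not values from the actual
code:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical reconstruction of the helpers, not the original code.
def initSparkContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("BroadcastTest")  // placeholder name
  new StreamingContext(conf, Seconds(2))                  // assumed batch interval
}

def createKStream(ssc: StreamingContext): DStream[String] = {
  val kafkaParams = Map("metadata.broker.list" -> "broker:9092")  // placeholder
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))                            // placeholder topic
    .map(_._2)   // keep only the message payload
}

One detail worth noting: broadcast lives on SparkContext, not
StreamingContext, so with a helper like this the call would be
ssc.sparkContext.broadcast(props).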


It works fine when I run it with spark-submit and --master local[2].
But with --master spark://master:7077 (2 nodes), the 1st propsBC.value
is printed, while the 2nd print inside the map function throws a
NullPointerException:

Caused by: java.lang.NullPointerException
        at test.spark.Main$$anonfun$1.apply(Main.scala:79)
        at test.spark.Main$$anonfun$1.apply(Main.scala:78)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
        at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
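
A hedged guess at one common cause (not confirmed here): if propsBC is
a field on the enclosing object (e.g. object Main) assigned inside
main(), the map closure references it through the object; on an
executor the object is re-created without main() ever running, so the
field deserializes as null. A minimal sketch of the usual local-capture
workaround, reusing the names from the snippet above:

val localPropsBC = propsBC            // local val inside main(), captured directly
val parsedLines = lines.map { l =>    // closure now carries the broadcast handle
  process(l, localPropsBC.value)      // value is resolved on the executor
}

If the vals are already local to main(), this changes nothing; the next
thing to check would be checkpoint recovery, since broadcast variables
are not restored when a StreamingContext is recovered from a
checkpoint.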

Appreciate any help, thanks!