Posted to dev@spark.apache.org by Trident <cw...@vip.qq.com> on 2014/10/10 04:09:41 UTC

[Spark SQL] Strange NPE in Spark SQL with Hive

Hi Community,

      I am using Spark 1.0.2 and running Hive SQL queries through Spark SQL.

      When I run the following code in Spark Shell:

// Plain RDD word count on a local file (no Hive involved)
val file = sc.textFile("./README.md")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
      This runs correctly with no errors.

      When I run the following code:
// Create a HiveContext and list the tables in the Hive metastore
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("SHOW TABLES").collect().foreach(println)

      This also runs correctly with no errors.

      But when I run:
// The same HiveContext, but this time a query that actually scans table data
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.hql("SELECT COUNT(*) from uservisits").collect().foreach(println)

      This time it fails with error messages.


      This is the error output I found in the executor log:
14/10/09 19:47:34 ERROR Executor: Exception in task ID 4
java.lang.NullPointerException
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
14/10/09 19:47:34 INFO CoarseGrainedExecutorBackend: Got assigned task 5
14/10/09 19:47:34 INFO Executor: Running task ID 5
14/10/09 19:47:34 DEBUG BlockManager: Getting local block broadcast_1
14/10/09 19:47:34 DEBUG BlockManager: Level for block broadcast_1 is StorageLevel(true, true, false, true, 1)
14/10/09 19:47:34 DEBUG BlockManager: Getting block broadcast_1 from memory
14/10/09 19:47:34 INFO BlockManager: Found block broadcast_1 locally
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/10/09 19:47:34 DEBUG BlockFetcherIterator$BasicBlockFetcherIterator: Sending request for 2 blocks (2.5 KB) from node19:50868
14/10/09 19:47:34 DEBUG BlockMessageArray: Adding BlockMessage [type = 1, id = shuffle_0_0_1, level = null, data = null]
14/10/09 19:47:34 DEBUG BlockMessageArray: Added BufferMessage(id = 5, size = 34)
14/10/09 19:47:34 DEBUG BlockMessageArray: Adding BlockMessage [type = 1, id = shuffle_0_1_1, level = null, data = null]
14/10/09 19:47:34 DEBUG BlockMessageArray: Added BufferMessage(id = 6, size = 34)
14/10/09 19:47:34 DEBUG BlockMessageArray: Buffer list:
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 2 ms
14/10/09 19:47:34 DEBUG BlockFetcherIterator$BasicBlockFetcherIterator: Got local blocks in  0 ms ms
14/10/09 19:47:34 ERROR Executor: Exception in task ID 5
java.lang.NullPointerException
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
14/10/09 19:47:34 INFO CoarseGrainedExecutorBackend: Got assigned task 6
14/10/09 19:47:34 INFO Executor: Running task ID 6
14/10/09 19:47:34 DEBUG BlockManager: Getting local block broadcast_1
14/10/09 19:47:34 DEBUG BlockManager: Level for block broadcast_1 is StorageLevel(true, true, false, true, 1)
14/10/09 19:47:34 DEBUG BlockManager: Getting block broadcast_1 from memory
14/10/09 19:47:34 INFO BlockManager: Found block broadcast_1 locally
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/10/09 19:47:34 DEBUG BlockFetcherIterator$BasicBlockFetcherIterator: Sending request for 2 blocks (2.5 KB) from node19:50868
14/10/09 19:47:34 DEBUG BlockMessageArray: Adding BlockMessage [type = 1, id = shuffle_0_0_1, level = null, data = null]
14/10/09 19:47:34 DEBUG BlockMessageArray: Added BufferMessage(id = 8, size = 34)
14/10/09 19:47:34 DEBUG BlockMessageArray: Adding BlockMessage [type = 1, id = shuffle_0_1_1, level = null, data = null]
14/10/09 19:47:34 DEBUG BlockMessageArray: Added BufferMessage(id = 9, size = 34)
14/10/09 19:47:34 DEBUG BlockMessageArray: Buffer list:
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]
14/10/09 19:47:34 DEBUG BlockMessageArray: java.nio.HeapByteBuffer[pos=0 lim=34 cap=34]
14/10/09 19:47:34 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 1 remote fetches in 2 ms
14/10/09 19:47:34 DEBUG BlockFetcherIterator$BasicBlockFetcherIterator: Got local blocks in  0 ms ms
14/10/09 19:47:34 ERROR Executor: Exception in task ID 6
java.lang.NullPointerException
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
	at org.apache.spark.scheduler.Task.run(Task.scala:51)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
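
      To narrow this down, I plan to also run a simpler scan through the same HiveContext. I have not confirmed the result yet, but if a single-row read hits the same NPE, the problem is presumably in reading the table data rather than in the aggregation itself:

// Untested probe: scan a single row instead of aggregating the whole table.
// If this also throws the NullPointerException, the failure is in the table
// scan itself rather than in COUNT(*).
hiveContext.hql("SELECT * FROM uservisits LIMIT 1").collect().foreach(println)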



      What could be causing this? Is it a known problem?
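
      In case it helps, I also intend to check whether the table's underlying files are readable at all outside of the Hive code path. The warehouse path below is only a placeholder for the table's actual location in my cluster:

// Untested check: read the table's files directly as a plain RDD, bypassing Hive.
// "/user/hive/warehouse/uservisits" is a placeholder for the real table location.
val raw = sc.textFile("hdfs:///user/hive/warehouse/uservisits")
println(raw.count())   // a plain RDD count over the same data, no Hive involved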

Chen Weikeng