Posted to user@spark.apache.org by "Afshartous, Nick" <na...@turbine.com> on 2015/10/08 19:10:29 UTC

Using Spark SQL mapping over an RDD

Hi,

I'm using Spark 1.5 on the latest EMR 4.1.

I have an RDD of String:

   scala> deviceIds
      res25: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at map at <console>:28

and when I map over the RDD, running a SQL query inside the map function, the result is a NullPointerException

  scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()

with the stack trace below.  If I run the query as a top-level expression, the count is returned.  There was additional code within
the anonymous function that I've removed to isolate the problem.
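
For reference, the top-level form that does return a count is just:

  scala> sqlContext.sql("select * from ad_info").count()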

Thanks for any insights or advice on how to debug this.
--
      Nick


scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
15/10/08 16:12:56 INFO SparkContext: Starting job: count at <console>:40
15/10/08 16:12:56 INFO DAGScheduler: Got job 18 (count at <console>:40) with 200 output partitions
15/10/08 16:12:56 INFO DAGScheduler: Final stage: ResultStage 37(count at <console>:40)
15/10/08 16:12:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 36)
15/10/08 16:12:56 INFO DAGScheduler: Missing parents: List()
15/10/08 16:12:56 INFO DAGScheduler: Submitting ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40), which has no missing parents
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(17904) called with curMem=531894, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated size 17.5 KB, free 534.5 MB)
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(7143) called with curMem=549798, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 7.0 KB, free 534.5 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on 10.247.0.117:33555 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO SparkContext: Created broadcast 22 from broadcast at DAGScheduler.scala:861
15/10/08 16:12:56 INFO DAGScheduler: Submitting 200 missing tasks from ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40)
15/10/08 16:12:56 INFO YarnScheduler: Adding task set 37.0 with 200 tasks
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID 650, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:46227 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:32938 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.0 in stage 37.0 (TID 651, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
        at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.1 in stage 37.0 (TID 652, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.0 in stage 37.0 (TID 650) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 1]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.1 in stage 37.0 (TID 653, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.0 in stage 37.0 (TID 651) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 2]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.1 in stage 37.0 (TID 654, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.1 in stage 37.0 (TID 652) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 3]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.2 in stage 37.0 (TID 655, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.1 in stage 37.0 (TID 653) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 4]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.2 in stage 37.0 (TID 656, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.1 in stage 37.0 (TID 654) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 5]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.2 in stage 37.0 (TID 657, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.2 in stage 37.0 (TID 655) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 6]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.2 in stage 37.0 (TID 657) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 7]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.2 in stage 37.0 (TID 656) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 8]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.3 in stage 37.0 (TID 658) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 9]
15/10/08 16:12:56 ERROR TaskSetManager: Task 0 in stage 37.0 failed 4 times; aborting job
15/10/08 16:12:56 INFO YarnScheduler: Cancelling stage 37
15/10/08 16:12:56 INFO YarnScheduler: Stage 37 was cancelled
15/10/08 16:12:56 INFO DAGScheduler: ResultStage 37 (count at <console>:40) failed in 0.128 s
15/10/08 16:12:56 INFO DAGScheduler: Job 18 failed: count at <console>:40, took 0.145419 s
15/10/08 16:12:56 WARN TaskSetManager: Lost task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 INFO YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
        at $iwC$$iwC$$iwC.<init>(<console>:77)
        at $iwC$$iwC.<init>(<console>:79)
        at $iwC.<init>(<console>:81)
        at <init>(<console>:83)
        at .<init>(<console>:87)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


scala> 15/10/08 16:13:45 INFO ContextCleaner: Cleaned accumulator 34
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 10.247.0.117:33555 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:46227 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:32938 in memory (size: 7.0 KB, free: 535.0 MB)



scala>



Re: Using Spark SQL mapping over an RDD

Posted by Michael Armbrust <mi...@databricks.com>.
You are probably looking for a groupby instead:

sqlContext.sql("SELECT COUNT(*) FROM ad_info GROUP BY deviceId")

On Thu, Oct 8, 2015 at 10:27 AM, Afshartous, Nick <na...@turbine.com>
wrote:

>
> > You can't do nested operations on RDDs or DataFrames (i.e. you can't
> > create a DataFrame from within a map function).  Perhaps if you explain
> > what you are trying to accomplish someone can suggest another way.
>
> The code below is what I had in mind.  For each id, I'd like to run a
> query using the id in the WHERE clause and then, depending on the result,
> possibly run a second query.  The result of either the first or the
> second query will be used to construct the output of the map function.
>
> Thanks for any suggestions,
> --
>       Nick
>
>
> val result = deviceIds.map(deviceId => {
>    val withAnalyticsId = sqlContext.sql(
>        "select * from ad_info where deviceId = '%1s' and analyticsId <> 'null' order by messageTime asc limit 1" format (deviceId))
>
>    if (withAnalyticsId.count() > 0) {
>        withAnalyticsId.take(1)(0)
>    }
>    else {
>        val withoutAnalyticsId = sqlContext.sql("select * from ad_info where deviceId = '%1s' order by messageTime desc limit 1" format (deviceId))
>
>        withoutAnalyticsId.take(1)(0)
>    }
> })
>
>
>
>
> ________________________________________
> From: Michael Armbrust [michael@databricks.com]
> Sent: Thursday, October 08, 2015 1:16 PM
> To: Afshartous, Nick
> Cc: user@spark.apache.org
> Subject: Re: Using Spark SQL mapping over an RDD
>
> You can't do nested operations on RDDs or DataFrames (i.e. you can't
> create a DataFrame from within a map function).  Perhaps if you explain
> what you are trying to accomplish someone can suggest another way.
>
> On Thu, Oct 8, 2015 at 10:10 AM, Afshartous, Nick <nafshartous@turbine.com> wrote:
>
> [snip -- original message and stack trace quoted in full above]

RE: Using Spark SQL mapping over an RDD

Posted by "Afshartous, Nick" <na...@turbine.com>.
> You can't do nested operations on RDDs or DataFrames (i.e. you can't create a DataFrame from within a map function).  Perhaps if you explain what you are trying to accomplish someone can suggest another way.

The code below is what I had in mind.  For each id, I'd like to run a query using the id in the WHERE clause and then, depending on the result, possibly run a second query.  The result of either the first or the
second query will be used to construct the output of the map function.

Thanks for any suggestions,
--
      Nick


val result = deviceIds.map(deviceId => {
   val withAnalyticsId = sqlContext.sql(
       "select * from ad_info where deviceId = '%1s' and analyticsId <> 'null' order by messageTime asc limit 1" format (deviceId))

   if (withAnalyticsId.count() > 0) {
       withAnalyticsId.take(1)(0)
   }
   else {
       val withoutAnalyticsId = sqlContext.sql("select * from ad_info where deviceId = '%1s' order by messageTime desc limit 1" format (deviceId))

       withoutAnalyticsId.take(1)(0)
   }
})
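
A possible executor-safe rewrite of the above, sketched below: query ad_info once on the driver, then pick the preferred row per device with reduceByKey, which runs on executors without ever touching sqlContext. The column names (deviceId, analyticsId, messageTime) come from the queries above; reading messageTime as a Long and a missing analyticsId being stored as the string "null" are assumptions marked in the comments.

import org.apache.spark.sql.Row

// Sketch only: adjust the getters if messageTime is not a Long in the
// real schema.
val adInfo  = sqlContext.sql("select * from ad_info")
val fields  = adInfo.schema.fieldNames
val devIdx  = fields.indexOf("deviceId")
val anaIdx  = fields.indexOf("analyticsId")
val timeIdx = fields.indexOf("messageTime")

// Assumes a missing analyticsId is the literal string "null", as in the
// queries above.
def hasAnalyticsId(r: Row): Boolean = r.getString(anaIdx) != "null"

// Mirrors the two-query logic: prefer rows with an analyticsId (earliest
// messageTime wins); among rows without one, keep the latest.
def better(a: Row, b: Row): Row = (hasAnalyticsId(a), hasAnalyticsId(b)) match {
  case (true,  false) => a
  case (false, true)  => b
  case (true,  true)  => if (a.getLong(timeIdx) <= b.getLong(timeIdx)) a else b
  case (false, false) => if (a.getLong(timeIdx) >= b.getLong(timeIdx)) a else b
}

val result = adInfo.rdd                      // RDD[Row], safe to map over
  .map(r => (r.getString(devIdx), r))
  .reduceByKey(better _)
  .values

If deviceIds holds only a subset of the ids in ad_info, a join against it (e.g. deviceIds.map(id => (id, ())) joined with the keyed rows) would keep just those devices.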




________________________________________
From: Michael Armbrust [michael@databricks.com]
Sent: Thursday, October 08, 2015 1:16 PM
To: Afshartous, Nick
Cc: user@spark.apache.org
Subject: Re: Using Spark SQL mapping over an RDD

You can't do nested operations on RDDs or DataFrames (i.e. you can't create a DataFrame from within a map function).  Perhaps if you explain what you are trying to accomplish someone can suggest another way.

On Thu, Oct 8, 2015 at 10:10 AM, Afshartous, Nick <na...@turbine.com>> wrote:

Hi,

Am using Spark, 1.5 in latest EMR 4.1.

I have an RDD of String

   scala> deviceIds
      res25: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at map at <console>:28

and then when trying to map over the RDD while attempting to run a sql query the result is a NullPointerException

  scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()

with the stack trace below.  If I run the query as a top level expression the count is retuned.  There was additional code within
the anonymous function that's been removed to try and isolate.

Thanks for any insights or advice on how to debug this.
--
      Nick


scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
15/10/08 16:12:56 INFO SparkContext: Starting job: count at <console>:40
15/10/08 16:12:56 INFO DAGScheduler: Got job 18 (count at <console>:40) with 200 output partitions
15/10/08 16:12:56 INFO DAGScheduler: Final stage: ResultStage 37(count at <console>:40)
15/10/08 16:12:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 36)
15/10/08 16:12:56 INFO DAGScheduler: Missing parents: List()
15/10/08 16:12:56 INFO DAGScheduler: Submitting ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40), which has no missing parents
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(17904) called with curMem=531894, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated size 17.5 KB, free 534.5 MB)
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(7143) called with curMem=549798, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 7.0 KB, free 534.5 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on 10.247.0.117:33555<http://10.247.0.117:33555> (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO SparkContext: Created broadcast 22 from broadcast at DAGScheduler.scala:861
15/10/08 16:12:56 INFO DAGScheduler: Submitting 200 missing tasks from ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40)
15/10/08 16:12:56 INFO YarnScheduler: Adding task set 37.0 with 200 tasks
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID 650, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:46227 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:32938 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.0 in stage 37.0 (TID 651, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
        at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.1 in stage 37.0 (TID 652, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.0 in stage 37.0 (TID 650) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 1]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.1 in stage 37.0 (TID 653, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.0 in stage 37.0 (TID 651) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 2]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.1 in stage 37.0 (TID 654, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.1 in stage 37.0 (TID 652) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 3]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.2 in stage 37.0 (TID 655, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.1 in stage 37.0 (TID 653) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 4]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.2 in stage 37.0 (TID 656, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.1 in stage 37.0 (TID 654) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 5]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.2 in stage 37.0 (TID 657, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.2 in stage 37.0 (TID 655) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 6]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.2 in stage 37.0 (TID 657) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 7]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.2 in stage 37.0 (TID 656) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 8]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.3 in stage 37.0 (TID 658) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 9]
15/10/08 16:12:56 ERROR TaskSetManager: Task 0 in stage 37.0 failed 4 times; aborting job
15/10/08 16:12:56 INFO YarnScheduler: Cancelling stage 37
15/10/08 16:12:56 INFO YarnScheduler: Stage 37 was cancelled
15/10/08 16:12:56 INFO DAGScheduler: ResultStage 37 (count at <console>:40) failed in 0.128 s
15/10/08 16:12:56 INFO DAGScheduler: Job 18 failed: count at <console>:40, took 0.145419 s
15/10/08 16:12:56 WARN TaskSetManager: Lost task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 INFO YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org<http://org.apache.spark.scheduler.DAGScheduler.org>$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
        at $iwC$$iwC$$iwC.<init>(<console>:77)
        at $iwC$$iwC.<init>(<console>:79)
        at $iwC.<init>(<console>:81)
        at <init>(<console>:83)
        at .<init>(<console>:87)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org<http://org.apache.spark.repl.SparkILoop.org>$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org<http://org.apache.spark.repl.SparkILoop.org>$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


scala> 15/10/08 16:13:45 INFO ContextCleaner: Cleaned accumulator 34
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 10.247.0.117:33555 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:46227 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:32938 in memory (size: 7.0 KB, free: 535.0 MB)



scala>

Notice: This communication is for the intended recipient(s) only and may contain confidential, proprietary, legally protected or privileged information of Turbine, Inc. If you are not the intended recipient(s), please notify the sender at once and delete this communication. Unauthorized use of the information in this communication is strictly prohibited and may be unlawful. For those recipients under contract with Turbine, Inc., the information in this communication is subject to the terms and conditions of any applicable contracts or agreements.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Using Spark SQL mapping over an RDD

Posted by Michael Armbrust <mi...@databricks.com>.
You can't do nested operations on RDDs or DataFrames (i.e., you can't run a
SQL query or create a DataFrame from inside a map function). The SQLContext
only exists on the driver, so when the closure runs on an executor the
sqlContext reference is null, which is where the NullPointerException comes
from. Perhaps if you explain what you are trying to accomplish, someone can
suggest another way.
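
For example, if the goal is to pull the ad_info rows that match the device
ids, one way is to build a DataFrame from the RDD on the driver and express
the lookup as a single join. A rough sketch (it guesses that ad_info has
something like a device_id column; adjust to the real schema):

  scala> import sqlContext.implicits._

  scala> // build a one-column DataFrame from the RDD of ids, on the driver
  scala> val idsDF = deviceIds.map(Tuple1.apply).toDF("device_id")

  scala> // assumes ad_info has a device_id column to join on
  scala> val matched = sqlContext.table("ad_info").join(idsDF, "device_id")

  scala> matched.count()

If the per-id logic is more involved than a lookup, another option is to
collect() the query result to the driver (when it is small enough) and
broadcast it, so each task does a local lookup instead of running SQL.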

On Thu, Oct 8, 2015 at 10:10 AM, Afshartous, Nick <na...@turbine.com>
wrote:

>
> Hi,
>
> Am using Spark 1.5 in the latest EMR 4.1.
>
> I have an RDD of String
>
>    scala> deviceIds
>       res25: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at
> map at <console>:28
>
> and when mapping over the RDD while attempting to run a SQL
> query, the result is a NullPointerException
>
>   scala> deviceIds.map(id => sqlContext.sql("select * from
> ad_info")).count()
>
> with the stack trace below.  If I run the query as a top-level expression
> the count is returned.  There was additional code within
> the anonymous function that's been removed to try and isolate.
>
> Thanks for any insights or advice on how to debug this.
> --
>       Nick
>
> [log output and stack trace snipped]