Posted to dev@spark.apache.org by Naveen Madhire <vm...@umail.iu.edu> on 2014/12/31 05:26:58 UTC

Sample Spark Program Error

Hi All,

I am trying to run a sample Spark program using Scala and SBT.

Below is the program:

import org.apache.spark.SparkContext

object Test1 {
  def main(args: Array[String]) {
    val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system
    val sc = new SparkContext("local", "Simple App",
      "E:/ApacheSpark/usb/usb/spark/bin", List("target/scala-2.10/sbt2_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()

    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()

    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}


Below is the error log:


14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:0+673
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2032) called with curMem=34047, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 2032.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_0 in memory on zealot:61452 (size: 2032.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_0
14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
14/12/30 23:20:21 INFO spark.CacheManager: Partition rdd_1_1 not found, computing it
14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:673+673
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3507 ms on localhost (1/2)
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(1912) called with curMem=36079, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 1912.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_1 in memory on zealot:61452 (size: 1912.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_1
14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 261 ms on localhost (2/2)
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 0 (count at Test1.scala:19) finished in 3.811 s
14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at Test1.scala:19, took 3.997365232 s
14/12/30 23:20:21 INFO spark.SparkContext: Starting job: count at Test1.scala:20
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Got job 1 (count at Test1.scala:20) with 2 output partitions (allowLocal=false)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Final stage: Stage 1(count at Test1.scala:20)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Missing parents: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting Stage 1 (FilteredRDD[3] at filter at Test1.scala:20), which has no missing parents
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2600) called with curMem=37991, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 267.2 MB)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (FilteredRDD[3] at filter at Test1.scala:20)
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, ANY, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_0 locally
14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1731 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, ANY, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_1 locally
14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1731 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 7 ms on localhost (1/2)
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 16 ms on localhost (2/2)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 1 (count at Test1.scala:20) finished in 0.016 s
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at Test1.scala:20, took 0.041709824 s
Lines with a: 24, Lines with b: 15
14/12/30 23:20:21 ERROR util.Utils: Uncaught exception in thread SparkListenerBus
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:48)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
[success] Total time: 12 s, completed Dec 30, 2014 11:20:21 PM
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
14/12/30 23:20:21 ERROR spark.ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:136)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
    at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
14/12/30 23:20:21 INFO network.ConnectionManager: Selector thread was interrupted!
14/12/30 23:20:21 INFO storage.BlockManager: Removing broadcast 2
14/12/30 23:20:21 INFO storage.BlockManager: Removing block broadcast_2
14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 of size 2600 dropped from memory (free 280210984)
14/12/30 23:20:21 INFO spark.ContextCleaner: Cleaned broadcast 2




Please let me know if you have any pointers for debugging this error.


Thanks a lot.

Re: Sample Spark Program Error

Posted by Nicholas Chammas <ni...@gmail.com>.
You sent this to the dev list. Please send it instead to the user list.

We use the dev list to discuss development on Spark itself: new features,
fixes to known bugs, and so forth.

The user list is for discussing issues encountered while using Spark, which
I believe is what you are looking for.

Nick


On Tue Dec 30 2014 at 11:27:52 PM Naveen Madhire <vm...@umail.iu.edu>
wrote:

> [snip: original message quoted in full]

RE: Fwd: Sample Spark Program Error

Posted by Kapil Malik <km...@adobe.com>.
Hi Naveen,

Quoting http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext:

"SparkContext is the main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one."

So stop() shuts down the connection between the driver program and the Spark master, and does some cleanup. Indeed, after calling it you cannot run any operation on that context, or on any RDD created through it.
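
A common pattern that guarantees this cleanup runs even when the job throws is to stop the context in a finally block. A minimal sketch (the object name and the toy job here are just illustrative, not from Naveen's project):

import org.apache.spark.SparkContext

object StopExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "Stop Example")
    try {
      // Any job at all; what matters is that stop() runs afterwards.
      val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
      println("Even numbers: " + evens)
    } finally {
      sc.stop() // releases the connection and lets Spark's background threads exit cleanly
    }
  }
}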

Regards,

Kapil

From: Naveen Madhire [mailto:vmadhire@umail.iu.edu]
Sent: 31 December 2014 22:08
To: RK
Cc: user@spark.apache.org
Subject: Re: Fwd: Sample Spark Program Error

Yes, the exception is gone now after adding stop() at the end.
Can you please tell me what stop() does at the end? Does it disable the Spark context?

On Wed, Dec 31, 2014 at 10:09 AM, RK <pr...@yahoo.com> wrote:
If you look at your program output closely, you can see the following output:

Lines with a: 24, Lines with b: 15

The exception seems to be happening during Spark's cleanup after your code has finished. Try adding sc.stop() at the end of your program to see if the exception goes away.



On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire <vm...@umail.iu.edu> wrote:


[snip: original message quoted in full]




Re: Fwd: Sample Spark Program Error

Posted by Naveen Madhire <vm...@umail.iu.edu>.
Yes, the exception is gone now after adding stop() at the end.
Can you please tell me what stop() does at the end? Does it disable the
Spark context?

On Wed, Dec 31, 2014 at 10:09 AM, RK <pr...@yahoo.com> wrote:

> If you look at your program output closely, you can see the following
> output:
> Lines with a: 24, Lines with b: 15
>
> The exception seems to be happening during Spark's cleanup after your code
> has finished. Try adding sc.stop() at the end of your program to see if
> the exception goes away.
>
> On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire <vmadhire@umail.iu.edu> wrote:
>
> [snip: original message quoted in full]

Re: Fwd: Sample Spark Program Error

Posted by RK <pr...@yahoo.com.INVALID>.
If you look at your program output closely, you can see the following output:

Lines with a: 24, Lines with b: 15

The exception seems to be happening during Spark's cleanup after your code has finished. Try adding sc.stop() at the end of your program to see if the exception goes away.
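
Concretely, the end of your main method would look like this (everything else unchanged from your snippet):

    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()

    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

    sc.stop() // shut down the context so its background threads exit before the JVM does
  }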

 

On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire <vm...@umail.iu.edu> wrote:

[snip: original message quoted in full]

Fwd: Sample Spark Program Error

Posted by Naveen Madhire <vm...@umail.iu.edu>.
[snip: same message as the original post at the top of this thread]