Posted to user@spark.apache.org by fenghaixiong <98...@qq.com> on 2015/03/06 08:39:15 UTC

spark-stream programme failed on yarn-client

Hi all,

I'm trying to write a Spark Streaming program, so I read the Spark online documentation. Following the documentation, I wrote the program like this:

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object SparkStreamTest {
  def main(args: Array[String]) {
    val conf = new SparkConf()
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream(args(0), args(1).toInt)
    val words = lines.flatMap(_.split(" ")) 
    val pairs = words.map(word => (word, 1)) 
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.print() 
    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for the computation to terminate
  }

}
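For reference, the per-batch computation in the program above can be sketched in plain Scala, with no Spark required (a rough analogy only; reduceByKey(_ + _) is modeled here with groupBy plus a sum):

```scala
// Plain-Scala sketch of what each one-second batch computes:
// split lines into words, pair each word with 1, then sum counts per word.
object WordCountSketch {
  def counts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))            // like lines.flatMap(_.split(" "))
      .map(word => (word, 1))           // like words.map(word => (word, 1))
      .groupBy(_._1)                    // reduceByKey groups by key...
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // ...and sums

  def main(args: Array[String]): Unit =
    println(counts(Seq("hello world", "hello spark")))
}
```

Running main prints Map(hello -> 2, world -> 1, spark -> 1) (element order may vary).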



To test it, I first start listening on a port with:
 nc -lk 9999 

and then I submit the job with:
spark-submit  --master local[2] --class com.nd.hxf.SparkStreamTest spark-test-tream-1.0-SNAPSHOT-job.jar  localhost 9999

Everything works fine.


But when I run it on YARN like this:
spark-submit  --master yarn-client --class com.nd.hxf.SparkStreamTest spark-test-tream-1.0-SNAPSHOT-job.jar  localhost 9999

it waits for a long time and repeatedly outputs some messages. Part of the output looks like this:

15/03/06 15:30:24 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
15/03/06 15:30:24 INFO ReceiverTracker: ReceiverTracker started
15/03/06 15:30:24 INFO ForEachDStream: metadataCleanupDelay = -1
15/03/06 15:30:24 INFO ShuffledDStream: metadataCleanupDelay = -1
15/03/06 15:30:24 INFO MappedDStream: metadataCleanupDelay = -1
15/03/06 15:30:24 INFO FlatMappedDStream: metadataCleanupDelay = -1
15/03/06 15:30:24 INFO SocketInputDStream: metadataCleanupDelay = -1
15/03/06 15:30:24 INFO SocketInputDStream: Slide time = 1000 ms
15/03/06 15:30:24 INFO SocketInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/03/06 15:30:24 INFO SocketInputDStream: Checkpoint interval = null
15/03/06 15:30:24 INFO SocketInputDStream: Remember duration = 1000 ms
15/03/06 15:30:24 INFO SocketInputDStream: Initialized and validated org.apache.spark.streaming.dstream.SocketInputDStream@b01c5f8
15/03/06 15:30:24 INFO FlatMappedDStream: Slide time = 1000 ms
15/03/06 15:30:24 INFO FlatMappedDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/03/06 15:30:24 INFO FlatMappedDStream: Checkpoint interval = null
15/03/06 15:30:24 INFO FlatMappedDStream: Remember duration = 1000 ms
15/03/06 15:30:24 INFO FlatMappedDStream: Initialized and validated org.apache.spark.streaming.dstream.FlatMappedDStream@6bd47453
15/03/06 15:30:24 INFO MappedDStream: Slide time = 1000 ms
15/03/06 15:30:24 INFO MappedDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/03/06 15:30:24 INFO MappedDStream: Checkpoint interval = null
15/03/06 15:30:24 INFO MappedDStream: Remember duration = 1000 ms
15/03/06 15:30:24 INFO MappedDStream: Initialized and validated org.apache.spark.streaming.dstream.MappedDStream@941451f
15/03/06 15:30:24 INFO ShuffledDStream: Slide time = 1000 ms
15/03/06 15:30:24 INFO ShuffledDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/03/06 15:30:24 INFO ShuffledDStream: Checkpoint interval = null
15/03/06 15:30:24 INFO ShuffledDStream: Remember duration = 1000 ms
15/03/06 15:30:24 INFO ShuffledDStream: Initialized and validated org.apache.spark.streaming.dstream.ShuffledDStream@42eba6ee
15/03/06 15:30:24 INFO ForEachDStream: Slide time = 1000 ms
15/03/06 15:30:24 INFO ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/03/06 15:30:24 INFO ForEachDStream: Checkpoint interval = null
15/03/06 15:30:24 INFO ForEachDStream: Remember duration = 1000 ms
15/03/06 15:30:24 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@48d166b5
15/03/06 15:30:24 INFO SparkContext: Starting job: start at SparkStreamTest.scala:21
15/03/06 15:30:24 INFO RecurringTimer: Started timer for JobGenerator at time 1425627025000
15/03/06 15:30:24 INFO JobGenerator: Started JobGenerator at 1425627025000 ms
15/03/06 15:30:24 INFO JobScheduler: Started JobScheduler
15/03/06 15:30:24 INFO DAGScheduler: Registering RDD 2 (start at SparkStreamTest.scala:21)
15/03/06 15:30:24 INFO DAGScheduler: Got job 0 (start at SparkStreamTest.scala:21) with 20 output partitions (allowLocal=false)
15/03/06 15:30:24 INFO DAGScheduler: Final stage: Stage 0(start at SparkStreamTest.scala:21)
15/03/06 15:30:24 INFO DAGScheduler: Parents of final stage: List(Stage 1)
15/03/06 15:30:24 INFO DAGScheduler: Missing parents: List(Stage 1)
15/03/06 15:30:24 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[2] at start at SparkStreamTest.scala:21), which has no missing parents
15/03/06 15:30:24 INFO MemoryStore: ensureFreeSpace(2720) called with curMem=0, maxMem=277842493
15/03/06 15:30:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.7 KB, free 265.0 MB)
15/03/06 15:30:24 INFO MemoryStore: ensureFreeSpace(1594) called with curMem=2720, maxMem=277842493
15/03/06 15:30:24 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1594.0 B, free 265.0 MB)
15/03/06 15:30:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.124.1:57216 (size: 1594.0 B, free: 265.0 MB)
15/03/06 15:30:24 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/06 15:30:24 INFO DAGScheduler: Submitting 50 missing tasks from Stage 1 (MappedRDD[2] at start at SparkStreamTest.scala:21)
15/03/06 15:30:24 INFO YarnClientClusterScheduler: Adding task set 1.0 with 50 tasks
15/03/06 15:30:25 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:25 INFO JobScheduler: Added jobs for time 1425627025000 ms
15/03/06 15:30:25 INFO JobScheduler: Starting job streaming job 1425627025000 ms.0 from job set of time 1425627025000 ms
15/03/06 15:30:25 INFO SparkContext: Starting job: getCallSite at DStream.scala:294
15/03/06 15:30:25 INFO DAGScheduler: Registering RDD 6 (map at SparkStreamTest.scala:18)
15/03/06 15:30:25 INFO DAGScheduler: Got job 1 (getCallSite at DStream.scala:294) with 1 output partitions (allowLocal=true)
15/03/06 15:30:25 INFO DAGScheduler: Final stage: Stage 2(getCallSite at DStream.scala:294)
15/03/06 15:30:25 INFO DAGScheduler: Parents of final stage: List(Stage 3)
15/03/06 15:30:25 INFO DAGScheduler: Missing parents: List()
15/03/06 15:30:25 INFO DAGScheduler: Submitting Stage 2 (ShuffledRDD[7] at reduceByKey at SparkStreamTest.scala:19), which has no missing parents
15/03/06 15:30:25 INFO MemoryStore: ensureFreeSpace(2136) called with curMem=4314, maxMem=277842493
15/03/06 15:30:25 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.1 KB, free 265.0 MB)
15/03/06 15:30:25 INFO MemoryStore: ensureFreeSpace(1333) called with curMem=6450, maxMem=277842493
15/03/06 15:30:25 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1333.0 B, free 265.0 MB)
15/03/06 15:30:25 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.124.1:57216 (size: 1333.0 B, free: 265.0 MB)
15/03/06 15:30:25 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/06 15:30:25 INFO DAGScheduler: Submitting 1 missing tasks from Stage 2 (ShuffledRDD[7] at reduceByKey at SparkStreamTest.scala:19)
15/03/06 15:30:25 INFO YarnClientClusterScheduler: Adding task set 2.0 with 1 tasks
15/03/06 15:30:26 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:26 INFO JobScheduler: Added jobs for time 1425627026000 ms
15/03/06 15:30:27 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:27 INFO JobScheduler: Added jobs for time 1425627027000 ms
15/03/06 15:30:28 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:28 INFO JobScheduler: Added jobs for time 1425627028000 ms
15/03/06 15:30:29 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:29 INFO JobScheduler: Added jobs for time 1425627029000 ms
15/03/06 15:30:30 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:30 INFO JobScheduler: Added jobs for time 1425627030000 ms
15/03/06 15:30:31 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:31 INFO JobScheduler: Added jobs for time 1425627031000 ms
15/03/06 15:30:32 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:32 INFO JobScheduler: Added jobs for time 1425627032000 ms
15/03/06 15:30:33 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:33 INFO JobScheduler: Added jobs for time 1425627033000 ms
15/03/06 15:30:34 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:34 INFO JobScheduler: Added jobs for time 1425627034000 ms
15/03/06 15:30:35 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:35 INFO JobScheduler: Added jobs for time 1425627035000 ms
15/03/06 15:30:36 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:36 INFO JobScheduler: Added jobs for time 1425627036000 ms
15/03/06 15:30:37 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:37 INFO JobScheduler: Added jobs for time 1425627037000 ms
15/03/06 15:30:38 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:38 INFO JobScheduler: Added jobs for time 1425627038000 ms
15/03/06 15:30:39 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:39 INFO JobScheduler: Added jobs for time 1425627039000 ms
15/03/06 15:30:39 WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/06 15:30:40 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:40 INFO JobScheduler: Added jobs for time 1425627040000 ms
15/03/06 15:30:41 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:41 INFO JobScheduler: Added jobs for time 1425627041000 ms
15/03/06 15:30:42 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:42 INFO JobScheduler: Added jobs for time 1425627042000 ms
15/03/06 15:30:43 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:43 INFO JobScheduler: Added jobs for time 1425627043000 ms
15/03/06 15:30:44 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:44 INFO JobScheduler: Added jobs for time 1425627044000 ms
15/03/06 15:30:45 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:45 INFO JobScheduler: Added jobs for time 1425627045000 ms
15/03/06 15:30:46 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:46 INFO JobScheduler: Added jobs for time 1425627046000 ms
15/03/06 15:30:47 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/06 15:30:47 INFO JobScheduler: Added jobs for time 1425627047000 ms



Thanks for any help.



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: spark-stream programme failed on yarn-client

Posted by fenghaixiong <98...@qq.com>.
Thanks, your advice was useful. I had submitted my job from a Spark client that was set up with only a bare-bones configuration file, so it failed.
When I run the job on the service machine, everything works fine.

 





Re: spark-stream programme failed on yarn-client

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
This looks like an issue with your YARN setup. Could you try running a simple
example with spark-shell?

Start the spark shell as:

$ MASTER=yarn-client bin/spark-shell
scala> sc.parallelize(1 to 1000).collect

If that doesn't work, then make sure your YARN services are up and running,
and in your spark-env.sh you may set the corresponding configurations from
the following:


# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
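As a concrete illustration of the options above, a minimal spark-env.sh might look like the following. The values and the HADOOP_CONF_DIR path are illustrative assumptions, not recommendations for any particular cluster:

```shell
# spark-env.sh -- example settings for YARN client mode (illustrative values)
export HADOOP_CONF_DIR=/etc/hadoop/conf   # where Spark finds the Hadoop/YARN config files
export SPARK_EXECUTOR_INSTANCES=2         # number of executors (workers) to request
export SPARK_EXECUTOR_CORES=1             # cores per executor
export SPARK_EXECUTOR_MEMORY=1G           # memory per executor
export SPARK_DRIVER_MEMORY=512M           # memory for the driver
```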


Thanks
Best Regards

> 15/03/06 15:30:29 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:29 INFO JobScheduler: Added jobs for time 1425627029000 ms
> 15/03/06 15:30:30 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:30 INFO JobScheduler: Added jobs for time 1425627030000 ms
> 15/03/06 15:30:31 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:31 INFO JobScheduler: Added jobs for time 1425627031000 ms
> 15/03/06 15:30:32 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:32 INFO JobScheduler: Added jobs for time 1425627032000 ms
> 15/03/06 15:30:33 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:33 INFO JobScheduler: Added jobs for time 1425627033000 ms
> 15/03/06 15:30:34 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:34 INFO JobScheduler: Added jobs for time 1425627034000 ms
> 15/03/06 15:30:35 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:35 INFO JobScheduler: Added jobs for time 1425627035000 ms
> 15/03/06 15:30:36 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:36 INFO JobScheduler: Added jobs for time 1425627036000 ms
> 15/03/06 15:30:37 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:37 INFO JobScheduler: Added jobs for time 1425627037000 ms
> 15/03/06 15:30:38 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:38 INFO JobScheduler: Added jobs for time 1425627038000 ms
> 15/03/06 15:30:39 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:39 INFO JobScheduler: Added jobs for time 1425627039000 ms
> 15/03/06 15:30:39 WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
> 15/03/06 15:30:40 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:40 INFO JobScheduler: Added jobs for time 1425627040000 ms
> 15/03/06 15:30:41 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:41 INFO JobScheduler: Added jobs for time 1425627041000 ms
> 15/03/06 15:30:42 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:42 INFO JobScheduler: Added jobs for time 1425627042000 ms
> 15/03/06 15:30:43 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:43 INFO JobScheduler: Added jobs for time 1425627043000 ms
> 15/03/06 15:30:44 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:44 INFO JobScheduler: Added jobs for time 1425627044000 ms
> 15/03/06 15:30:45 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:45 INFO JobScheduler: Added jobs for time 1425627045000 ms
> 15/03/06 15:30:46 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:46 INFO JobScheduler: Added jobs for time 1425627046000 ms
> 15/03/06 15:30:47 INFO ReceiverTracker: Stream 0 received 0 blocks
> 15/03/06 15:30:47 INFO JobScheduler: Added jobs for time 1425627047000 ms
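
[Editor's note: the WARN above is the key symptom; it usually means YARN never granted the application any executors, so there are no cores left to run tasks once the receiver claims one. A sketch of a submit command that requests resources explicitly is below; the executor counts and sizes are assumptions to adjust to your cluster, and note that in yarn-client mode the receiver runs on a cluster node, so `localhost` may not reach the `nc` listener on the submitting machine.]

```shell
# Sketch: request explicit YARN resources so the receiver (which
# permanently occupies one core) still leaves cores for processing.
# Flag values are illustrative, not prescriptive.
spark-submit \
  --master yarn-client \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 1g \
  --class com.nd.hxf.SparkStreamTest \
  spark-test-tream-1.0-SNAPSHOT-job.jar localhost 9999
```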
>
>
>
> Thanks for any help.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>