You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Reddy Raja <ar...@gmail.com> on 2014/09/24 12:35:23 UTC

Spark Streaming

Given this program.. I have the following queries..

val sparkConf = new SparkConf().setAppName("NetworkWordCount")

    sparkConf.set("spark.master", "local[2]")

    val ssc = new StreamingContext(sparkConf, Seconds(10))

    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.
MEMORY_AND_DISK_SER)

    val words = lines.flatMap(_.split(" "))

    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    wordCounts.print()

    ssc.start()

    ssc.awaitTermination()


Q1) How do I know which part of the program is executing every 10 sec..

   My requirements is that, I want to execute a method and insert data into
Cassandra every time a set of messages comes in

Q2) Is there a function I can pass, so that, it gets executed when the next
set of messages comes in.

Q3) If I have a method in-beween the following lines

  val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    wordCounts.print()

    my_method(stread rdd)..

    ssc.start()


   The method is not getting executed..


Can some one answer these questions?

--Reddy

Re: Spark Streaming

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
See the inline response.

On Wed, Sep 24, 2014 at 4:05 PM, Reddy Raja <ar...@gmail.com> wrote:

> Given this program.. I have the following queries..
>
> val sparkConf = new SparkConf().setAppName("NetworkWordCount")
>
>     sparkConf.set("spark.master", "local[2]")
>
>     val ssc = new StreamingContext(sparkConf, Seconds(10))
>
>     val
> *​​*
> *lines* = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.
> MEMORY_AND_DISK_SER)
>
>
> *​​*
> *​​val words = lines.flatMap(_.split(" "))*
>
>
> *​​*
> * val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)*
>
>
> *​​*
> *wordCounts.print()*
>
>     ssc.start()
>
>     ssc.awaitTermination()
>
>
> Q1) How do I know which part of the program is executing every 10 sec..
>
>    My requirements is that, I want to execute a method and insert data
> into Cassandra every time a set of messages comes in
>
​==> Those highlighted lines will be executed in every 10 sec.​

​Basically whatever operations that you are doing on *lines* will be
executed in every 10 secs, So to solve your problem you need to have a map
function on the lines which will do your data insertion to Cassandra.
Eg:

> *​​val dumdum = lines.map(x => { whatever you want to do with x (like
> insert into Cassandra) })​*

Q2) Is there a function I can pass, so that, it gets executed when the next
> set of messages comes in.
>
​==> Hope the first answer covers it.​


> Q3) If I have a method in-beween the following lines
>
>   val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
>
>     wordCounts.print()
>
> *​​*
> *    my_method(stread rdd)..*
>
>     ssc.start()
>
>
> ​==> No!! my_method will only execute one time​

​.​
​

>    The method is not getting executed..
>
>
> Can some one answer these questions?
>
> --Reddy
>