You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Reddy Raja <ar...@gmail.com> on 2014/09/24 12:35:23 UTC
Spark Streaming
Given this program.. I have the following queries..
val sparkConf = new SparkConf().setAppName("NetworkWordCount")
sparkConf.set("spark.master", "local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.
MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
Q1) How do I know which part of the program is executing every 10 sec..
My requirements is that, I want to execute a method and insert data into
Cassandra every time a set of messages comes in
Q2) Is there a function I can pass, so that, it gets executed when the next
set of messages comes in.
Q3) If I have a method in-beween the following lines
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
my_method(stread rdd)..
ssc.start()
The method is not getting executed..
Can some one answer these questions?
--Reddy
Re: Spark Streaming
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
See the inline response.
On Wed, Sep 24, 2014 at 4:05 PM, Reddy Raja <ar...@gmail.com> wrote:
> Given this program.. I have the following queries..
>
> val sparkConf = new SparkConf().setAppName("NetworkWordCount")
>
> sparkConf.set("spark.master", "local[2]")
>
> val ssc = new StreamingContext(sparkConf, Seconds(10))
>
> val
> **
> *lines* = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.
> MEMORY_AND_DISK_SER)
>
>
> **
> *val words = lines.flatMap(_.split(" "))*
>
>
> **
> * val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)*
>
>
> **
> *wordCounts.print()*
>
> ssc.start()
>
> ssc.awaitTermination()
>
>
> Q1) How do I know which part of the program is executing every 10 sec..
>
> My requirements is that, I want to execute a method and insert data
> into Cassandra every time a set of messages comes in
>
==> Those highlighted lines will be executed in every 10 sec.
Basically whatever operations that you are doing on *lines* will be
executed in every 10 secs, So to solve your problem you need to have a map
function on the lines which will do your data insertion to Cassandra.
Eg:
> *val dumdum = lines.map(x => { whatever you want to do with x (like
> insert into Cassandra) })*
Q2) Is there a function I can pass, so that, it gets executed when the next
> set of messages comes in.
>
==> Hope the first answer covers it.
> Q3) If I have a method in-beween the following lines
>
> val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
>
> wordCounts.print()
>
> **
> * my_method(stread rdd)..*
>
> ssc.start()
>
>
> ==> No!! my_method will only execute one time
.
> The method is not getting executed..
>
>
> Can some one answer these questions?
>
> --Reddy
>