Posted to user@spark.apache.org by Vinti Maheshwari <vi...@gmail.com> on 2016/02/23 22:13:20 UTC
Network Spark Streaming from multiple remote hosts
Hi All,
I wrote a program for Spark Streaming in Scala. In my program, I pass a
'remote-host' and 'remote-port' to socketTextStream.
On the remote machine, I have one Perl script that runs a system
command:
echo 'data_str' | nc <remote_host> 9999
This way my Spark program is able to get data, but it seems a little
confusing, as I have multiple remote machines that need to send data to the
Spark machine. I wanted to know the right way of doing this. In fact, how
will I deal with data coming from multiple hosts?
For reference, my current program:
def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("HBaseStream")
  val sc = new SparkContext(conf)
  val ssc = new StreamingContext(sc, Seconds(2))
  val inputStream = ssc.socketTextStream(<remote-host>, 9999)
  // -------------------
  // -------------------
  ssc.start()
  // Wait for the computation to terminate
  ssc.awaitTermination()
}}
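[Editor's note: if the socket approach is kept, one possible way to handle several senders is one socketTextStream receiver per host, unioned into a single DStream. This is a sketch, not code from the thread; the hostnames are placeholders.]

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MultiHostStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HBaseStream")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Placeholder hostnames: each sending machine gets its own receiver.
    val hosts = Seq("host1", "host2", "host3")
    val streams = hosts.map(h => ssc.socketTextStream(h, 9999))

    // Merge all per-host streams into one DStream for downstream processing.
    val merged = ssc.union(streams)
    merged.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that each socket stream occupies a receiver slot, so the application needs more cores than receivers (e.g. `local[4]` for three hosts when testing locally).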
Thanks in advance.
Regards,
~Vinti
Re: Network Spark Streaming from multiple remote hosts
Posted by Kevin Mellott <ke...@gmail.com>.
Hi Vinti,
That example is (in my opinion) more of a tutorial and not necessarily the
way you'd want to set it up for a "real world" application. I'd recommend
using something like Apache Kafka, which will allow the various hosts to
publish messages to a queue. Your Spark Streaming application would then
receive messages from the queue and perform whatever processing you'd
like.
http://kafka.apache.org/documentation.html#introduction
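[Editor's note: a minimal sketch of the consumer side, using the direct-stream API from the spark-streaming-kafka module (Kafka 0.8-style, current when this was posted). The broker address and topic name are placeholders; each remote host would produce its lines to the topic instead of piping to nc.]

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaHBaseStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HBaseStream")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Placeholder broker list and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("host-data")

    // One direct stream replaces the per-host socket receivers: Kafka
    // handles fan-in from all producing hosts.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Messages arrive as (key, value) pairs; keep the values (the data lines).
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```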
Thanks,
Kevin