Posted to user@spark.apache.org by Vinti Maheshwari <vi...@gmail.com> on 2016/02/23 22:13:20 UTC

Network Spark Streaming from multiple remote hosts

Hi All

I wrote a Spark Streaming program in Scala. In my program, I pass a
'remote-host' and 'remote-port' to socketTextStream.

On the remote machine, I have a Perl script that invokes a system
command:

echo 'data_str' | nc <remote_host> 9999

This way my Spark program is able to receive the data, but it seems
confusing because I have multiple remote machines that need to send data to
the Spark machine. I wanted to know the right way of doing this. In fact,
how will I deal with data coming from multiple hosts?

For reference, my current program:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HBaseStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HBaseStream")
    val sc = new SparkContext(conf)

    // 2-second batch interval
    val ssc = new StreamingContext(sc, Seconds(2))

    val inputStream = ssc.socketTextStream(<remote-host>, 9999)
    // ... processing logic elided ...

    ssc.start()
    // Wait for the computation to terminate
    ssc.awaitTermination()
  }
}
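
One idea I had was to open one socket stream per host and union them,
continuing from the program above, roughly like this (just a sketch; the
host names are made up):

val hosts = Seq("host1", "host2", "host3") // placeholder host names
val streams = hosts.map(host => ssc.socketTextStream(host, 9999))
// Union all the per-host streams into a single DStream
val inputStream = ssc.union(streams)

Is that the right approach, or is there a better pattern for this?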

Thanks in advance.

Regards,
~Vinti

Re: Network Spark Streaming from multiple remote hosts

Posted by Kevin Mellott <ke...@gmail.com>.
Hi Vinti,

That example is (in my opinion) more of a tutorial and not necessarily the
way you'd want to set things up for a "real world" application. I'd
recommend using something like Apache Kafka, which will allow the various
hosts to publish messages to a queue. Your Spark Streaming application then
receives messages from the queue and performs whatever processing you'd
like.

http://kafka.apache.org/documentation.html#introduction
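
As a rough sketch of the receiving side (untested; this assumes the
spark-streaming-kafka artifact for the Kafka 0.8 direct API, and the broker
addresses and topic name below are placeholders), reusing the ssc from your
program:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Broker list and topic are placeholders - substitute your own.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("metrics")

// Every remote host publishes lines to the same topic; Spark reads all
// partitions in parallel, so there is no per-host socket to manage.
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder,
  StringDecoder](ssc, kafkaParams, topics)

// The message values are the lines your hosts published.
val lines = messages.map(_._2)

Each host would then publish to Kafka (via any Kafka producer client)
instead of piping into nc.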

Thanks,
Kevin
