Posted to user@spark.apache.org by "Kulkarni, Vikram" <vi...@hp.com> on 2014/04/21 09:15:17 UTC

socketTextStream() call on cluster streams no records

Hello Spark-users,
  I modified the org.apache.spark.streaming.examples.JavaNetworkWordCount that uses a Netcat server to instead read data from a SocketServer implementation. The SocketServer Java program accepts connections on port 9999; simulates 1 million records and streams it to the socket client (in this case, the Spark Streaming driver program). The records simulated are String of format: [msisdn|ip_addr|start_time|end_time]

The Spark Streaming driver program connects to this SocketServer program on port 9999 and uses a batch interval of 3s to process the data. I am able to run this program locally (i.e. with master local[2]): the Spark Streaming program establishes a socket connection with the SocketServer Java program (running on the same node, i.e. localhost), all 1 million records are received via the socket stream, and they are processed in 3s batches.

However, I am unable to get this to work when the job is submitted to a Spark cluster of 2 nodes. The job is submitted to the cluster successfully, and the driver reports that a socket connection is established with the SocketServer. The program enters the Spark Streaming loop and keeps looking for data in the stream; however, no data is ever received. On the SocketServer side, no client connection is ever established and it stays blocked on the accept() call, so it never gets to simulate the 1 million records at all.

Spark Streaming driver program code snippet:

JavaDStream<String> lines = ssc.socketTextStream(config.hostname, Integer.parseInt(config.port));
System.out.println("Socket connection to read raw data established with server: " + config.hostname + ":" + config.port);

The above s.o.p. is printed (though note that socketTextStream() only registers the input stream lazily; the TCP connection is not actually opened until the streaming context is started, so this print alone does not prove a connection exists).

SocketServer Java program code snippet:

while (true) {
    Socket clientSock = serverSocket.accept();
    System.out.println("Connection accepted from Spark Streaming job: " + clientSock);
    // ... simulate the 1 million records and stream them to clientSock ...
}

This s.o.p. is never printed when the Spark job runs on the cluster; the same program works when the master is local[2]. I am stuck on how to debug this; any advice on why socketTextStream() doesn't receive any data when running on a cluster?
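To rule out basic network reachability (on a real cluster the socket receiver runs inside an executor on a worker node rather than in the driver JVM, so the hostname passed to socketTextStream() must be reachable from the workers and cannot be localhost), here is a quick stand-alone check that can be run on each worker node. The SocketCheck class and its default host/port are placeholders of mine:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

// Quick check: can this machine open a TCP connection to host:port?
public class SocketCheck {
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Run on each worker with the same host/port the driver passes to
        // socketTextStream(); both defaults here are placeholders.
        String host = args.length > 0 ? args[0] : "server-host";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9999;
        System.out.println(host + ":" + port + " reachable? "
                + canConnect(host, port, 2000));
    }
}
```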

Regards,
Vikram