Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2016/06/09 18:07:21 UTC

[jira] [Commented] (SPARK-15842) Add support for socket stream.

    [ https://issues.apache.org/jira/browse/SPARK-15842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323006#comment-15323006 ] 

Tathagata Das commented on SPARK-15842:
---------------------------------------

This is slightly at odds with the fundamental design of a Structured Streaming source. The semantics of such a source are that it must be able to replay, exactly, an arbitrary sequence of past data in the stream, given a range of offsets. This means that only streaming sources like Kafka and Kinesis (which have the concept of a per-record offset) fit into this model. This is the assumption we have made to achieve end-to-end exactly-once guarantees.

So a socket stream does not quite fit into this model. 
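To make the replay requirement concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory `ReplayableLog` class, which is not a Spark API. Each record gets a monotonically increasing offset, and a batch is requested as an offset range. The same range returns the same records until that range is explicitly committed and discarded. The `(start, end]` range convention and the `-1` "before the first record" offset are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplayableLog:
    """Hypothetical in-memory log (not a Spark API): every record gets a
    monotonically increasing offset, so any retained range can be re-read exactly."""
    records: List[str] = field(default_factory=list)
    base: int = 0  # offset of records[0]; advances when old data is committed

    def append(self, record: str) -> int:
        """Store a record and return the offset assigned to it."""
        self.records.append(record)
        return self.base + len(self.records) - 1

    def latest_offset(self) -> int:
        """Offset of the newest record; -1 means nothing has arrived yet."""
        return self.base + len(self.records) - 1

    def get_batch(self, start: int, end: int) -> List[str]:
        """Return records with offsets in (start, end]. Repeating the call
        with the same range returns the same data (exact replay)."""
        if start + 1 < self.base:
            raise ValueError("range already discarded; cannot replay")
        if end > self.latest_offset():
            raise ValueError("end offset has not been reached yet")
        return self.records[start + 1 - self.base:end + 1 - self.base]

    def commit(self, end: int) -> None:
        """Discard records up to and including `end` once they are processed."""
        if end > self.latest_offset():
            raise ValueError("cannot commit past the latest offset")
        drop = end + 1 - self.base
        if drop > 0:
            del self.records[:drop]
            self.base = end + 1


log = ReplayableLog()
for line in ["a", "b", "c", "d"]:
    log.append(line)
first = log.get_batch(-1, 1)   # offsets 0..1
replay = log.get_batch(-1, 1)  # same range after a simulated failure
log.commit(1)                  # offsets 0..1 may now be forgotten
```

A raw socket offers no equivalent of `get_batch` for bytes that have already been read. So without an added buffer like this one, a restarted query has nothing to replay.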

> Add support for socket stream.
> ------------------------------
>
>                 Key: SPARK-15842
>                 URL: https://issues.apache.org/jira/browse/SPARK-15842
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Prashant Sharma
>            Assignee: Prashant Sharma
>
> So far, Structured Streaming has only offset-based sources, such as the file source and the memory source, which need no additional capabilities to serve data for any given offset range.
> At the OS level, a socket has only a very small buffer. Many message queues can keep a message lingering until the receiving end reads it; ZeroMQ is one such example. A socket stream, however, does not support this.
> The challenge here is to implement a way to buffer data for a configurable amount of time, and to discuss strategies for overflow and underflow.
> This JIRA will form the basis for implementing sources that have no native support for retaining a message until it is read. It covers a design doc, if necessary, and the supporting code to implement such sources.
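The buffering approach proposed in the issue can be sketched as follows. This is a minimal Python illustration, not part of Spark. The class `LingeringBuffer`, its `max_records` and `retention_seconds` parameters, and the two overflow policies (`fail` and `drop_oldest`) are all hypothetical choices. Underflow is treated simply as a reader finding no records.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class BufferOverflowError(Exception):
    """Raised when the buffer is full and the policy is 'fail'."""


class LingeringBuffer:
    """Hypothetical buffer that keeps socket records for a configurable
    time window so a reader can still fetch them after they arrive."""

    def __init__(self, max_records: int, retention_seconds: float,
                 on_overflow: str = "fail",
                 clock: Callable[[], float] = time.monotonic) -> None:
        if on_overflow not in ("fail", "drop_oldest"):
            raise ValueError("on_overflow must be 'fail' or 'drop_oldest'")
        self.max_records = max_records
        self.retention_seconds = retention_seconds
        self.on_overflow = on_overflow
        self.clock = clock  # injectable so expiry can be tested deterministically
        self.entries: Deque[Tuple[float, str]] = deque()
        self.dropped = 0  # records lost to expiry or overflow

    def _expire(self) -> None:
        """Evict records that are older than the retention window."""
        cutoff = self.clock() - self.retention_seconds
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.popleft()
            self.dropped += 1

    def put(self, record: str) -> None:
        """Add a record, applying the overflow policy if the buffer is full."""
        self._expire()
        if len(self.entries) >= self.max_records:
            if self.on_overflow == "fail":
                raise BufferOverflowError("buffer full")
            self.entries.popleft()  # drop_oldest: sacrifice the oldest record
            self.dropped += 1
        self.entries.append((self.clock(), record))

    def snapshot(self) -> List[str]:
        """Records still available to a reader; an empty list is underflow."""
        self._expire()
        return [record for _, record in self.entries]


now = [0.0]
buf = LingeringBuffer(max_records=2, retention_seconds=10,
                      on_overflow="drop_oldest", clock=lambda: now[0])
for line in ["a", "b", "c"]:
    buf.put(line)  # "a" is dropped on overflow
```

Both eviction paths, expiry and `drop_oldest`, discard data that a later replay might still need. That is the same tension with exact replay raised in the comment above, so the choice of policy decides which guarantee is weakened.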



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
