Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2015/04/29 23:06:05 UTC

[jira] [Commented] (SPARK-7111) Add a tracker to track the direct (receiver-less) streams

    [ https://issues.apache.org/jira/browse/SPARK-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520241#comment-14520241 ] 

Tathagata Das commented on SPARK-7111:
--------------------------------------

My general thought is that it's not clear whether we should define a concrete Direct Stream abstraction (like ReceiverInputDStream) in the core streaming system based on the single case of Kafka. Every InputDStream subclass other than ReceiverInputDStream is logically a "direct" stream, since it does not use a receiver. This includes the file stream, but for a file stream it is incredibly hard to get information such as the number of records before executing the jobs, so it cannot really be used like a direct stream. So I don't think this approach of introducing a separate tracker for "direct" streams in the streaming infrastructure is right. However, we definitely want to expose each stream's input information, such as the number of records, in the UI. I think a better design is possible for this. I have been thinking about it since last night, and I will post something shortly.
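
To make the idea of exposing per-stream input information without a dedicated direct-stream tracker more concrete, here is a minimal Python sketch. It is an illustration only, not the design mentioned above. All names in it (InputInfo, InputInfoRegistry, report, total_records) are hypothetical and are not Spark APIs. It assumes only that any input stream able to count its records for a batch reports them to one shared place, while a stream that cannot (such as a file stream) simply reports nothing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InputInfo:
    """Hypothetical per-stream, per-batch input summary (not a Spark class)."""
    stream_id: int
    num_records: int


class InputInfoRegistry:
    """Hypothetical registry: any stream may report its input, none is required to."""

    def __init__(self):
        # batch_time -> {stream_id -> InputInfo}
        self._by_batch = defaultdict(dict)

    def report(self, batch_time, info):
        # A later report from the same stream for the same batch overwrites the earlier one.
        self._by_batch[batch_time][info.stream_id] = info

    def total_records(self, batch_time):
        # Batches with no reports count as zero records.
        infos = self._by_batch.get(batch_time, {})
        return sum(i.num_records for i in infos.values())


registry = InputInfoRegistry()
registry.report(1000, InputInfo(stream_id=0, num_records=250))  # e.g. a receiver-less stream
registry.report(1000, InputInfo(stream_id=1, num_records=100))  # e.g. a receiver-based stream
# A stream that cannot count its input ahead of time reports nothing for this batch.
print(registry.total_records(1000))
```

In this sketch the registry does not distinguish between receiver-based and receiver-less streams, so no separate tracker type is needed for either kind.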


> Add a tracker to track the direct (receiver-less) streams
> ---------------------------------------------------------
>
>                 Key: SPARK-7111
>                 URL: https://issues.apache.org/jira/browse/SPARK-7111
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Saisai Shao
>
> Currently, for receiver-based input streams, Spark Streaming offers ReceiverTracker and ReceivedBlockTracker to track the status of receivers as well as block information. This status and block information can also be retrieved through StreamingListener and exposed to users.
> But for direct (receiver-less) input streams, Spark Streaming currently lacks a mechanism to track the registered direct streams, and also lacks a way to track the amount of data processed by them.
> This issue proposes a mechanism to track the registered direct streams and to expose their processing statistics through BatchInfo and StreamingListener.


