Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 05:35:33 UTC

[jira] [Updated] (SPARK-4174) Streaming: Optionally provide notifications to Receivers when DStream has been generated

     [ https://issues.apache.org/jira/browse/SPARK-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-4174:
--------------------------------
    Labels: bulk-closed  (was: )

> Streaming: Optionally provide notifications to Receivers when DStream has been generated
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-4174
>                 URL: https://issues.apache.org/jira/browse/SPARK-4174
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>            Priority: Major
>              Labels: bulk-closed
>
> Receivers that consume data from message queues such as ActiveMQ, Kafka, etc., can replay messages if required. Using the HDFS write-ahead log (WAL) mechanism for such systems hurts efficiency, because it incurs an unnecessary HDFS write when the data can be recovered from the queue anyway.
> We can fix this by notifying the receiver when the RDD is generated from its blocks. We need to consider the case where a receiver fails before the RDD is generated and has restarted on a different executor by the time the RDD is generated. Either way, this is likely to cause duplicates rather than data loss, so we may be OK.
> I am thinking of something along the lines of accepting a callback function that gets called when the RDD is generated. We can keep the function locally in a map of batch id -> function, which gets called when the RDD is generated (the driver can inform the ReceiverSupervisorImpl via Akka when it generates the RDD). Of course, this is just an early thought; I will work on a design doc for this one.
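
The proposal above (ack the source only when the driver reports that a batch's RDD exists, and accept duplicates when a receiver restarts) can be illustrated with a minimal toy model. This is a sketch under stated assumptions, not Spark's Receiver API. `BatchCallbackRegistry`, `ReplayableQueue`, and `simulate` are hypothetical names, the driver notification is modeled as a plain method call rather than an Akka message, and the queue is a list with a committed offset standing in for something like a Kafka partition.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL sketch: none of these names exist in Spark. They model the
# "batch id -> callback" map proposed in SPARK-4174.


class BatchCallbackRegistry:
    """Receiver-side map of batch id -> callbacks, fired once the driver
    reports that the RDD for that batch has been generated."""

    def __init__(self) -> None:
        self._callbacks: Dict[int, List[Callable[[int], None]]] = defaultdict(list)

    def register(self, batch_id: int, callback: Callable[[int], None]) -> None:
        self._callbacks[batch_id].append(callback)

    def on_batch_generated(self, batch_id: int) -> None:
        # pop() removes the entry, so each callback runs at most once;
        # an unknown batch id (e.g. after a restart) is a no-op.
        for callback in self._callbacks.pop(batch_id, []):
            callback(batch_id)


class ReplayableQueue:
    """Toy stand-in for a replayable source: a restarted consumer
    resumes from the last committed offset."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = list(messages)
        self.committed = 0

    def read(self, start: int, count: int) -> List[str]:
        return self.messages[start:start + count]

    def commit(self, offset: int) -> None:
        self.committed = max(self.committed, offset)


def simulate() -> Tuple[List[str], int]:
    queue = ReplayableQueue(["m0", "m1", "m2", "m3"])
    processed: List[str] = []

    # Batch 0: receive m0, m1; commit offset 2 only once the RDD is generated.
    registry = BatchCallbackRegistry()
    processed += queue.read(0, 2)
    registry.register(0, lambda _bid: queue.commit(2))
    registry.on_batch_generated(0)

    # Batch 1: receive m2, m3, then the receiver dies before being notified.
    # The blocks still end up in an RDD, but the commit callback is lost.
    processed += queue.read(2, 2)
    registry.register(1, lambda _bid: queue.commit(4))
    registry = BatchCallbackRegistry()  # restarted receiver: empty map
    registry.on_batch_generated(1)      # notification arrives, nothing to run

    # The restarted receiver resumes from the last committed offset (2),
    # so m2 and m3 are received again: duplicates, but no data loss.
    start = queue.committed
    replayed = queue.read(start, 2)
    processed += replayed
    registry.register(2, lambda _bid: queue.commit(start + len(replayed)))
    registry.on_batch_generated(2)

    return processed, queue.committed


if __name__ == "__main__":
    processed, committed = simulate()
    print(processed, committed)
```

In this model every message is processed at least once and m2/m3 appear twice, which matches the "duplicates, not data loss" reasoning in the issue. Popping entries from the map keeps a late or repeated notification from committing twice.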



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)