Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 05:35:39 UTC

[jira] [Updated] (SPARK-5042) Updated Receiver API to make it easier to write reliable receivers that ack source

     [ https://issues.apache.org/jira/browse/SPARK-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-5042:
--------------------------------
    Labels: bulk-closed  (was: )

> Updated Receiver API to make it easier to write reliable receivers that ack source
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-5042
>                 URL: https://issues.apache.org/jira/browse/SPARK-5042
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>              Labels: bulk-closed
>
> Receivers in Spark Streaming receive data from different sources and push it into Spark's BlockManager. However, the received records must be chunked into blocks before being pushed into the BlockManager. To support this, the Receiver API provides two kinds of store():
> 1. store(single record) - The receiver implementation submits one record at a time, and the system takes care of dividing the stream into right-sized blocks and limiting the ingestion rate. In the future, it should also be able to do automatic rate / flow control. However, there is no feedback to the receiver on when blocks are formed, and thus no way to provide reliability guarantees. Overall, receivers using this are easy to implement.
> 2. store(multiple records) - The receiver submits multiple records at once, and those records form the block that is stored in the block manager. The receiver implementation has full control over block generation, which allows the receiver to acknowledge the source once blocks have been reliably received by the BlockManager and/or the WriteAheadLog. However, such receivers do not get automatic block sizing and rate control; the developer has to take care of that. All this adds to the complexity of the receiver implementation.
> So, to summarize, option (2) has the advantage of full control over block generation, but users have to deal with the complexity of generating blocks of the right size and controlling the ingestion rate themselves.
> We therefore want to update this API so that it becomes easier for developers to achieve reliable receiving of records without sacrificing automatic block sizing and rate control.
>  
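The combination the issue asks for, automatic block sizing plus a store-completion callback that lets the receiver acknowledge the source, can be sketched in a language-agnostic way. The sketch below is a hypothetical model, not Spark's actual Receiver API: the names `BlockGenerator`, `on_block_stored`, `ack_token`, and `ReliableReceiver` are all illustrative stand-ins, and the "store" step is simulated with an in-memory list where Spark would write to the BlockManager or WriteAheadLog.

```python
# Hypothetical sketch of the proposed design (NOT Spark's actual API):
# records are submitted one at a time, the generator does automatic
# block sizing, and a callback tells the receiver when a block has been
# durably stored so it can ack the source.

class BlockGenerator:
    """Chunks single records into fixed-size blocks and invokes a
    callback once each block has been 'reliably stored'."""
    def __init__(self, block_size, on_block_stored):
        self.block_size = block_size
        self.on_block_stored = on_block_stored
        self._buffer = []          # (record, ack_token) pairs not yet in a block
        self.stored_blocks = []    # stand-in for BlockManager / WriteAheadLog

    def add(self, record, ack_token):
        self._buffer.append((record, ack_token))
        if len(self._buffer) >= self.block_size:
            self._flush()

    def _flush(self):
        if not self._buffer:
            return
        block = [record for record, _ in self._buffer]
        tokens = [token for _, token in self._buffer]
        self._buffer = []
        self.stored_blocks.append(block)   # simulated durable store
        self.on_block_stored(tokens)       # feedback: safe to ack these now


class ReliableReceiver:
    """Submits one record at a time, but only acks the source after the
    generator reports the enclosing block as stored."""
    def __init__(self, block_size=3):
        self.acked = []
        self.gen = BlockGenerator(block_size, self._on_stored)

    def receive(self, record, ack_token):
        self.gen.add(record, ack_token)

    def _on_stored(self, tokens):
        self.acked.extend(tokens)  # acknowledge the source, one token per record


recv = ReliableReceiver(block_size=3)
for i in range(7):
    recv.receive(f"msg-{i}", i)
# Two full blocks (records 0-5) have been stored and acked;
# msg-6 is still buffered and therefore not yet acknowledged.
```

The point of the sketch is that the receiver keeps the simplicity of the single-record store() path, while the ack only happens in the store-completion callback, so unstored records are never acknowledged and can be redelivered by the source after a failure.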



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org