Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2015/08/05 00:30:05 UTC

[jira] [Created] (SPARK-9619) Restarting the receiver's BlockGenerator does not clear previous data

Tathagata Das created SPARK-9619:
------------------------------------

             Summary: Restarting the receiver's BlockGenerator does not clear previous data
                 Key: SPARK-9619
                 URL: https://issues.apache.org/jira/browse/SPARK-9619
             Project: Spark
          Issue Type: Bug
          Components: Streaming
            Reporter: Tathagata Das
            Assignee: Tathagata Das
            Priority: Minor


The internal default block generator used by receivers is reused across receiver restarts. Data still held in its buffer from before a restart can then be emitted again, which can lead to duplicate data. This is sort-of-okay, as receivers provide at-least-once guarantees at best. Furthermore, reliable receivers like the ReliableKafkaReceiver do not reuse BlockGenerator objects and hence do not have the problem.

The solution is to ensure that the internal buffer of the BlockGenerator is cleared every time it is started.
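
As a rough illustration of the proposed fix, here is a minimal Python sketch of a toy generator that clears its buffer on every start. The class ToyBlockGenerator and its methods (start, add, stop, cut_block) are hypothetical names made up for this example; they are not Spark's BlockGenerator API, and the real change in Spark may look quite different.

```python
class ToyBlockGenerator:
    """Hypothetical stand-in for a receiver's block generator (not Spark's class)."""

    def __init__(self):
        self._buffer = []       # records waiting to be cut into a block
        self._running = False

    def start(self):
        # Drop anything left over from a previous run, so a restarted
        # receiver does not re-emit records buffered before it stopped.
        self._buffer.clear()
        self._running = True

    def add(self, record):
        if not self._running:
            raise RuntimeError("generator is not running")
        self._buffer.append(record)

    def stop(self):
        # Assumption for this sketch: stop() leaves the buffer untouched,
        # which models the stale state a reused generator could carry over.
        self._running = False

    def cut_block(self):
        # Hand the buffered records over as one block and begin a new buffer.
        block, self._buffer = self._buffer, []
        return block


gen = ToyBlockGenerator()
gen.start()
gen.add("a")
gen.add("b")
gen.stop()      # receiver stops with "a" and "b" still buffered

gen.start()     # restart: the stale records are discarded here
gen.add("c")
first_block_after_restart = gen.cut_block()
```

Without the clear() call in start(), the first block after the restart would also contain the stale "a" and "b" records, which is the kind of duplication described above.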


