Posted to issues@spark.apache.org by "Gaurav Shah (JIRA)" <ji...@apache.org> on 2017/01/06 12:16:58 UTC

[jira] [Commented] (SPARK-9215) Implement WAL-free Kinesis receiver that give at-least once guarantee

    [ https://issues.apache.org/jira/browse/SPARK-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804425#comment-15804425 ] 

Gaurav Shah commented on SPARK-9215:
------------------------------------

[~tdas] I know this is an old pull request, but I was wondering whether you could help. Could we enhance this so that we checkpoint only after the blocks of data have been written? That would require implementing the Spark checkpoint in the first place. Each block has a start and an end sequence number.
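
To make the idea concrete, here is a minimal sketch in Python of the bookkeeping this would need. It is not the actual KinesisReceiver code. The class and method names (BlockCheckpointTracker, add_block, mark_stored, safe_checkpoint) are hypothetical, and sequence numbers are modelled as plain integers for simplicity. The point is only that the checkpoint may advance to a block's end sequence number once that block and every earlier block have been written.

```python
from collections import OrderedDict
from typing import Optional


class BlockCheckpointTracker:
    """Hypothetical helper: tracks sequence-number ranges of blocks for one shard."""

    def __init__(self) -> None:
        # block_id -> [start_seq, end_seq, stored], kept in generation order.
        self._blocks: "OrderedDict[str, list]" = OrderedDict()

    def add_block(self, block_id: str, start_seq: int, end_seq: int) -> None:
        """Register a newly generated block and the range of records it holds."""
        if start_seq > end_seq:
            raise ValueError("start_seq must not exceed end_seq")
        self._blocks[block_id] = [start_seq, end_seq, False]

    def mark_stored(self, block_id: str) -> None:
        """Record that the block has been written successfully."""
        self._blocks[block_id][2] = True

    def safe_checkpoint(self) -> Optional[int]:
        """Return the end sequence number that is safe to checkpoint, or None.

        Only the leading run of stored blocks counts: a stored block that
        follows an unstored one must wait, otherwise a failure could skip
        the records of the unstored block.
        """
        latest = None
        while self._blocks:
            block_id, (_start, end_seq, stored) = next(iter(self._blocks.items()))
            if not stored:
                break
            latest = end_seq
            del self._blocks[block_id]  # covered by the checkpoint, forget it
        return latest
```

For example, with blocks b1 (100 to 199) and b2 (200 to 299), marking only b2 as stored yields no checkpoint; once b1 is also marked as stored, safe_checkpoint() returns 299.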

> Implement WAL-free Kinesis receiver that give at-least once guarantee
> ---------------------------------------------------------------------
>
>                 Key: SPARK-9215
>                 URL: https://issues.apache.org/jira/browse/SPARK-9215
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>    Affects Versions: 1.4.1
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>             Fix For: 1.5.0
>
>
> Currently, the KinesisReceiver can lose some data in the case of certain failures (receiver and driver failures). Using the write ahead logs can mitigate some of the problem, but it is not ideal because WALs don't work with S3 (eventual consistency, etc.), which is the most likely file system to be used in the EC2 environment. Hence, we have to take a different approach to improving reliability for Kinesis.
> Detailed design doc - https://docs.google.com/document/d/1k0dl270EnK7uExrsCE7jYw7PYx0YC935uBcxn3p0f58/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
