Posted to issues@spark.apache.org by "hustfxj (JIRA)" <ji...@apache.org> on 2016/12/04 15:05:58 UTC

[jira] [Commented] (SPARK-18706) Can spark support exactly once based kafka ? Due to these following question.

    [ https://issues.apache.org/jira/browse/SPARK-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15720087#comment-15720087 ] 

hustfxj commented on SPARK-18706:
---------------------------------

1. When a task completes its operation, it notifies the driver. If the driver never receives that message because of a network problem, it will think the task is still running. Does that mean the child stage will never be scheduled?
2. How does Spark guarantee that a downstream task receives the shuffle data completely? In fact, I can't find a checksum for blocks in Spark. For example, an upstream task may shuffle 100 MB of data, but the downstream task may receive only 99 MB because of a network problem. Can Spark verify, based on size, that the data was received completely?
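
To make the second question concrete, here is a minimal sketch of the kind of check I have in mind. It is not Spark's code and does not use any Spark API. The names frame_block and verify_block, and the header layout (8-byte length plus 4-byte CRC32), are hypothetical and chosen only for illustration. The sender prefixes each block with its size and checksum, and the receiver rejects a block that is truncated or corrupted.

```python
import struct
import zlib

# Hypothetical header: 8-byte big-endian payload length, then 4-byte CRC32.
HEADER = struct.Struct(">QI")


def frame_block(payload: bytes) -> bytes:
    """Sender side: prefix the payload with its length and CRC32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def verify_block(frame: bytes) -> bytes:
    """Receiver side: return the payload, or raise if it is incomplete or corrupted."""
    if len(frame) < HEADER.size:
        raise ValueError("frame is shorter than the header")
    expected_len, expected_crc = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:]
    # Size check: catches the "sent 100 MB, received 99 MB" case.
    if len(payload) != expected_len:
        raise ValueError(
            f"size mismatch: expected {expected_len} bytes, got {len(payload)}"
        )
    # Checksum check: catches corruption that leaves the size unchanged.
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("checksum mismatch")
    return payload


if __name__ == "__main__":
    block = b"shuffle-data" * 1000
    frame = frame_block(block)
    assert verify_block(frame) == block
    try:
        verify_block(frame[:-100])  # simulate a truncated transfer
    except ValueError as err:
        print("rejected:", err)
```

With a check like this, the size comparison alone is enough to detect a short read, and the checksum covers the case where the byte count is right but the content is not.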

> Can spark  support exactly once based kafka ? Due to these following question.
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-18706
>                 URL: https://issues.apache.org/jira/browse/SPARK-18706
>             Project: Spark
>          Issue Type: Question
>            Reporter: hustfxj
>



