Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/04 16:12:00 UTC

[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

    [ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638449#comment-16638449 ] 

ASF GitHub Bot commented on FLINK-10205:
----------------------------------------

StefanRRichter commented on issue #6684:     [FLINK-10205] Batch Job: InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-427078004
 
 
   @isunjin in the issue/design doc you talk about potential data inconsistency/corruption that this PR is trying to fix. However, I wonder what sort of corruption you have in mind that is fixed here. Can you provide a concrete example of a problematic case? In my understanding, graph components are either connected and need a connected restart, or they are independent and can recover in a fine-grained way, but then it should also not matter in which order splits are reprocessed.
   
   Besides that, I wonder if the general approach is a good fit for the current and future architecture of this component. In particular, we pull the concern of `InputSplit` down to the level of `Executions`. `Execution` and `ExecutionJobVertex` are used in both batch and streaming, and to me it does not seem like a good step to introduce batch-specific code into those classes if we can avoid it. Another thing I question is whether it would make sense to think about a way to also release the assignment of an input split to a certain task, so that another task can pick it up in case there is a longer-lasting problem with the original task. Lastly, we are currently thinking about a general redesign of the source interface and how input is assigned to the source instances. @aljoscha has a WIP branch to experiment with the possible changes here https://github.com/aljoscha/flink/tree/refactor-source-interface, but we should keep in mind that sources might be split into two operators in the future.


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.7.0, 1.6.2
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve better performance. However, when a DataSourceTask fails and is rerun, it will not get the same splits as its previous attempt. This can introduce inconsistent results or even data corruption.
> Furthermore, if two executions run at the same time (in a batch scenario), these two executions should process the same splits.
> We need to fix this to make the inputs of a DataSourceTask deterministic. The proposal is to save all splits in the ExecutionVertex, and the DataSourceTask will pull splits from there.
>  document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]
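
For readers unfamiliar with the idea in the quoted description, here is a minimal sketch of what "save the splits per vertex and replay them on rerun" could look like. It is an illustration only and not the code in the PR: `SharedSplitPool`, `VertexSplitLog`, and `Attempt` are hypothetical names that do not exist in Flink, and splits are modeled as plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch; none of these types are Flink classes. */
class SplitReplaySketch {

    /** Stand-in for the JobManager-side assigner: hands out each split once. */
    static final class SharedSplitPool {
        private final Queue<String> remaining;

        SharedSplitPool(List<String> splits) {
            this.remaining = new ArrayDeque<>(splits);
        }

        /** Returns the next unassigned split, or null when none are left. */
        synchronized String next() {
            return remaining.poll();
        }
    }

    /** Stand-in for per-vertex state: remembers every split given to this vertex. */
    static final class VertexSplitLog {
        private final SharedSplitPool pool;
        private final List<String> assigned = new ArrayList<>();

        VertexSplitLog(SharedSplitPool pool) {
            this.pool = pool;
        }

        /**
         * Returns the split at the given position for this vertex. Positions
         * already recorded are replayed; the next position pulls a fresh split
         * from the shared pool and records it.
         */
        synchronized String splitAt(int index) {
            if (index < assigned.size()) {
                return assigned.get(index);
            }
            String fresh = pool.next();
            if (fresh != null) {
                assigned.add(fresh);
            }
            return fresh;
        }
    }

    /** One execution attempt of a vertex; each attempt has its own cursor. */
    static final class Attempt {
        private final VertexSplitLog log;
        private int cursor = 0;

        Attempt(VertexSplitLog log) {
            this.log = log;
        }

        /** Returns the next split for this attempt, or null when exhausted. */
        String nextSplit() {
            String split = log.splitAt(cursor);
            if (split != null) {
                cursor++;
            }
            return split;
        }
    }
}
```

In this sketch a restarted attempt creates a new `Attempt` over the same `VertexSplitLog`, so it first sees the splits its predecessor saw, in the same order, before pulling new ones. Two concurrent attempts of the same vertex would likewise read the same sequence. Releasing recorded splits back to the pool, as raised in the review comment above, is not modeled here.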


