You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/10 15:44:01 UTC

[jira] [Commented] (FLINK-10205) Batch Job: InputSplit Fault tolerant for DataSourceTask

    [ https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714953#comment-16714953 ] 

ASF GitHub Bot commented on FLINK-10205:
----------------------------------------

StefanRRichter commented on a change in pull request #6684:     [FLINK-10205] Batch Job: InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#discussion_r240253142
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertex.java
 ##########
 @@ -590,6 +604,15 @@ public Execution resetForNewExecution(final long timestamp, final long originati
 
 				this.currentExecution = newExecution;
 
+				synchronized (this.inputSplits){
+					for (InputSplit split: this.inputSplits){
 
 Review comment:
   General formatting/style, this line (and others) lack some whitespaces, e.g. `split :` and `) {`. Please have another look at the formatting.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.6.2, 1.7.0
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today DataSource Task pull InputSplits from JobManager to achieve better performance, however, when a DataSourceTask failed and rerun, it will not get the same splits as its previous version. this will introduce inconsistent result or even data corruption.
> Furthermore,  if there are two executions run at the same time (in batch scenario), this two executions should process same splits.
> we need to fix the issue to make the inputs of a DataSourceTask deterministic. The propose is save all splits into ExecutionVertex and DataSourceTask will pull split from there.
>  document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)