Posted to issues@flink.apache.org by "Zhu Zhu (JIRA)" <ji...@apache.org> on 2019/04/09 04:54:00 UTC
[jira] [Created] (FLINK-12131) Resetting ExecutionVertex in region
failover may cause inconsistency of IntermediateResult status
Zhu Zhu created FLINK-12131:
-------------------------------
Summary: Resetting ExecutionVertex in region failover may cause inconsistency of IntermediateResult status
Key: FLINK-12131
URL: https://issues.apache.org/jira/browse/FLINK-12131
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Zhu Zhu
Assignee: Zhu Zhu
Currently the *IntermediateResult* status is only reset when its producer *ExecutionJobVertex* is reset.
When the region failover strategy is enabled, the vertices of the failed region are reset through *ExecutionVertex.resetForNewExecution()*. The *numberOfRunningProducers* counter in *IntermediateResult*, however, is not properly adjusted in this case.
So if a FINISHED vertex is restarted and finishes again, the counter may drop below 0.
In addition, the consumable property of the partition is not reset either. This may lead to an incorrect input state check result for lazy scheduling.
I propose invoking *IntermediateResultPartition.resetForNewExecution()* from *ExecutionVertex.resetForNewExecution()*, and resetting the *numberOfRunningProducers* counter and the *IntermediateResultPartition* state there.
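To illustrate the counter problem, here is a minimal toy model, not Flink's actual code. The class ToyIntermediateResult and the methods producerFinished(), onProducerReset() and simulate() are hypothetical names made up for this sketch; only the counter name numberOfRunningProducers comes from the description above. It assumes the counter starts at the producer parallelism and is decremented each time a producer finishes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the counter behaviour described above; NOT Flink's real classes. */
class RegionFailoverCounterSketch {

    /** Hypothetical, simplified stand-in for IntermediateResult. */
    static class ToyIntermediateResult {
        private final AtomicInteger numberOfRunningProducers;

        ToyIntermediateResult(int parallelism) {
            // Assumption: every producer counts as running at the start.
            this.numberOfRunningProducers = new AtomicInteger(parallelism);
        }

        /** Called whenever a producer vertex reaches FINISHED. */
        int producerFinished() {
            return numberOfRunningProducers.decrementAndGet();
        }

        /** Hypothetical hook for the proposed fix: a reset producer runs again. */
        void onProducerReset() {
            numberOfRunningProducers.incrementAndGet();
        }

        int running() {
            return numberOfRunningProducers.get();
        }
    }

    /**
     * All producers finish, then region failover restarts one FINISHED
     * vertex, which finishes a second time. Returns the final counter value.
     */
    static int simulate(int parallelism, boolean adjustCounterOnVertexReset) {
        ToyIntermediateResult result = new ToyIntermediateResult(parallelism);
        for (int i = 0; i < parallelism; i++) {
            result.producerFinished();
        }
        if (adjustCounterOnVertexReset) {
            // Proposed behaviour: the per-vertex reset also adjusts the counter.
            result.onProducerReset();
        }
        // The restarted vertex finishes again.
        result.producerFinished();
        return result.running();
    }

    public static void main(String[] args) {
        System.out.println("without adjustment: " + simulate(2, false)); // -1
        System.out.println("with adjustment:    " + simulate(2, true));  // 0
    }
}
```

In this sketch the counter ends at -1 when the per-vertex reset leaves it untouched, and at 0 when the reset re-increments it. The consumable flag of the partition is not modelled here.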
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)