Posted to issues@flink.apache.org by "Yun Tang (JIRA)" <ji...@apache.org> on 2019/02/14 15:57:00 UTC
[jira] [Updated] (FLINK-11618) [state] Refactor operator state repartition mechanism
[ https://issues.apache.org/jira/browse/FLINK-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yun Tang updated FLINK-11618:
-----------------------------
Description:
Currently, operator state is assigned with the following strategy:
* When parallelism is unchanged:
** If there is only even-split redistributed state, state assignment tries to keep the previous assignment (in practice, it does not always manage to).
** If there is union redistributed state, all operator state is redistributed into a new state assignment.
* When parallelism changes:
** All operator state is redistributed into a new state assignment.
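The decision above can be summarized in a minimal Java sketch. It is a hypothetical model, not Flink code: {{AssignmentPolicySketch}}, {{AssignmentPlan}} and {{decide}} are illustrative names, and the sketch only captures which branch is taken, not how state is actually split.

```java
/**
 * Hypothetical model of the current operator-state assignment decision
 * described in this issue. Not Flink code: the class, enum and method
 * names are illustrative only.
 */
public class AssignmentPolicySketch {

    /** What happens to previously checkpointed operator state on restore. */
    enum AssignmentPlan {
        /** Only even-split state and same parallelism: try to keep the old layout. */
        TRY_KEEP_PREVIOUS,
        /** Union state present or parallelism changed: repartition everything. */
        REDISTRIBUTE_ALL
    }

    static AssignmentPlan decide(int oldParallelism, int newParallelism, boolean hasUnionState) {
        boolean parallelismUnchanged = oldParallelism == newParallelism;
        if (parallelismUnchanged && !hasUnionState) {
            return AssignmentPlan.TRY_KEEP_PREVIOUS;
        }
        // Note: this branch also repartitions any even-split state.
        return AssignmentPlan.REDISTRIBUTE_ALL;
    }

    public static void main(String[] args) {
        System.out.println(decide(2, 2, false)); // TRY_KEEP_PREVIOUS
        System.out.println(decide(2, 2, true));  // REDISTRIBUTE_ALL
        System.out.println(decide(2, 4, false)); // REDISTRIBUTE_ALL
    }
}
```

Note that TRY_KEEP_PREVIOUS is only an intent; whether the old layout really survives depends on how the state is repartitioned afterwards.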
There are two problems *when parallelism is unchanged*:
# If there is only even-split redistributed state, the current implementation cannot actually guarantee that the state assignment stays the same as before. This is because {{StateAssignmentOperation#collectPartitionableStates}} repartitions {{managedOperatorStates}} without subtask-index information. For example, take an operator state with parallelism 2, where subtask-0's managed operator state is empty and subtask-1's is not. Although the new parallelism is still 2, after {{StateAssignmentOperation#collectPartitionableStates}} runs and state is assigned, subtask-0 is assigned the managed operator state while subtask-1 gets none.
# We should only redistribute union state and leave even-split state untouched. Redistributing even-split state would cause unexpected behavior once {{RestartPipelinedRegionStrategy}} supports restoring state.
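Both problems can be illustrated with a simplified, hypothetical Java model. It assumes each subtask's managed operator state is a list of named partitions, and that the current path flattens all partitions into one list (dropping the subtask index) and splits it into contiguous, nearly equal ranges, with the first {{size % parallelism}} subtasks getting one extra partition. This is a simplification, not Flink's {{StateAssignmentOperation}} or repartitioner code; {{flattenAndSplit}}, {{keepByIndex}} and {{broadcastUnion}} are hypothetical names. {{keepByIndex}} and {{broadcastUnion}} sketch the direction proposed here for unchanged parallelism: even-split state stays with its subtask index, and only union state is redistributed, with every subtask receiving all union partitions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model of operator-state assignment for
 * unchanged parallelism. Not Flink code; each subtask's managed operator
 * state is modelled as a list of partition names.
 */
public class EvenSplitReassignmentSketch {

    /** Current path (simplified): drop subtask indices, then split contiguously. */
    static List<List<String>> flattenAndSplit(List<List<String>> previous, int newParallelism) {
        List<String> all = new ArrayList<>();
        for (List<String> subtaskState : previous) {
            all.addAll(subtaskState); // subtask-index information is lost here
        }
        int base = all.size() / newParallelism;
        int remainder = all.size() % newParallelism;
        List<List<String>> result = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < newParallelism; i++) {
            // The first `remainder` subtasks receive one extra partition.
            int count = base + (i < remainder ? 1 : 0);
            result.add(new ArrayList<>(all.subList(offset, offset + count)));
            offset += count;
        }
        return result;
    }

    /** Proposed for even-split state: subtask i gets back exactly what it had. */
    static List<List<String>> keepByIndex(List<List<String>> previous, int newParallelism) {
        if (previous.size() != newParallelism) {
            throw new IllegalArgumentException("only valid when parallelism is unchanged");
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> subtaskState : previous) {
            result.add(new ArrayList<>(subtaskState));
        }
        return result;
    }

    /** Proposed for union state: every subtask receives all union partitions. */
    static List<List<String>> broadcastUnion(List<List<String>> previous, int newParallelism) {
        List<String> all = new ArrayList<>();
        for (List<String> subtaskState : previous) {
            all.addAll(subtaskState);
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        // Parallelism 2: subtask-0's even-split state is empty, subtask-1 holds "p1".
        List<List<String>> evenSplit = List.of(List.of(), List.of("p1"));
        System.out.println(flattenAndSplit(evenSplit, 2)); // [[p1], []]: moved to subtask-0
        System.out.println(keepByIndex(evenSplit, 2));     // [[], [p1]]: layout preserved

        List<List<String>> union = List.of(List.of("u0"), List.of("u1"));
        System.out.println(broadcastUnion(union, 2));      // [[u0, u1], [u0, u1]]
    }
}
```

In this model the flattened split reproduces the swap described in problem 1, while the index-preserving path returns the layout unchanged, which is what a region-local restart needs.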
We should fix both problems; this issue is a prerequisite for FLINK-10712 and FLINK-10713.
was:
Currently we have state assignment strategy of operator state below:
* When parallelism not changed:
** If we only have even-split redistributed state, state assignment would try to keep as the same as previously (actually not always the same).
** If we have union redistributed state, all the operator state would be redistributed as the new state assignment.
* When parallelism changed:
** all the operator state would be redistributed as the new state assignment.
There existed two problems *when parallelism not changed*:
# If we only have even-split redistributed state, current implementation actually cannot ensure state assignment to keep as the same as previously. This is because current {{StateAssignmentOperation#collectPartitionableStates}} would repartition {{managedOperatorStates}} without subtask-index information. Take and example, if we have a operator-state with parallelism as 2, and subtask-0's managed-operatorstate is empty while subtask-1 not. Although new parallelism still keeps as 2, after {{StateAssignmentOperation#collectPartitionableStates}}, subtask-0 would be assigned the managed-operatorstate but subtask-1 get none.
# We should only redistribute union state and not touch the even-split state. Redistribute even-split state would cause unexpected behavior after {{RestartPipelinedRegionStrategy}} supported to restore state.
We should fix the above two problems and this issue is a prerequisite of FLINK-10712 and FLINK-10713 .
> [state] Refactor operator state repartition mechanism
> -----------------------------------------------------
>
> Key: FLINK-11618
> URL: https://issues.apache.org/jira/browse/FLINK-11618
> Project: Flink
> Issue Type: Improvement
> Components: State Backends, Checkpointing
> Affects Versions: 1.7.0
> Reporter: Yun Tang
> Assignee: Yun Tang
> Priority: Major
> Fix For: 1.8.0
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)