Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/27 10:16:44 UTC

[GitHub] [flink] rkhachatryan edited a comment on pull request #12292: [FLINK-17861][task][checkpointing] Split channel state handles sent to JM

rkhachatryan edited a comment on pull request #12292:
URL: https://github.com/apache/flink/pull/12292#issuecomment-633727252


   Thanks for the feedback, @pnowojski.
   
   I've addressed the issues (except [this one](https://github.com/apache/flink/pull/12292#discussion_r429887132)).
   
   Answering your question:
   > Could you elaborate a bit more? What's the alternative? How would it avoid more data duplication? Are we still duplicating data with this PR?
   
   The current structure is the following (this PR doesn't change it):
   ```
   Each subtask reports to JM TaskStateSnapshot, 
       each with zero or more OperatorSubtaskState,
           each with zero or more InputChannelStateHandle and ResultSubpartitionStateHandle
               each referencing an underlying StreamStateHandle
   ```
   Each underlying `StreamStateHandle` carries the file name, so the name is duplicated across the channel state handles (`ByteStreamStateHandle` has a name too, at least because of `equals`/`hashCode`, I guess).
   
   An alternative would be something like:
   ```
   Each subtask reports to JM TaskStateSnapshot, 
       each with zero or more OperatorSubtaskState,
           each with zero or one StreamStateHandle (for channel state)
           each with zero or more InputChannelStateHandle and ResultSubpartitionStateHandle
   ```
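
To make the difference concrete, here is a rough sketch of the two layouts. All class and method names below are hypothetical stand-ins, not the actual Flink classes, and the "file name" is just a placeholder string. It only counts how many copies of the file name would be reported to the JM for one operator subtask, assuming all of its channel state lives in a single file.

```java
import java.util.ArrayList;
import java.util.List;

public class ChannelStateLayoutSketch {

    /** Hypothetical per-channel handle in the current layout: each one references the file. */
    static final class ChannelHandleWithFile {
        final int channelIndex;
        final long offset;
        final String fileName; // repeated in every per-channel handle

        ChannelHandleWithFile(int channelIndex, long offset, String fileName) {
            this.channelIndex = channelIndex;
            this.offset = offset;
            this.fileName = fileName;
        }
    }

    /** Hypothetical per-channel handle in the alternative layout: offsets only. */
    static final class ChannelHandleOffsetOnly {
        final int channelIndex;
        final long offset;

        ChannelHandleOffsetOnly(int channelIndex, long offset) {
            this.channelIndex = channelIndex;
            this.offset = offset;
        }
    }

    /** Hypothetical operator-level state in the alternative layout: zero or one file reference. */
    static final class OperatorChannelState {
        final String fileName; // null when there is no channel state
        final List<ChannelHandleOffsetOnly> channels;

        OperatorChannelState(String fileName, List<ChannelHandleOffsetOnly> channels) {
            this.fileName = fileName;
            this.channels = channels;
        }
    }

    /** Builds the current layout: one handle per channel, each with its own file name. */
    static List<ChannelHandleWithFile> buildCurrent(String fileName, int channels) {
        List<ChannelHandleWithFile> handles = new ArrayList<>();
        for (int i = 0; i < channels; i++) {
            handles.add(new ChannelHandleWithFile(i, i * 100L, fileName));
        }
        return handles;
    }

    /** Builds the alternative layout: the file name is stored once, next to the channel handles. */
    static OperatorChannelState buildAlternative(String fileName, int channels) {
        List<ChannelHandleOffsetOnly> handles = new ArrayList<>();
        for (int i = 0; i < channels; i++) {
            handles.add(new ChannelHandleOffsetOnly(i, i * 100L));
        }
        return new OperatorChannelState(channels == 0 ? null : fileName, handles);
    }

    /** Current layout: one file name per channel handle. */
    static int fileNamesReportedCurrent(List<ChannelHandleWithFile> handles) {
        return handles.size();
    }

    /** Alternative layout: at most one file name per operator subtask state. */
    static int fileNamesReportedAlternative(OperatorChannelState state) {
        return state.fileName == null ? 0 : 1;
    }

    public static void main(String[] args) {
        String file = "channel-state-file"; // placeholder, not a real path
        int channels = 4;
        System.out.println("current:     " + fileNamesReportedCurrent(buildCurrent(file, channels)));
        System.out.println("alternative: " + fileNamesReportedAlternative(buildAlternative(file, channels)));
    }
}
```

In this simplified model the current layout reports the file name once per channel, while the alternative reports it at most once per `OperatorSubtaskState`, regardless of the number of channels.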


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org