Posted to dev@flink.apache.org by "Yu Li (JIRA)" <ji...@apache.org> on 2019/05/31 13:42:00 UTC

[jira] [Created] (FLINK-12699) Reduce CPU consumption when snapshot/restore the spilled key-group

Yu Li created FLINK-12699:
-----------------------------

             Summary: Reduce CPU consumption when snapshot/restore the spilled key-group
                 Key: FLINK-12699
                 URL: https://issues.apache.org/jira/browse/FLINK-12699
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / State Backends
            Reporter: Yu Li
            Assignee: Yu Li


We need to avoid unnecessary de/serialization when snapshotting/restoring a spilled state key-group. To achieve this, we need to:
1. Add meta information to the {{HeapKeyedStatebackend}} checkpoint on DFS, separating the on-heap and on-disk parts
2. Write the off-heap bytes directly to DFS when checkpointing and mark them as on-disk
3. Write the bytes directly onto disk when restoring the data back from DFS, if they are marked as on-disk

Note that we cannot simply copy the file, since we use mmap while also supporting copy-on-write.
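
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Flink implementation: the class name, the one-byte location flag, the length-prefixed layout and the String-to-String state are all hypothetical, and in-memory streams stand in for DFS and for the local spill file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Flink.
final class KeyGroupSnapshotSketch {
    // Assumed meta information: one flag per key-group block.
    static final byte ON_HEAP = 0;
    static final byte ON_DISK = 1;

    // Writes the flag, the payload length, then the payload itself.
    static void writeBlock(DataOutputStream out, byte location, byte[] payload) throws IOException {
        out.writeByte(location);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // On-heap key-group: entries have to be serialized one by one.
    static void snapshotOnHeap(DataOutputStream out, Map<String, String> state) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buffer);
        data.writeInt(state.size());
        for (Map.Entry<String, String> entry : state.entrySet()) {
            data.writeUTF(entry.getKey());
            data.writeUTF(entry.getValue());
        }
        data.flush();
        writeBlock(out, ON_HEAP, buffer.toByteArray());
    }

    // Spilled key-group: the already-serialized bytes are written verbatim.
    static void snapshotSpilled(DataOutputStream out, byte[] spilledBytes) throws IOException {
        writeBlock(out, ON_DISK, spilledBytes);
    }

    // Restores one block and returns its flag. ON_DISK blocks are copied
    // to diskTarget untouched; ON_HEAP blocks are deserialized into heapTarget.
    static byte restore(DataInputStream in, Map<String, String> heapTarget, OutputStream diskTarget)
            throws IOException {
        byte location = in.readByte();
        byte[] block = new byte[in.readInt()];
        in.readFully(block);
        if (location == ON_DISK) {
            diskTarget.write(block);
        } else {
            DataInputStream data = new DataInputStream(new ByteArrayInputStream(block));
            int entries = data.readInt();
            for (int i = 0; i < entries; i++) {
                String key = data.readUTF();
                String value = data.readUTF();
                heapTarget.put(key, value);
            }
        }
        return location;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream dfs = new ByteArrayOutputStream(); // stand-in for DFS
        DataOutputStream out = new DataOutputStream(dfs);
        Map<String, String> heapState = new HashMap<>();
        heapState.put("k1", "v1");
        snapshotOnHeap(out, heapState);
        snapshotSpilled(out, "raw-spilled-bytes".getBytes(StandardCharsets.UTF_8));
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(dfs.toByteArray()));
        Map<String, String> restoredHeap = new HashMap<>();
        ByteArrayOutputStream restoredDisk = new ByteArrayOutputStream(); // stand-in for the spill file
        restore(in, restoredHeap, restoredDisk);
        restore(in, restoredHeap, restoredDisk);
        System.out.println(restoredHeap + " / " + restoredDisk.size() + " raw bytes");
    }
}
```

The point of the flag is that the on-disk branch never touches a serializer: bytes go from the spill file to DFS and back unchanged, which is where the CPU saving would come from. How the bytes are read out of an mmap'ed, copy-on-write region is deliberately left out of this sketch.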



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)