Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/02/26 16:24:00 UTC
[jira] [Commented] (FLINK-8790) Improve performance for recovery from incremental checkpoint

    [ https://issues.apache.org/jira/browse/FLINK-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377111#comment-16377111 ] 

ASF GitHub Bot commented on FLINK-8790:
---------------------------------------

GitHub user sihuazhou opened a pull request:

    https://github.com/apache/flink/pull/5582

    [FLINK-8790][State] Improve performance for recovery from incremental checkpoint

    ## What is the purpose of the change
    
    This PR fixes [FLINK-8790](https://issues.apache.org/jira/browse/FLINK-8790). When multiple state handles have to be restored, we can improve performance as follows:
    
    1. Choose the best state handle to initialize the target DB.
    2. Use the other state handles to create temporary DBs, and clip each temporary DB to the target key-group range (via `rocksdb.deleteRange()`). This lets us get rid of the `key group check` in the `data insertion loop` and also avoids traversing useless records.
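
The two steps above can be sketched with an in-memory model. This is only an illustration under stated assumptions, not the code in this PR: `RestoreSketch`, its nested `KeyGroupRange`, `key`, `chooseBestHandle`, `clipToRange` and `restore` are hypothetical names, a `TreeMap` stands in for a RocksDB instance, clearing `headMap`/`tailMap` views stands in for `deleteRange()`, and "best" is assumed to mean the handle that shares the most key groups with the target range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory model of the two restore steps; not Flink's actual classes. */
class RestoreSketch {

    /** Inclusive key-group range, e.g. [2, 7]. */
    static final class KeyGroupRange {
        final int start;
        final int end;

        KeyGroupRange(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Number of key groups shared with {@code other}; 0 if the ranges are disjoint. */
        int overlap(KeyGroupRange other) {
            return Math.max(0, Math.min(end, other.end) - Math.max(start, other.start) + 1);
        }
    }

    /** Composite key with the key group in the high bits (assumes 0 <= userKey < 65536). */
    static int key(int keyGroup, int userKey) {
        return (keyGroup << 16) | userKey;
    }

    /** Step 1 (assumed criterion): index of the handle sharing the most key groups with the target. */
    static int chooseBestHandle(List<KeyGroupRange> handles, KeyGroupRange target) {
        int best = -1;
        int bestOverlap = -1;
        for (int i = 0; i < handles.size(); i++) {
            int o = handles.get(i).overlap(target);
            if (o > bestOverlap) {
                best = i;
                bestOverlap = o;
            }
        }
        return best;
    }

    /** Step 2: drop everything outside the target range with two range deletes. */
    static void clipToRange(TreeMap<Integer, String> db, KeyGroupRange target) {
        db.headMap(key(target.start, 0), false).clear();  // keys below the first target key group
        db.tailMap(key(target.end + 1, 0), true).clear(); // keys from the next key group onwards
    }

    /** Builds the target DB from at least one handle DB, with no per-record key-group check. */
    static TreeMap<Integer, String> restore(
            List<KeyGroupRange> ranges, List<TreeMap<Integer, String>> dbs, KeyGroupRange target) {
        int best = chooseBestHandle(ranges, target);
        TreeMap<Integer, String> targetDb = new TreeMap<>(dbs.get(best));
        clipToRange(targetDb, target);
        for (int i = 0; i < dbs.size(); i++) {
            if (i == best) {
                continue;
            }
            TreeMap<Integer, String> tmp = new TreeMap<>(dbs.get(i)); // temporary copy
            clipToRange(tmp, target);
            targetDb.putAll(tmp); // every remaining record is already inside the target range
        }
        return targetDb;
    }

    public static void main(String[] args) {
        List<KeyGroupRange> ranges = Arrays.asList(new KeyGroupRange(0, 3), new KeyGroupRange(4, 7));
        List<TreeMap<Integer, String>> dbs = new ArrayList<>();
        for (KeyGroupRange r : ranges) {
            TreeMap<Integer, String> db = new TreeMap<>();
            for (int kg = r.start; kg <= r.end; kg++) {
                db.put(key(kg, 1), "value-" + kg);
            }
            dbs.add(db);
        }
        TreeMap<Integer, String> restored = restore(ranges, dbs, new KeyGroupRange(2, 7));
        System.out.println("restored records: " + restored.size());
    }
}
```

In this model the insertion loop (`putAll`) never inspects a record's key group, because the clipping has already removed everything outside the target range. Whether this pays off with a real RocksDB instance depends on the cost of `deleteRange()` relative to the records skipped.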
    
    ## Brief change log
    
      - Improve the performance when restoring from multiple state handles
    
    ## Verifying this change
    The changes can be verified by the existing tests; the unit test below also helps to verify them.
     - RocksDBIncrementalCheckpointUtilsTest.java
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sihuazhou/flink improve_recovery_from_increment_checkpoint

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5582.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5582
    
----
commit ac4b479c1ca1e73f08b533ea63fb7bd4017f641b
Author: sihuazhou <su...@...>
Date:   2018-02-26T05:00:06Z

    Improve the recovery performance for incremental checkpoint when hasKey = true.

commit b2ea4c1c32b177e6b999a26f309162c7e3df81cb
Author: sihuazhou <su...@...>
Date:   2018-02-26T15:55:56Z

    add tests.

----


> Improve performance for recovery from incremental checkpoint
> ------------------------------------------------------------
>
>                 Key: FLINK-8790
>                 URL: https://issues.apache.org/jira/browse/FLINK-8790
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Major
>             Fix For: 1.5.0
>
>
> When multiple state handles have to be restored, we can improve performance as follows:
> 1. Choose the best state handle to initialize the target DB.
> 2. Use the other state handles to create temporary DBs, and clip each one to the target key-group range (via rocksdb.deleteRange()). This lets us get rid of the `key group check` in the `data insertion loop` and also avoids traversing useless records.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)