Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/06/30 15:20:00 UTC

[jira] [Commented] (FLINK-7057) move BLOB ref-counting from LibraryCacheManager to BlobCache

    [ https://issues.apache.org/jira/browse/FLINK-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070255#comment-16070255 ] 

ASF GitHub Bot commented on FLINK-7057:
---------------------------------------

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/4238

    [FLINK-7057][blob] move BLOB ref-counting from LibraryCacheManager to BlobCache

    Currently, the `LibraryCacheManager` does ref-counting for the JAR files it manages. Instead, we want the `BlobCache` to do that itself for **all** job-related BLOBs. Also, we do not want to ref-count per `BlobKey` but rather per job. Job-unrelated BLOBs should be cleaned up manually, as is done for the Web-UI logs. A future API change will better reflect the different use cases. For now, we also need to adapt the cleanup accordingly.
    
    On the `BlobServer`, the JAR files should remain both locally and in the HA store until the job enters a final state. Then they can be deleted.
    
    With this intermediate state, job-unrelated BLOBs will remain in the file system until deleted manually. This matches the previous behavior when working with a `BlobService` directly instead of going through the `LibraryCacheManager`. The aforementioned API extension will add TTL fields for those BLOBs so that they are cleaned up properly, too.
    
    This PR is based upon #4237 in a series to implement FLINK-6916.
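
The per-job (rather than per-`BlobKey`) ref-counting described above can be pictured roughly as follows. This is only a minimal sketch, not the code in this PR: the class `JobRefCounter`, the `String` job ID, and the method signatures are hypothetical and chosen purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of per-job (not per-BlobKey) reference counting. */
final class JobRefCounter {
    // Number of active references per job ID; an absent key means "unreferenced".
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Registers one more user of the job's BLOBs. */
    synchronized void registerJob(String jobId) {
        refCounts.merge(jobId, 1, Integer::sum);
    }

    /**
     * Releases one reference to the job.
     *
     * @return true if this was the last reference, i.e. the job's BLOBs
     *         are now candidates for cleanup
     */
    synchronized boolean releaseJob(String jobId) {
        Integer count = refCounts.get(jobId);
        if (count == null) {
            return false; // unknown job: nothing to release
        }
        if (count == 1) {
            refCounts.remove(jobId);
            return true;
        }
        refCounts.put(jobId, count - 1);
        return false;
    }

    /** Returns whether at least one reference to the job is still held. */
    synchronized boolean isReferenced(String jobId) {
        return refCounts.containsKey(jobId);
    }
}
```

Counting per job means a single counter covers every BLOB of that job, so cleanup can treat the job's BLOBs as one unit instead of tracking each key separately.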

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-6916-7057

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4238
    
----
commit d54a316cfffd8243980df561fd4fcbd99934a40b
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2016-12-20T15:49:57Z

    [FLINK-6008][docs] minor improvements in the BlobService docs

commit b215515fa14d3f6af218e86b67bc2c27ae9d4f4f
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2016-12-20T17:27:13Z

    [FLINK-6008] refactor BlobCache#getURL() for cleaner code

commit bbcde52b3105fcf379c852b568f3893cc6052ce6
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2016-12-21T15:23:29Z

    [FLINK-6008] do not fail the BlobServer if delete fails
    
    also extend the delete tests and remove one code duplication

commit dda1a12e40027724efb0e50005e5b57058a220f0
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-01-06T17:42:58Z

    [FLINK-6008][docs] update some config options to the new, non-deprecated ones

commit e12c2348b237207a50649d515a0fbbd19f92e6a0
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-03-09T17:14:02Z

    [FLINK-6008] use Preconditions.checkArgument in BlobClient

commit 24060e01332c6df9fd01f1dc5f321c3fda9301c1
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-03-17T15:21:40Z

    [FLINK-6008] fix concurrent job directory creation
    
    also add corresponding unit tests
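
One common source of trouble with concurrent directory creation in Java is that `File.mkdirs()` returns `false` when the directory already exists, for example because another thread created it first. The sketch below shows a tolerant variant; whether this is the exact fix applied in the commit is an assumption, and `JobDirectories.ensureDirectory` is a hypothetical helper name.

```java
import java.io.File;

/** Hypothetical helper; not taken from the Flink code base. */
final class JobDirectories {
    /**
     * Ensures that the given directory exists.
     *
     * @return true if the directory exists after the call, regardless of
     *         which thread created it
     */
    static boolean ensureDirectory(File dir) {
        // mkdirs() reports false for an already existing directory, so a
        // losing thread in a creation race must fall back to isDirectory().
        return dir.mkdirs() || dir.isDirectory();
    }
}
```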

commit 2e0d16ab8bf8a48a2d028602a3a7693fc4b76039
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-14T16:01:47Z

    [FLINK-6008] do not guard a delete() call with a check for existence
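
As a rough illustration of why an existence check before `delete()` is unnecessary: `File.delete()` already returns `false` for a missing file, and a separate `exists()` call beforehand only opens a window in which another thread can remove the file. The helper name `deleteIfPresent` below is hypothetical and the code is a sketch, not the change made in this commit.

```java
import java.io.File;

/** Hypothetical helper; not taken from the Flink code base. */
final class DeleteQuietly {
    /**
     * Deletes the file without checking for its existence first.
     *
     * @return true if the file is gone after the call
     */
    static boolean deleteIfPresent(File file) {
        // Call delete() directly and inspect existence only if it reports
        // failure: a missing file counts as success, a file that still
        // exists counts as a real deletion failure.
        return file.delete() || !file.exists();
    }
}
```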

commit 7ba911d7ecb4861261dff8509996be0bd64d6d27
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-04-18T14:37:37Z

    [FLINK-6008] some comments about BlobLibraryCacheManager cleanup

commit d3f50d595f85356ae6ed0a85e1f8b8e8ac630bde
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-04-19T13:39:03Z

    [hotfix] minor typos

commit 79b6ce35a9e246b35415a388295f9ee2fc19a82e
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-04-19T14:10:16Z

    [FLINK-6008] further cleanup tests for BlobLibraryCacheManager

commit 23fb6ecd6c43c86d762503339c67953290236dca
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-30T14:03:16Z

    [FLINK-6008] address PR comments

commit 794764ceeed6b9bbbac08662f5754b218ff86c9c
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-16T08:51:04Z

    [FLINK-7052][blob] remove (unused) NAME_ADDRESSABLE mode

commit 774bafa85f242110a2ce7907c1150f8c62d73b3f
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-21T15:05:57Z

    [FLINK-7052][blob] remove further unused code due to the NAME_ADDRESSABLE removal

commit 4da3b3f6269e43bf1c66621099528824cad9373f
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-22T15:31:17Z

    [FLINK-7053][blob] remove code duplication in BlobClientSslTest
    
    This lets BlobClientSslTest extend BlobClientTest as most of its implementation
    came from there and was simply copied.

commit aa9cdc820f9ca1a38a19708bf45a2099e42eaf48
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-23T09:40:34Z

    [FLINK-7053][blob] verify some of the buffers returned by GET

commit c9b693a46053b55b3939ff471184796f12d36a72
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-23T10:04:10Z

    [FLINK-7053][blob] use TemporaryFolder for local BLOB dir in unit tests
    
    This replaces the use of some temporary directory where it is not guaranteed
    that it will be deleted after the test.

commit 11db399d5103d9ffe9083c9b6029a7e81afa9abe
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-21T12:45:31Z

    [FLINK-7054][blob] remove LibraryCacheManager#getFile()
    
    This was only used in tests where it is avoidable but if used anywhere else, it
    may have caused cleanup issues.

commit 4ae04b68453d4b099f752d6c6fd3c09335ede33a
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-21T14:14:15Z

    [FLINK-7055][blob] refactor getURL() to the more generic getFile()
    
    The fact that we always returned URL objects is a relic of the BlobServer's only
    use for URLClassLoader. Since we'd like to extend its use, returning File
    objects instead is more generic.
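
A caller that still needs a `URL`, for instance to build a `URLClassLoader`, can derive it from a `File` with standard JDK calls. The following is a sketch under that assumption; the class `BlobFileClassLoading` and its methods are hypothetical names, not part of the Flink API.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical helper showing File-to-URL conversion for class loading. */
final class BlobFileClassLoading {
    /** Converts a local file into a file: URL. */
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + file, e);
        }
    }

    /** Builds a class loader over the given JAR files. */
    static URLClassLoader classLoaderFor(File... jarFiles) {
        URL[] urls = new URL[jarFiles.length];
        for (int i = 0; i < jarFiles.length; i++) {
            urls[i] = toUrl(jarFiles[i]);
        }
        return new URLClassLoader(urls, BlobFileClassLoading.class.getClassLoader());
    }
}
```

Going from `File` to `URL` is a one-liner, while the reverse direction is more awkward, which is one reason a `File`-returning method is the more general choice.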

commit 8397d6aa5dc0aac07626d0af9ee3d8623dd7b60c
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-21T16:04:43Z

    [FLINK-7056][blob] add API to allow job-related BLOBs to be stored

commit 0a4c4e9bc483e4f1f885ef1e3b8feba40c057204
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-23T17:17:07Z

    [FLINK-7056][blob] refactor the new API for job-related BLOBs
    
    For a cleaner API, instead of having a nullable jobId parameter, use two methods:
    one for job-related BLOBs, another for job-unrelated ones.
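
The design choice of two methods instead of one nullable parameter might look roughly like this. Everything here is hypothetical: `InMemoryBlobStore`, `putForJob`, `putJobUnrelated`, and the path layout are illustrative stand-ins, not the signatures introduced by the commit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory store contrasting job-related and job-unrelated BLOBs. */
final class InMemoryBlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Stores a BLOB that belongs to a job; the job ID must not be null. */
    String putForJob(String jobId, String name, byte[] data) {
        Objects.requireNonNull(jobId, "jobId");
        return store("job_" + jobId + "/" + name, data);
    }

    /** Stores a BLOB that is not tied to any job. */
    String putJobUnrelated(String name, byte[] data) {
        return store("no_job/" + name, data);
    }

    /** Returns whether a BLOB is stored under the given path. */
    boolean contains(String path) {
        return blobs.containsKey(path);
    }

    private String store(String path, byte[] data) {
        blobs.put(path, data.clone()); // defensive copy of the payload
        return path;
    }
}
```

With separate methods, a `null` job ID is always a programming error rather than a second, implicit mode of the same call.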

commit 13fd7623d1aafd3e853e39071c650cbfda865649
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-27T10:14:08Z

    [FLINK-7012] remove user-JAR upload when disposing a savepoint the old way

commit 8331fbb208d975e0c1ec990344c14315ea08dd4a
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-27T16:29:44Z

    [FLINK-7057][blob] move ref-counting from the LibraryCacheManager to the BlobCache
    
    Also change from BlobKey-based ref-counting to job-based ref-counting, which is
    simpler and the mode we want to use from now on. Deferred cleanup (as before)
    is not implemented yet (TODO).
    At the BlobServer, no ref-counting will be used but the cleanup will happen
    when the job enters a final state (TODO).

commit 0bc11d590c493ff2cdb8de63960c17f49ba5efb5
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-28T09:31:39Z

    [FLINK-7057][blob] change to a cleaner API for BlobService#registerJob()

commit e9a0d6893156ca818847d1b04519472111c3047d
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-28T12:09:11Z

    [FLINK-7057][blob] implement deferred cleanup at the BlobCache
    
    Whenever a job is not referenced at the BlobCache anymore, we set a TTL and let
    the cleanup task remove it when this is hit and the task is run. For now, this
    means that a BLOB will be retained at most
    (2 * ConfigConstants.LIBRARY_CACHE_MANAGER_CLEANUP_INTERVAL) seconds after not
    being referenced anymore. We do this so that a recovery still has the chance to
    use existing files rather than to download them again.
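
The "at most twice the cleanup interval" bound follows if the TTL equals the cleanup interval and the cleanup task also runs once per interval: a job released just after a run expires just after the next run and is only removed by the run after that. The sketch below models this with explicit timestamps. It assumes that TTL and task period are the same value, uses milliseconds for convenience, and every name in it (`DeferredJobCleanup`, `markUnreferenced`, `runCleanup`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical model of TTL-based deferred cleanup for unreferenced jobs. */
final class DeferredJobCleanup {
    private final long cleanupIntervalMillis;
    // Expiry timestamp for every job that is currently unreferenced.
    private final Map<String, Long> keepUntil = new HashMap<>();

    DeferredJobCleanup(long cleanupIntervalMillis) {
        this.cleanupIntervalMillis = cleanupIntervalMillis;
    }

    /** Called when the last reference to a job is released. */
    synchronized void markUnreferenced(String jobId, long nowMillis) {
        keepUntil.put(jobId, nowMillis + cleanupIntervalMillis);
    }

    /** Called when a job is referenced again, e.g. during recovery. */
    synchronized void markReferenced(String jobId) {
        keepUntil.remove(jobId);
    }

    /** One run of the periodic cleanup task; returns the jobs to delete. */
    synchronized List<String> runCleanup(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = keepUntil.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

With an interval of 1000 ms, a job released at t = 1 survives the run at t = 1000 (its TTL ends at 1001) and is removed by the run at t = 2000, so it was retained for just under two intervals. A job that is referenced again before its TTL is hit is never deleted, which is what lets a recovery reuse the existing files.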

commit 0c3e8032634e722b432c484bdbf789d0244397b3
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-28T15:17:06Z

    [FLINK-7057][blob] integrate cleanup of job-related JARs from the BlobServer
    
    TODO: an integration test that verifies that this is actually done when desired
    and not performed when not, e.g. if the job did not reach a final execution
    state

commit 2d9f4cb5740f48edfaa95f94de93d0334e8c279d
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-30T12:52:19Z

    [FLINK-7057][tests] extract FailingBlockingInvokable from CoordinatorShutdownTest

commit b0cc398d40299acb1a3cddb81e64719fdb450459
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-30T12:56:14Z

    [FLINK-7057][blob] add an integration test for the BlobServer cleanup
    
    This ensures that BLOB files are actually deleted when a job enters a final
    state.

----


> move BLOB ref-counting from LibraryCacheManager to BlobCache
> ------------------------------------------------------------
>
>                 Key: FLINK-7057
>                 URL: https://issues.apache.org/jira/browse/FLINK-7057
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Coordination, Network
>    Affects Versions: 1.4.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>
> Currently, the {{LibraryCacheManager}} is doing some ref-counting for JAR files managed by it. Instead, we want the {{BlobCache}} to do that itself for all job-related BLOBs. Also, we do not want to operate on a per-{{BlobKey}} level but rather per job. Therefore, the cleanup process should be adapted, too.


