You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by NicoK <gi...@git.apache.org> on 2017/07/18 14:07:20 UTC

[GitHub] flink pull request #4359: [FLINK-7140][blob] add an additional random compon...

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/4359

    [FLINK-7140][blob] add an additional random component into the BlobKey

    This should guard us from uploading (and deleting) the same file more than once and also from hash collisions.
    
    This PR is based upon #4358 in a series to implement FLINK-6916.
    
    - [X] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [X] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [X] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-6916-7140

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4359.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4359
    
----
commit 1f86efd013a81e84ba1556de0a04e4ac70229f79
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-22T15:31:17Z

    [FLINK-7053][blob] remove code duplication in BlobClientSslTest
    
    This lets BlobClientSslTest extend BlobClientTest as most of its implementation
    came from there and was simply copied.

commit ff083e850edb8f5f383f54f19519116e10308d61
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-23T09:40:34Z

    [FLINK-7053][blob] verify some of the buffers returned by GET

commit e138569019f5c75b70a21085e4829cb6cd1e93bc
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-23T10:04:10Z

    [FLINK-7053][blob] use TemporaryFolder for local BLOB dir in unit tests
    
    This replaces the use of some temporary directory where it is not guaranteed
    that it will be deleted after the test.

commit 70f06d9b38c5d8afb50b6c18b7b3a4e07e2bb3da
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-21T12:45:31Z

    [FLINK-7054][blob] remove LibraryCacheManager#getFile()
    
    This was only used in tests where it is avoidable but if used anywhere else, it
    may have caused cleanup issues.

commit 5463f65aeb3ed519b51ccc78abfcc003bf02b3f8
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-16T21:13:56Z

    [FLINK-7054][hotfix] fix a checkstyle error

commit 12052c0b8c8b869f9b6bd6f9d53c9ed6e362c631
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-21T14:14:15Z

    [FLINK-7055][blob] refactor getURL() to the more generic getFile()
    
    The fact that we always returned URL objects is a relic of the BlobServer's only
    use for URLClassLoader. Since we'd like to extend its use, returning File
    objects instead is more generic.

commit 9dd7ba652ff1b09fb5bf44e8b6b5b885a56873ab
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-21T16:04:43Z

    [FLINK-7056][blob] add API to allow job-related BLOBs to be stored

commit 90f779ba66728979977bd794e8f68b207471dbc2
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-23T17:17:07Z

    [FLINK-7056][blob] refactor the new API for job-related BLOBs
    
    For a cleaner API, instead of having a nullable jobId parameter, use two methods:
    one for job-related BLOBs, another for job-unrelated ones.

commit 6d1fda80c6f9869676030fd86177bbc7e981b1f9
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-16T21:22:12Z

    [FLINK-7056][hotfix] fix a checkstyle error

commit cdbb0fb32052c8163bc83df151a0c8354846e4dc
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-27T16:29:44Z

    [FLINK-7057][blob] move ref-counting from the LibraryCacheManager to the BlobCache
    
    Also change from BlobKey-based ref-counting to job-based ref-counting which is
    simpler and the mode we want to use from now on. Deferred cleanup (as before)
    is currently not implemented yet (TODO).
    At the BlobServer, no ref-counting will be used but the cleanup will happen
    when the job enters a final state (TODO).

commit 8617c9e8ffbdece8d02815ef71dc09f2d764453a
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-28T09:31:39Z

    [FLINK-7057][blob] change to a cleaner API for BlobService#registerJob()

commit 20cf9bf1fd2b382b6d45dc7b15c7be9bed7f5f16
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-28T12:09:11Z

    [FLINK-7057][blob] implement deferred cleanup at the BlobCache
    
    Whenever a job is not referenced at the BlobCache anymore, we set a TTL and let
    the cleanup task remove it when this is hit and the task is run. For now, this
    means that a BLOB will be retained at most
    (2 * ConfigConstants.LIBRARY_CACHE_MANAGER_CLEANUP_INTERVAL) seconds after not
    being referenced anymore. We do this so that a recovery still has the chance to
    use existing files rather than to download them again.

commit 270f3957b6d5e9061d528158a86eb1837ad2786c
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-28T15:17:06Z

    [FLINK-7057][blob] integrate cleanup of job-related JARs from the BlobServer
    
    TODO: an integration test that verifies that this is actually done when desired
    and not performed when not, e.g. if the job did not reach a final execution
    state

commit 39300a32c90c82f8044cfd6d5fcaf1b83b250c87
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-30T12:52:19Z

    [FLINK-7057][tests] extract FailingBlockingInvokable from CoordinatorShutdownTest

commit 56113349799669326074d269d6c07bb19bd45788
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-06-30T12:56:14Z

    [FLINK-7057][blob] add an integration test for the BlobServer cleanup
    
    This ensures that BLOB files are actually deleted when a job enters a final
    state.

commit 93861f1f624c5b514fcdd041b592e0c40c66d103
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-03T09:25:29Z

    [FLINK-7057][tests] refrain from catching an exception just to fail the test
    
    removes code like this in the BLOB store unit tests:
    
    catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }

commit 80a05142a68f8a3d0aaf1d447ab87a1e506364a4
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-03T11:45:33Z

    [FLINK-7057][blob] fix BlobServer#cleanupJob() being too eager
    
    Instead of deleting the job's directory, it was deleting the parent storage
    directory.

commit de4ca2a7c676e7e1a82172ce680ae5a5f6693df4
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-03T15:03:50Z

    [FLINK-7057][blob] fix BlobServer cleanup integration
    
    * the test did not check the correct directories for cleanup
    * the test did not honour the test timeout

commit dc58fa5d5ffc299b493dc1a78f26aa1914036151
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-03T15:11:18Z

    [FLINK-7057][blob] test and fix BlobServer cleanup for a failed job submission

commit d1de87bfd34c5bb1f2dde0402bcceac6013c150b
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-04T07:55:29Z

    [FLINK-7057][blob] rework the LibraryCacheManager API
    
    Since ref-counting has moved to the BlobCache, the BlobLibraryCacheManager is
    just a thin wrapper to get a user class loader by retrieving BLOBs from the
    BlobCache/BlobServer. Therefore, move the job-registration/-release out of it,
    too, and restrict its use to the task manager where the BlobCache is used (on
    the BlobServer, jobs do not need registration since they are only used once and
    will be deleted when they enter a final state).
    
    This makes the BlobServer and BlobCache instances available at the JobManager
    and TaskManager instances, respectively, also enabling future use cases outside
    of the LibraryCacheManager.

commit b62f1f640e33126523eaaa7bd51f0b24962c67fd
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-04T09:50:07Z

    [hotfix] increase Scala checkstyle maxParameters to 20

commit 5e9a9a7cd779a8e0d28947bcfa64cc16c02857f2
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-04T09:51:28Z

    [FLINK-7057][blob] address PR comments

commit 5fa960676281fc1473105c849fbce9cb22cacacb
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-04T12:41:18Z

    [FLINK-7057][blob] fix JobManagerLeaderElectionTest

commit a6ca6812af7b4130e77775fd3896f3f44a357334
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-04T20:49:12Z

    [FLINK-7057][blob] re-introduce some ref-counting for BlobLibraryCacheManager
    
    Apparently, we do need to return the same ClassLoader for different (parallel)
    tasks of a job running on the same task manager. Therefore, keep the initial
    task registration implementation that was removed with
    8331fbb208d975e0c1ec990344c14315ea08dd4a and only adapt it here. This also
    restores some tests and adds new combinations not tested before.

commit da1be581529813ec359f1ac09a00bc564a3812dd
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-06T07:38:17Z

    [FLINK-7057][blob] address PR comments

commit b02807e75cea960fd2c89e49656be2e6f6e32394
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-10T11:40:00Z

    [FLINK-7057][tests] fix (manual/ignored) BlobCacheCleanupTest#testJobDeferredCleanup()

commit 0f9f7d7f90497708cd2589b34573f10be0b9d331
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-16T21:30:43Z

    [FLINK-7057][hotfix] fix a checkstyle error

commit 742488145b6e5ae94f29371915b2d5b08e4ff655
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-11T09:19:20Z

    [FLINK-7068][blob] start introducing a new BLOB storage abstraction
    
    This is incomplete and may not compile and/or run tests successfully yet.

commit de88cede820418628779957af76a9d49b535c26b
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-11T10:29:38Z

    [FLINK-7068][blob] remove BlobView from TransientBlobCache
    
    The transient BLOB cache is not supposed to work with the HA store since it only
    serves non-HA files.

commit 99edd8966e3032110be8d2e240440da13dd2f655
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-07-11T13:04:00Z

    [FLINK-7068][blob] remove unnecessary use of BlobClient

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4359: [FLINK-7140][blob] add an additional random compon...

Posted by greghogan <gi...@git.apache.org>.
Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4359#discussion_r128129730
  
    --- Diff: flink-core/src/main/java/org/apache/flink/util/AbstractID.java ---
    @@ -46,7 +46,7 @@
     	/** The lower part of the actual ID */
     	protected final long lowerPart;
     
    -	/** The memoized value returned by toString() */
    +	/** The memorized value returned by toString() */
    --- End diff --
    
    Oh, hey, that was me. I was just thinking about this the other day and how the JobManager used to bottleneck for large jobs when logging and converting these `AbstractID` to string. Anyway, this original should be the correct spelling: https://en.wikipedia.org/wiki/Memoization


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #4359: [FLINK-7140][blob] add an additional random compon...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4359


---

[GitHub] flink pull request #4359: [FLINK-7140][blob] add an additional random compon...

Posted by NicoK <gi...@git.apache.org>.
Github user NicoK commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4359#discussion_r128165494
  
    --- Diff: flink-core/src/main/java/org/apache/flink/util/AbstractID.java ---
    @@ -46,7 +46,7 @@
     	/** The lower part of the actual ID */
     	protected final long lowerPart;
     
    -	/** The memoized value returned by toString() */
    +	/** The memorized value returned by toString() */
    --- End diff --
    
    you're right...learned something today :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4359: [FLINK-7140][blob] add an additional random component int...

Posted by NicoK <gi...@git.apache.org>.
Github user NicoK commented on the issue:

    https://github.com/apache/flink/pull/4359
  
    rebased onto the latest #4358 


---