Posted to issues@maven.apache.org by "Michael Osipov (Jira)" <ji...@apache.org> on 2022/02/27 18:39:00 UTC

[jira] [Comment Edited] (MNG-7389) Incremental .m2 cache cleanup for CI

    [ https://issues.apache.org/jira/browse/MNG-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472498#comment-17472498 ] 

Michael Osipov edited comment on MNG-7389 at 2/27/22, 6:38 PM:
---------------------------------------------------------------

Yes, you need a multilayer cache that changes with the version of your project: a per-artifact notion of layer version, plus version management for the project to purge the cache. The cloud is not a panacea. Sometimes a ProLiant for 5000 € is cheaper than fighting for CI minutes. I am now almost 100% certain that we will not deliver something like this because it is too specialized for a broad audience.


was (Author: michael-o):
Yes, you need a multilayer cache which changes with the version of your project. Per-artifact notion of layer version and a version management for the project to purge the cache. Cloud is not a panacea. Sometimes a ProLiant for 5000 € is cheaper than fighting for CI minutes. I am now almost 100% certain that we not deliver something like this because it is too specialized for the broad audience.

> Incremental .m2 cache cleanup for CI
> ------------------------------------
>
>                 Key: MNG-7389
>                 URL: https://issues.apache.org/jira/browse/MNG-7389
>             Project: Maven
>          Issue Type: New Feature
>          Components: Dependencies
>            Reporter: Thomas Skjølberg
>            Priority: Minor
>
> One or more popular continuous integration services are unable to properly manage the .m2 repository cache, which wastes resources in the form of increased CI runtime and bandwidth consumption.
> *CircleCI cache behaviour:*
>  - immutable cache entries
>  - default behaviour is to wipe the cache each time a pom file is modified (i.e. using pom hash as a cache key)
>  - cache entries TTL > weeks
> So CircleCI always has a cache containing only the necessary artifacts, but has to download all dependencies every time the pom file changes.
> *GitHub Actions cache behaviour:*
>  - (effectively) mutable cache entries
>  - incremental cache (if it gets too big, it is wiped).
>  - cache entries TTL 1 week
> So GitHub Actions works well if the cache entries expire from time to time; otherwise the cache keeps growing.
> *Summary*
> Perhaps this does not look so bad at first glance, but for a project under active development with a lot of artifacts, the pom file changes often. For example, we have apps with 100 dependencies and automatic dependency bumping via Renovate, in addition to a hierarchy of libraries.
> Key takeaway: time is wasted on
>  - saving caches in CI
>  - loading cache in CI
>  - loading artifacts from external artifact store
> This happens quite a lot. From the artifact store perspective, this probably multiplies the load by a factor of 10.
> Possible solution: A way to define a "transaction" for artifact use, i.e.
> 1. run command to mark start of transaction
> 2. run one or more Maven commands
> 3. run command to mark end of transaction, deleting artifacts not in use.
> For reference, Gradle has the same problem.
> Proof of concept:
>  * CircleCI: [https://github.com/entur/maven-orb]
>  * GitHub Actions: [https://github.com/skjolber/tidy-cache-github-action]
> The implementation uses instrumentation to record artifact access, then deletes the artifacts that were not recorded.
> *Alternatives:*
> I tried the last-accessed file timestamp first, but it turns out most CI filesystems are mounted without that option. However, it should also be possible to update the modified timestamp and/or add read access to some existing metadata file.
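
The "delete artifacts not in use" step of the proposed transaction could look roughly like the sketch below. This is not how the linked proof-of-concept projects or Maven itself work; it only illustrates the idea. The function name `prune_repository` and the assumption that the recorded accesses are available as a set of repository-relative paths are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def prune_repository(repo_root: Path, accessed: set[str]) -> list[str]:
    """Delete every file under repo_root whose relative path is not in accessed.

    `accessed` is assumed to hold repository-relative POSIX paths recorded
    during the transaction. Returns the sorted relative paths that were deleted.
    """
    deleted = []
    # Walk bottom-up so that emptied directories can be removed on the way out.
    for dirpath, _dirnames, filenames in os.walk(repo_root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(repo_root).as_posix()
            if rel not in accessed:
                path.unlink()
                deleted.append(rel)
        # Remove the directory if nothing is left in it (never the root itself).
        if Path(dirpath) != repo_root and not os.listdir(dirpath):
            os.rmdir(dirpath)
    return sorted(deleted)


def demo() -> tuple[list[str], list[str]]:
    """Build a tiny fake repository, prune it, and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        used = "org/example/used/1.0/used-1.0.jar"
        stale = "org/example/stale/0.9/stale-0.9.jar"
        for rel in (used, stale):
            (repo / rel).parent.mkdir(parents=True, exist_ok=True)
            (repo / rel).write_bytes(b"")
        # Only `used` was recorded as accessed during the "transaction".
        deleted = prune_repository(repo, {used})
        remaining = sorted(
            p.relative_to(repo).as_posix() for p in repo.rglob("*") if p.is_file()
        )
        return deleted, remaining
```

In a CI job, a step like this would run after the Maven commands and before the cache is saved, so that only artifacts touched by the current build are stored.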



--
This message was sent by Atlassian Jira
(v8.20.1#820001)