Posted to issues@maven.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/02/02 14:12:00 UTC

[jira] [Commented] (MRESOLVER-250) Usage of descriptors map in DataPool prevents garbage collection

    [ https://issues.apache.org/jira/browse/MRESOLVER-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683435#comment-17683435 ] 

ASF GitHub Bot commented on MRESOLVER-250:
------------------------------------------

psiroky commented on PR #166:
URL: https://github.com/apache/maven-resolver/pull/166#issuecomment-1413808333

   I _believe_ this change is the cause of the "slowness" I am seeing when comparing Maven 3.9.0 and 3.8.7. See https://lists.apache.org/thread/r9p236z8kvqqk7ykvkgmc5wgps6n1hkf for some numbers comparing a couple of different builds on 3.9.0 and 3.8.7.
   
   From what I was able to observe (using async-profiler and JFR/JMC), the number of created instances of `org.eclipse.aether.artifact.DefaultArtifact` is about 4 times higher with 3.9.0 (and the logic in the constructor is somewhat "complex", merging maps, which takes time and creates quite a lot of "garbage"). For small projects this is not really noticeable (e.g. creating 1k or 4k instances is not enough to noticeably slow down the build). However, for big multi-module projects (such as Quarkus, which I am using for testing), the difference is ~2 million objects vs ~8 million objects, and that adds up.
   
   I have tried to revert this commit (on current maven-resolver master, since it is basically the same as version 1.9.4 which is used in 3.9.0) and then build the 3.9.x branch with that version (1.9.5-SNAPSHOT) 

> Usage of descriptors map in DataPool prevents garbage collection
> ----------------------------------------------------------------
>
>                 Key: MRESOLVER-250
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-250
>             Project: Maven Resolver
>          Issue Type: Bug
>          Components: Resolver
>    Affects Versions: 1.6.3
>         Environment: Linux, Java 11
>            Reporter: Frank Upgang
>            Assignee: Tamas Cservenak
>            Priority: Major
>             Fix For: 1.8.0
>
>         Attachments: image-2022-04-12-16-36-19-783.png, image-2022-04-12-16-36-49-903.png, image-2022-04-13-12-35-15-582.png
>
>
> While resolving a lot of rather huge dependency trees in my application I observed a high heap consumption by {_}org.eclipse.aether.internal.impl.collect.DataPool{_}.
> The _DataPool_ holds _Descriptor_ references in a {_}WeakHashMap{_}.
> Unfortunately, the key is indirectly referenced by the value, so it is never eligible for garbage collection.
> The key is _an Artifact_ taken from the {_}ArtifactDescriptorRequest{_}.
> The value is an _ArtifactDescriptorResult_ which references the _Artifact_ (key) and the {_}ArtifactDescriptorRequest{_}.
> To fix this the value should be wrapped in a _WeakReference_ or {_}SoftReference{_}.
> This is what the _ObjectPools_ does, which the _DataPool_ uses for _Artifacts_ and {_}Dependencies{_}.
>  
> My use case is an application that indexes the dependency trees of all our services. Therefore there is a lot of dependency resolution when building the index.
> I implemented a workaround by wrapping the value in a WeakReference. The memory consumption is significantly lower afterwards as shown by the attached screenshots.
>  
> If desired, I would create a pull request, but I haven't set up a maven workspace yet.
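
The retention problem described in the report can be sketched in isolation. The following is a minimal illustration, not the Resolver code: `Key` and `Result` are hypothetical stand-ins for the _Artifact_ and the _ArtifactDescriptorResult_, and `LeakyCache` and `WeakValueCache` are made-up names. It relies only on the documented behaviour of `java.util.WeakHashMap`, which holds its values through ordinary strong references.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for the map key (the Artifact in the report).
final class Key {
    final String coordinates;
    Key(String coordinates) { this.coordinates = coordinates; }
}

// Hypothetical stand-in for the map value (the ArtifactDescriptorResult in the report).
final class Result {
    final Key key; // strong back-reference from the value to its own key
    Result(Key key) { this.key = key; }
}

// Leaky variant: the map holds each value strongly and the value holds the key,
// so the key always stays strongly reachable and its entry is never expunged.
final class LeakyCache {
    private final Map<Key, Result> map = new WeakHashMap<>();
    void put(Key key, Result result) { map.put(key, result); }
    Result get(Key key) { return map.get(key); }
    int size() { return map.size(); }
}

// Variant with the value wrapped in a WeakReference: the map no longer keeps the
// value (and through it the key) strongly reachable, so once the caller drops its
// own references the entry becomes eligible for removal.
final class WeakValueCache {
    private final Map<Key, WeakReference<Result>> map = new WeakHashMap<>();
    void put(Key key, Result result) { map.put(key, new WeakReference<>(result)); }
    Result get(Key key) {
        WeakReference<Result> ref = map.get(key);
        return ref == null ? null : ref.get(); // null if absent or already collected
    }
    int size() { return map.size(); }
}
```

One trade-off of this approach: with a `WeakReference` the value can be collected while its key is still in use, which turns some lookups into cache misses. A `SoftReference`, the other option named in the report, would typically keep values around until memory gets tight.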


