You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@maven.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/05/08 06:22:00 UTC

[jira] [Commented] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

    [ https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533415#comment-17533415 ] 

ASF GitHub Bot commented on MRESOLVER-247:
------------------------------------------

caiwei-ebay2 commented on PR #158:
URL: https://github.com/apache/maven-resolver/pull/158#issuecomment-1120358732

   @michael-o @cstamas @weibo1995 @ibabiankou 
   
   Echoed weibo1995's feedback in https://github.com/apache/maven-resolver/pull/172
   
   > At present, 5 or 6 long tail applications have been piloted, and the compilation time can be reduced by half, that is, about 100% performance gain. Later, we will use BF for more applications, and then feedback and communicate.
   thanks.
   
   Really glad that this patch could help more developers.
   
   Also I'm currently working on parallel pom downloading, the idea is a bit different than https://github.com/ibabiankou/maven-resolver/pull/2/files:
   
   1. only parallelize VersionRange and ArtifactDesciptor resolutioon part (poms & metadata.xml downloading), please check resolveArtifactDescriptorAsync for more details.
   
   Here is the commit for preview and still evolving, I will certify the approach by dry-running large numbers of apps in our company, please share your comments with the idea.
   https://github.com/parallel-pom-downloading/maven-resolver/commit/2c675527666f7bc0cec9543d7b3b1fd916c6fe0d
   
   Note: I'm resetting the 2FA auth of original "caiwei-ebay" account and unable to fire PR with "caiwei-ebay2" (getting message saying: **It looks like this is your first time opening a pull request in this project!**).  Will fire draft PR once 2FA auth is reset and we can discuss there later,
   
   




> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> -----------------------------------------------------------------------
>
>                 Key: MRESOLVER-247
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-247
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>    Affects Versions: 1.7.3
>            Reporter: wei cai
>            Assignee: Michael Osipov
>            Priority: Major
>             Fix For: 1.8.0
>
>         Attachments: exclusion-matters.png, skip-duplicate.png, skip-version-conflicts.png
>
>
> h1. *Background*
> This Jira is related with MRESOLVER-240 and MRESOLVER-228
> There were discussions about the DFS or BFS algorithm for maven resolver in MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
> h1. *The phenomenon*
> When comes to resolve the huge amount of dependencies of an enterprise level project, the maven resolver is very slow to resolve the dependency graph/tree. Take one of our app as example, it could take *10minutes+ and 16G memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some of the dependencies would introduce *600+* transitive dependencies, and exclusions are widely used to solve dependency conflicts. 
> By checking the [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500], we know the exclusion is also part of the cache key. This means when the exclusions up the tree differs, the cached resolution result for the same GAV won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be recalculated, if D is a heavy dependency which introduce many transitive dependencies, all D and its children (D & E in this case) need to be recalculated.  Recalculating all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be cached) and will consume huge memory.
> h1. Solution
> To improve the speed of maven resolver's dependency resolution,  I implemented a skip approach to avoid unnecessary dependency resolution.
> h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
> !skip-duplicate.png!
> From above figure:
>  * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 is the winner.
>  * The R#5 won't be the winner, however here we force resolved this node as it is more left than the last resolved R#3 node. This is because maven picks up widest scope present among conflicting dependencies (scopes of the conflicts may differ), we need to *retain the conflict paths* by using a strategy like:
>  ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve R#5.
>  ** If there is a R#6 which is more left than R#5, then need to force resolve R#6.
>  * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is already resolved at depth 2. This is because the same node with deeper depth won't be picked up by maven as maven employs a "nearest transitive dependency in the tree depth and the first in resolution" strategy.
> h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*
> *!skip-version-conflicts.png!*
> In above figure.
>  * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already know D1 is the winner
>  * When comes to resolve D2 (D with version 2, after #4), we know D2 is having a different version and it conflicts with D1, D2 will be skipped as it won't be picked up by maven,  all D2's children *won't be resolved* then. 
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build time reduced for different projects depend on how complex the dependencies are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.
> We've verified the resolver performance patch leveraging an automation solution to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* and *mvn dependency:list* result with/without the performance patch.
> Please help review the PR.
> https://github.com/apache/maven-resolver/pull/158
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)