Posted to issues@maven.apache.org by "wei cai (Jira)" <ji...@apache.org> on 2021/11/27 05:40:00 UTC

[jira] [Updated] (MRESOLVER-228) Improve the maven dependency resolution speed by a skip & reconcile approach

     [ https://issues.apache.org/jira/browse/MRESOLVER-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wei cai updated MRESOLVER-228:
------------------------------
    Description: 
When it comes to resolving the dependencies of an enterprise-level project, the Maven resolver is very slow to build the dependency graph/tree. For one of our apps, for example, it can take 10+ minutes to print the result of mvn dependency:tree.

This is because the project declares many dependencies, some of which introduce 600+ transitive dependencies, and exclusions are widely used to solve dependency conflicts.

Checking the [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500] shows that the exclusions are also part of the cache key. When the exclusions up the tree differ, the cached resolution result won't be picked up and needs to be recalculated.

!Screen Shot 2021-11-27 at 12.58.26 PM.png!

The figure above shows that in the 2nd case B and C have different exclusions, so D needs to be recalculated. If D is a heavy dependency that introduces many transitive dependencies, D and all of its children need to be recalculated. Recalculating all of these nodes introduces two issues:

 * Slow dependency resolution.
 * The many cached DependencyNodes (all calculated and recalculated nodes are cached) consume a huge amount of memory.
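
The cache-miss effect described above can be illustrated with a minimal sketch. The ExclusionAwareKey and CacheKeyDemo classes are hypothetical and are not the resolver's actual key or cache classes; the sketch only assumes that the exclusions take part in the key's equals/hashCode, so the same artifact reached under different exclusions yields distinct keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical cache key: the artifact plus the exclusions inherited from its ancestors.
final class ExclusionAwareKey {
    final String artifact;
    final Set<String> exclusions;

    ExclusionAwareKey(String artifact, String... exclusions) {
        this.artifact = artifact;
        this.exclusions = new TreeSet<>(Arrays.asList(exclusions));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ExclusionAwareKey)) {
            return false;
        }
        ExclusionAwareKey k = (ExclusionAwareKey) o;
        return artifact.equals(k.artifact) && exclusions.equals(k.exclusions);
    }

    @Override
    public int hashCode() {
        return Objects.hash(artifact, exclusions);
    }
}

final class CacheKeyDemo {
    // Counts how often the subtree of "D" is computed for a sequence of exclusion contexts.
    static int computations(String[][] contexts) {
        Map<ExclusionAwareKey, String> cache = new HashMap<>();
        int computed = 0;
        for (String[] exclusions : contexts) {
            ExclusionAwareKey key = new ExclusionAwareKey("D", exclusions);
            if (!cache.containsKey(key)) {
                cache.put(key, "subtree of D"); // stands in for the expensive recalculation
                computed++;
            }
        }
        return computed;
    }

    public static void main(String[] args) {
        // Same exclusions on both paths: D is computed once, the second visit is a cache hit.
        System.out.println(computations(new String[][] {{"x:y"}, {"x:y"}})); // 1
        // Different exclusions: the keys differ, so D is computed twice.
        System.out.println(computations(new String[][] {{"x:y"}, {"x:z"}})); // 2
    }
}
```

Every distinct exclusion context adds another cached copy of the subtree, which matches the memory growth listed above.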

To improve the Maven resolver's dependency resolution speed, I implemented a skip & reconcile approach. Here is the *skip* part.

!Screen Shot 2021-11-27 at 12.58.59 PM.png!

In the figure above, the 1st R is resolved at depth 3, and the 2nd R is resolved again because its depth, 2, is lower. The 3rd and 4th R are both skipped because R has already been resolved at depth 2, which is lower than theirs. The same node at a deeper depth most likely won't be picked up by Maven, as Maven adopts the "{*}nearest{*} transitive dependency in the tree depth and the *first* in resolution" strategy.

The 3rd R and 4th R will have children set as 0 and marked as skipped by another dependency code.
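
The skip decision can be sketched as follows. This is a simplified illustration, not the code in the PR: the SkipSketch class is hypothetical, the depths used for the 3rd and 4th R are assumed (the text only says they are deeper), and treating an equal depth as skippable is an assumption derived from the "first in resolution" rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SkipSketch {
    // Shallowest depth at which each artifact has been resolved so far.
    private final Map<String, Integer> resolvedDepth = new HashMap<>();

    // Returns true when the node must be resolved, false when it can be skipped.
    boolean shouldResolve(String artifact, int depth) {
        Integer seen = resolvedDepth.get(artifact);
        if (seen != null && seen <= depth) {
            return false; // an equal or shallower occurrence was already resolved
        }
        resolvedDepth.put(artifact, depth);
        return true;
    }

    // Replays a sequence of occurrences of R, given by depth in visiting order.
    static List<Boolean> decisions(int... depthsOfR) {
        SkipSketch sketch = new SkipSketch();
        List<Boolean> out = new ArrayList<>();
        for (int depth : depthsOfR) {
            out.add(sketch.shouldResolve("R", depth));
        }
        return out;
    }

    public static void main(String[] args) {
        // R at depths 3 and 2 as in the description; 3 and 4 are assumed deeper occurrences.
        System.out.println(decisions(3, 2, 3, 4)); // [true, true, false, false]
    }
}
```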

 

Here is the *reconcile* part:

!Screen Shot 2021-11-27 at 12.59.32 PM.png!

As the figure above shows, when there are dependency conflicts, some of the skipped nodes need to be reconciled.

There are 4 tree paths.
 * D1 (D with version 1) in the 1st tree path is resolved first. Based on D1, the children of E and R at depth 3 are then cached.
 * In the 2nd path, when resolving E & R of H, we simply skip these 2 nodes because they are deeper (depth 4) than the E & R in the 1st tree path.
 * In the 3rd tree path, an R node with a lower depth is resolved, and an E node at depth 5 is skipped.
 * Then, in the 4th path, a D2 (D with version 2) node is resolved. Because its depth is lower than that of D1, Maven will pick D2, which means the children of E & R cached in tree path 1 should be discarded. So we need to reconcile the necessary E & R nodes in the 2nd, 3rd and 4th tree paths.

Here only the E in the 2nd tree path needs to be reconciled. This is because:
 * The R in the 3rd tree path won't be picked up, as there is an R in the 2nd tree path with a lower depth.
 * The E in the 3rd tree path won't be picked, as it is enough to reconcile the E in the 2nd tree path, since the E in the 2nd tree path is deeper than the E in the 3rd tree path.
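
The condition that triggers reconciliation can be sketched as a change of winner under nearest-wins mediation. This is only an illustration under stated assumptions: the ReconcileTriggerSketch class is hypothetical, the depths given to D1 and D2 are assumed, and the sketch only detects that subtrees cached under the old version are stale. It does not decide which skipped nodes to reconcile.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReconcileTriggerSketch {
    // Current winner per artifact: its version and the depth at which it was found.
    private final Map<String, String> winnerVersion = new HashMap<>();
    private final Map<String, Integer> winnerDepth = new HashMap<>();
    // Artifacts whose winning version changed; children cached under the old version are stale.
    final List<String> needsReconcile = new ArrayList<>();

    void visit(String artifact, String version, int depth) {
        Integer bestDepth = winnerDepth.get(artifact);
        if (bestDepth != null && bestDepth <= depth) {
            return; // nearest wins; on a tie the first one visited stays the winner
        }
        String previous = winnerVersion.put(artifact, version);
        winnerDepth.put(artifact, depth);
        if (previous != null && !previous.equals(version)) {
            needsReconcile.add(artifact);
        }
    }

    public static void main(String[] args) {
        ReconcileTriggerSketch sketch = new ReconcileTriggerSketch();
        sketch.visit("D", "1", 2); // D1 on the 1st tree path (depth assumed)
        sketch.visit("D", "2", 1); // D2 on the 4th tree path, shallower (depth assumed)
        System.out.println(sketch.needsReconcile); // [D]
    }
}
```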

 

After enabling the resolver patch in Maven, we see build times reduced by 10% to 70%, depending on how complex a project's dependencies are, and the results of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We verified the resolver performance patch with an automation solution that certified 2000+ apps in our company by comparing the *mvn dependency:tree* and *mvn dependency:list* results with and without the performance patch.
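
A comparison of this kind could look like the sketch below. It is not the automation solution mentioned above, whose details are not described here; the TreeOutputDiff class is hypothetical and assumes the two outputs were saved to files, one from a run without the patch and one from a run with it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

final class TreeOutputDiff {
    // Returns the index of the first differing line, or -1 when both outputs are identical.
    static int firstDifference(List<String> baseline, List<String> patched) {
        int common = Math.min(baseline.size(), patched.size());
        for (int i = 0; i < common; i++) {
            if (!baseline.get(i).equals(patched.get(i))) {
                return i;
            }
        }
        return baseline.size() == patched.size() ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1]: hypothetical paths to the saved outputs of the two runs.
        List<String> baseline = Files.readAllLines(Paths.get(args[0]));
        List<String> patched = Files.readAllLines(Paths.get(args[1]));
        int diff = firstDifference(baseline, patched);
        System.out.println(diff < 0 ? "identical" : "first difference at line " + (diff + 1));
    }
}
```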

Please help review the PR.
[https://github.com/apache/maven-resolver/pull/136]

  was:
When it comes to resolving the dependencies of an enterprise-level project, the Maven resolver is very slow to build the dependency graph/tree. For one of our apps, for example, it can take 10+ minutes to print the result of mvn dependency:tree.

This is because the exclusion is considered as part of the cache key.

!Screen Shot 2021-11-27 at 12.58.26 PM.png!

To improve the maven resolver's dependency resolution speed,  I implemented a skip & reconcile approach. Here is the *skip* part.

!Screen Shot 2021-11-27 at 12.58.59 PM.png!

 

Here is the *reconcile* part:

!Screen Shot 2021-11-27 at 12.59.32 PM.png!

 

After enabling the resolver patch in Maven, we see build times reduced by 10% to 70%, and the results of mvn dependency:tree and mvn dependency:list remain the same.

We verified the resolver performance patch with a dry run of 2000+ apps in our company, comparing the mvn dependency:tree and mvn dependency:list results with and without the performance patch.

Please help review the PR.

 

 


> Improve the maven dependency resolution speed by a skip & reconcile approach
> ----------------------------------------------------------------------------
>
>                 Key: MRESOLVER-228
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-228
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>    Affects Versions: 1.7.2
>            Reporter: wei cai
>            Priority: Major
>         Attachments: Screen Shot 2021-11-27 at 12.58.26 PM.png, Screen Shot 2021-11-27 at 12.58.59 PM.png, Screen Shot 2021-11-27 at 12.59.32 PM.png
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)