Posted to dev@maven.apache.org by wecai <cw...@126.com> on 2021/07/01 09:11:12 UTC

Apply exclusions later?

We have a large dependency with 300+ transitive dependencies; let's call it BigDep1.


We have a large number of libraries that depend on BigDep1, and we may add exclusions when we use these libraries in our project:
<dependency>
  <groupId>com.company...</groupId>
  <artifactId>Lib1</artifactId>
  <exclusions>
    <exclusion>
      <groupId>some_group_id</groupId>
      <artifactId>some_artifact_id</artifactId>
    </exclusion>
  </exclusions>
</dependency>


Building the project takes a long time and a huge amount of memory. We saw that BigDep1 is resolved thousands of times without ever hitting the in-memory cache...


Looking at the code, we can see that Maven tries to load the resolved result of BigDep1 from the cache, but when we debugged it, the lookup always failed.
The key is built from the GAV, repositories, childSelector, childManager, childTraverser and childFilter, which means the exclusions (carried by childSelector) are part of the key.
https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L504
    Object key =
        args.pool.toKey( d.getArtifact(), childRepos, childSelector, childManager, childTraverser, childFilter );

    List<DependencyNode> children = args.pool.getChildren( key );
    // => always null: the children have to be recalculated and saved to the cache again,
    //    which takes a long time and consumes a lot of memory
    if ( children == null )
    {
        args.pool.putChildren( key, child.getChildren() );

        args.nodes.push( child );

        process( args, results, descriptorResult.getDependencies(), childRepos, childSelector, childManager,
                 childTraverser, childFilter );

        args.nodes.pop();
    }
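
To make the miss concrete, here is a minimal sketch of a cache whose key includes the active exclusions. It is not the resolver's real DataPool: the class ContextKeyedPool, its toKey and lookup methods, and the coordinates used in main are all made up for illustration, and the real key also carries repositories, manager, traverser and filter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy pool keyed by GAV plus exclusions; hypothetical, not the real DataPool API. */
final class ContextKeyedPool {
    private final Map<List<Object>, String> children = new HashMap<>();
    int hits;
    int misses;

    /** Composite key: the GAV plus the exclusions active at this point in the graph. */
    static List<Object> toKey(String gav, Set<String> activeExclusions) {
        return Arrays.<Object>asList(gav, new HashSet<>(activeExclusions));
    }

    /** Looks the key up and records whether the subtree would have to be recomputed. */
    void lookup(String gav, Set<String> activeExclusions) {
        List<Object> key = toKey(gav, activeExclusions);
        if (children.containsKey(key)) {
            hits++;
        } else {
            misses++;
            children.put(key, "children of " + gav);
        }
    }

    public static void main(String[] args) {
        ContextKeyedPool pool = new ContextKeyedPool();
        Set<String> none = new HashSet<>();
        Set<String> viaLib3 = new HashSet<>(Arrays.asList("g:excluded-by-lib3"));
        Set<String> viaLib4 = new HashSet<>(Arrays.asList("g:excluded-by-lib3", "g:excluded-by-lib4"));

        pool.lookup("com.example:BigDep1:1.0", none);    // miss: first time
        pool.lookup("com.example:BigDep1:1.0", viaLib3); // miss: same GAV, different exclusions
        pool.lookup("com.example:BigDep1:1.0", viaLib4); // miss: yet another exclusion set
        pool.lookup("com.example:BigDep1:1.0", viaLib3); // hit: identical context seen before
        System.out.println("hits=" + pool.hits + " misses=" + pool.misses); // hits=1 misses=3
    }
}
```

In this model a hit is only possible when the same artifact is reached with exactly the same set of exclusions, which is rare when many libraries declare different exclusions.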


Let me use a simple pattern to describe the problem:


Lib1 -> BigDep1
Lib2 -> Lib3 (has exclusion) -> BigDep1
Lib4 -> Lib2
...


Now, in our project, we use the libraries Lib1, Lib2 and Lib4, the last one with an exclusion:


Project -> Lib1
Project -> Lib2
Project -> Lib4 (has exclusion)


Here is how Maven resolves the dependencies:
Maven starts with Lib1 (Lib1 -> BigDep1). It resolves BigDep1 first and caches the result in memory.
Maven then resolves Lib2 (Lib2 -> Lib3 (has exclusion) -> BigDep1). Because Lib3 has an exclusion, Maven cannot load BigDep1 from the cache and calculates it again.
Maven then resolves Lib4 (Lib4 (has exclusion) -> Lib2 -> Lib3 -> BigDep1). Because Lib4 has an exclusion, Maven cannot load Lib2, Lib3 or BigDep1 from the cache, so all of them are recalculated.
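
The walk above can be replayed with a small simulation. This is only a sketch under simplifying assumptions: the class ContextKeyedWalk is hypothetical, "E3" and "E4" are placeholder names for the exclusions on Lib3 and Lib4, BigDep1 is modelled as a leaf, and the cache key is reduced to (artifact, inherited exclusions).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy collector for the example graph; caches by (artifact, inherited exclusions). */
final class ContextKeyedWalk {
    // Each edge is {dependency, exclusion declared on that dependency, or null}.
    private static final Map<String, String[][]> GRAPH = new HashMap<>();
    static {
        GRAPH.put("Project", new String[][] {{"Lib1", null}, {"Lib2", null}, {"Lib4", "E4"}});
        GRAPH.put("Lib1", new String[][] {{"BigDep1", null}});
        GRAPH.put("Lib2", new String[][] {{"Lib3", "E3"}});
        GRAPH.put("Lib3", new String[][] {{"BigDep1", null}});
        GRAPH.put("Lib4", new String[][] {{"Lib2", null}});
        GRAPH.put("BigDep1", new String[0][]);
    }

    private final Set<List<Object>> seen = new HashSet<>();
    private final Map<String, Integer> expansions = new TreeMap<>();

    private void collect(String artifact, Set<String> inherited) {
        if (!seen.add(Arrays.<Object>asList(artifact, inherited))) {
            return; // cache hit: same artifact under the same exclusions
        }
        expansions.merge(artifact, 1, Integer::sum); // cache miss: expand the subtree again
        for (String[] edge : GRAPH.get(artifact)) {
            Set<String> forChild = new TreeSet<>(inherited);
            if (edge[1] != null) {
                forChild.add(edge[1]);
            }
            collect(edge[0], forChild);
        }
    }

    /** Walks the graph from Project and returns how often each artifact was expanded. */
    static Map<String, Integer> run() {
        ContextKeyedWalk walk = new ContextKeyedWalk();
        walk.collect("Project", new TreeSet<String>());
        return walk.expansions;
    }

    public static void main(String[] args) {
        // prints {BigDep1=3, Lib1=1, Lib2=2, Lib3=2, Lib4=1, Project=1}
        System.out.println(run());
    }
}
```

Even in this tiny graph BigDep1 is expanded three times, once per distinct exclusion context, and Lib2 and Lib3 twice each.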


I'm wondering whether we could use the GAV alone as the cache key and apply the exclusions later. Maven could then resolve the dependencies this way:
Maven starts with Lib1. It resolves BigDep1 first and caches it using BigDep1's GAV as the key.
Maven then resolves Lib2 (Lib2 -> Lib3 (has exclusion) -> BigDep1). It gets BigDep1 from the cache, then calculates Lib3 without applying the exclusion and caches the result under Lib3's GAV.
When Maven comes back to Lib2, it applies Lib3's exclusion to Lib3, adds the filtered Lib3 as a child of Lib2, and then caches Lib2.
Maven then resolves Lib4 (Lib4 (has exclusion) -> Lib2 -> Lib3 -> BigDep1). It gets Lib2 from the cache, then calculates Lib4 without applying the exclusion and caches Lib4.
When Maven comes back to the current project, it applies Lib4's exclusion, adds the filtered Lib4 as a child of the Project module, and then caches the Project's resolved result.


Does this make sense?


This means every library's resolved result is cached under its GAV.
Only the artifact that depends on it needs to load the result from the cache and apply its own exclusions, if any.
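
A rough sketch of the idea, to show what "cache by GAV, apply exclusions later" could look like. Everything here is an assumption made for illustration: the class DeferredExclusionCache and its Edge and Node types are hypothetical, BigDep1 is shrunk to three made-up children X, Y and Z, Lib3 is assumed to exclude Y and Lib4 to exclude X. It ignores dependency management, scopes, version conflict resolution and cycles, all of which the real resolver has to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a GAV-keyed cache with exclusions applied afterwards; all names hypothetical. */
final class DeferredExclusionCache {
    /** A declared dependency plus the exclusions attached to that declaration. */
    static final class Edge {
        final String gav;
        final Set<String> exclusions;
        Edge(String gav, String... exclusions) {
            this.gav = gav;
            this.exclusions = new HashSet<>(Arrays.asList(exclusions));
        }
    }

    /** Immutable resolved node; unfiltered subtrees are shared between parents. */
    static final class Node {
        final String gav;
        final List<Node> children;
        Node(String gav, List<Node> children) {
            this.gav = gav;
            this.children = Collections.unmodifiableList(children);
        }
        @Override
        public String toString() {
            return children.isEmpty() ? gav : gav + children;
        }
    }

    private final Map<String, List<Edge>> declared = new HashMap<>();
    private final Map<String, Node> cache = new HashMap<>();
    private final Map<String, Integer> expansions = new HashMap<>();

    void declare(String gav, Edge... edges) {
        declared.put(gav, Arrays.asList(edges));
    }

    int expansions(String gav) {
        return expansions.getOrDefault(gav, 0);
    }

    /** Resolves a GAV at most once; the cache key is the GAV alone. Cycles are not handled. */
    Node resolve(String gav) {
        Node cached = cache.get(gav);
        if (cached != null) {
            return cached;
        }
        expansions.merge(gav, 1, Integer::sum);
        List<Node> children = new ArrayList<>();
        for (Edge edge : declared.getOrDefault(gav, Collections.<Edge>emptyList())) {
            // Fetch the child's unfiltered tree, then prune it for this edge only.
            children.add(prune(resolve(edge.gav), edge.exclusions));
        }
        Node node = new Node(gav, children);
        cache.put(gav, node);
        return node;
    }

    /** Returns a copy of the subtree without excluded artifacts and everything below them. */
    static Node prune(Node node, Set<String> exclusions) {
        if (exclusions.isEmpty()) {
            return node;
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : node.children) {
            if (!exclusions.contains(child.gav)) {
                kept.add(prune(child, exclusions));
            }
        }
        return new Node(node.gav, kept);
    }

    /** The example graph from this mail, with made-up exclusions Y (on Lib3) and X (on Lib4). */
    static DeferredExclusionCache demo() {
        DeferredExclusionCache c = new DeferredExclusionCache();
        c.declare("Project", new Edge("Lib1"), new Edge("Lib2"), new Edge("Lib4", "X"));
        c.declare("Lib1", new Edge("BigDep1"));
        c.declare("Lib2", new Edge("Lib3", "Y"));
        c.declare("Lib3", new Edge("BigDep1"));
        c.declare("Lib4", new Edge("Lib2"));
        c.declare("BigDep1", new Edge("X"), new Edge("Y"), new Edge("Z"));
        return c;
    }

    public static void main(String[] args) {
        DeferredExclusionCache c = demo();
        // prints Project[Lib1[BigDep1[X, Y, Z]], Lib2[Lib3[BigDep1[X, Z]]], Lib4[Lib2[Lib3[BigDep1[Z]]]]]
        System.out.println(c.resolve("Project"));
        System.out.println(c.expansions("BigDep1")); // prints 1
    }
}
```

In this toy version BigDep1 is expanded once, and each path still sees its own exclusions. The trade-off is that pruning copies the affected part of the tree, so the memory saving depends on how often exclusions actually remove something.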


Thanks,
Eric 

Re: Apply exclusions later?

Posted by Robert Scholte <rf...@apache.org>.
Dependency resolution is the heart of Maven and there are only a few that dare to touch that code.
Answering this is not that easy without good investigation.
I expect that simply removing the exclusions from the cache key will cause issues. 
It might require an extra abstraction layer where the GAV is separated from meta information such as exclusions.
You could try to do a POC and verify it with the integration tests of both maven-artifact-resolver and Maven (= maven + maven-integration-testing) to see the impact and share the results.

thanks,
Robert
On 7-7-2021 07:19:33, Wei Cai <we...@ebay.com.invalid> wrote:
@Team,

Please share your insights for this issue.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: Apply exclusions later?

Posted by Wei Cai <we...@ebay.com.INVALID>.
@Team,

Please share your insights for this issue.
