Posted to issues@maven.apache.org by "Joseph Leonard (Jira)" <ji...@apache.org> on 2024/04/15 09:43:00 UTC
[jira] [Updated] (MNG-8096) Inconsistent dependency resolution behaviour for concurrent multi-module build can cause failures

     [ https://issues.apache.org/jira/browse/MNG-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Leonard updated MNG-8096:
--------------------------------
    Description: 
h2. tl;dr

Maven can resolve dependencies either from:
 * an external repo
 * a class directory of a module being built within the reactor
 * a packaged jar of a module being built within the reactor

If you run a concurrent multi-module build, a race condition can cause the build of module _Foo_ to resolve module _Bar_ from any of the three resolution {_}channels{_}. This inconsistency can result in the Maven _war_ plugin sometimes failing to build a functional war file. I would expect resolution to be consistent across runs.
h2. Full details
h4. Scenario

Consider you have a repo with the following structure:
{noformat}
                       App
                     /     \
                    /       \
       (compile scope)      (test scope)
                  /           \
                \/_           _\/
             ModuleA      TestSupportModule1
                /
               /
    (compile scope)
             /
           \/_ 
        ModuleB
           /
          /
    (test scope)
        /
      \/_   
TestSupportModule2
{noformat}
If you were to make a src code change to the following test support modules:
 * TestSupportModule1
 * TestSupportModule2

Then the minimum set of modules we need to build to verify the change set is:
 * TestSupportModule1
 * TestSupportModule2
 * ModuleB
 * App

i.e. there is no need to build ModuleA, because none of the src code changes can impact the classpaths used in its Maven build (test-scoped dependencies are not transitive).

Although 'App' depends (transitively) on ModuleB, the 'App' build does not need to wait for the ModuleB build to complete, because the src code change to TestSupportModule2 will not impact any of the classpaths used in the App Maven build. Therefore, to get the most efficient build possible, we would ideally invoke Maven with 2 threads and instruct it to build *two distinct* 'dependency graphs':
 * TestSupportModule2 followed by ModuleB
 * TestSupportModule1 followed by App

The following Maven command achieves exactly what we want because the reactor build order is based only on the *direct* (i.e. non-transitive) dependencies of the modules provided to the reactor in the build command. Therefore the absence of ModuleA results in two distinct 'dependency graphs':
{noformat}
mvn clean verify -pl TestSupportModule1,TestSupportModule2,ModuleB,App -T 2
{noformat}
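To illustrate why leaving ModuleA out of the {{-pl}} list splits the reactor in two, here is a minimal Python sketch. It is not Maven's implementation: it only assumes, as described above, that two modules are linked when both are selected and one depends *directly* on the other. The helper name {{reactor_graphs}} and the data layout are made up for this example.

```python
from collections import defaultdict

# Direct dependencies from the diagram above (module -> modules it depends on).
DIRECT_DEPS = {
    "App": ["ModuleA", "TestSupportModule1"],
    "ModuleA": ["ModuleB"],
    "ModuleB": ["TestSupportModule2"],
    "TestSupportModule1": [],
    "TestSupportModule2": [],
}


def reactor_graphs(selected):
    """Group the selected modules into connected 'dependency graphs'.

    Hypothetical helper: only a direct dependency between two selected
    modules creates a link, mirroring the behaviour described above.
    """
    selected = set(selected)
    neighbours = defaultdict(set)
    for module in selected:
        for dep in DIRECT_DEPS.get(module, []):
            if dep in selected:
                neighbours[module].add(dep)
                neighbours[dep].add(module)

    graphs, seen = [], set()
    for module in sorted(selected):
        if module in seen:
            continue
        # Depth-first walk collecting everything linked to this module.
        stack, component = [module], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        graphs.append(sorted(component))
    return graphs


# Without ModuleA the selection falls apart into two independent graphs.
print(reactor_graphs(["TestSupportModule1", "TestSupportModule2", "ModuleB", "App"]))
# With every module selected there is a single graph.
print(reactor_graphs(list(DIRECT_DEPS)))
```

With ModuleA omitted, the sketch yields one graph containing App and TestSupportModule1 and another containing ModuleB and TestSupportModule2, which is why the two builds can run on separate threads.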
Note: In reality the code base I maintain has a very large monobuild with 100s of modules and this type of build optimisation makes a significant difference to the speed of our monobuild (we use [https://github.com/gitflow-incremental-builder/gitflow-incremental-builder] to automate the logic of determining which modules to include in the reactor based on our change set).
h4. Issue

We have encountered an issue in the above scenario because the 'App' build has a race condition with the ModuleB build which will result in one of the following three outcomes:
 * If the 'App' build starts before the ModuleB build has compiled its src classes then the 'App' build will resolve ModuleB from the external repo (i.e. equivalent to ModuleB not being in the reactor at all)
 * If the 'App' build starts after ModuleB has compiled its src classes but before it has packaged these classes into a jar then the 'App' build will resolve ModuleB's {{target/classes}} directory
 * If the 'App' build starts after ModuleB has packaged its jar file then the 'App' build will resolve ModuleB's {{target/ModuleB.jar}} file.
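The three outcomes above can be summarised as a small Python model. This is only an illustration of the observed behaviour, not Maven code; the {{BuildState}} enum and {{resolve_module_b}} function are hypothetical names.

```python
from enum import Enum


class BuildState(Enum):
    """How far the ModuleB build has progressed when 'App' resolves it."""
    NOT_COMPILED = 1  # ModuleB has not compiled its src classes yet
    COMPILED = 2      # target/classes exists, jar not yet packaged
    PACKAGED = 3      # target/ModuleB.jar exists


def resolve_module_b(state):
    """Return where the 'App' build would find ModuleB for a given state."""
    if state is BuildState.NOT_COMPILED:
        return "external repo"
    if state is BuildState.COMPILED:
        return "target/classes"
    return "target/ModuleB.jar"


# The result depends purely on timing, which is the inconsistency reported here.
for state in BuildState:
    print(state.name, "->", resolve_module_b(state))
```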

In many scenarios this dependency resolution inconsistency is harmless. However, it does cause an issue in our case because the 'App' POM has its Maven {{packaging}} element set to {{war}}: when the 'App' build resolves ModuleB's {{target/classes}} directory, the resulting 'App' war file is packaged with a completely empty ModuleB.jar file.
h4. Proposed solution

Ideally we would like the Maven reactor to retain isolation between the *two distinct* 'dependency graphs' it constructs at _instantiation_ throughout the entire Maven build. This would mean, in the simple example above, that the 'App' would *always* resolve ModuleB from the external repo (regardless of whether the reactor has built ModuleB or not in a _separate_ 'dependency graph' in the reactor).
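A rough Python sketch of the behaviour I am asking for, under the assumption that the two graphs from the example are fixed when the reactor is instantiated. The {{GRAPHS}} constant and the {{resolve}} function are hypothetical and only model the idea; they do not correspond to any existing Maven API.

```python
# Graphs fixed when the reactor is instantiated (taken from the example above).
GRAPHS = [
    {"App", "TestSupportModule1"},
    {"ModuleB", "TestSupportModule2"},
]


def resolve(requester, dependency, dependency_already_built):
    """Hypothetical isolated resolution.

    Reactor output is used only when both modules sit in the same graph.
    """
    same_graph = any(requester in g and dependency in g for g in GRAPHS)
    if same_graph:
        # Within one graph the reactor ordering builds the dependency first.
        return "reactor"
    # Across graphs, dependency_already_built is deliberately ignored,
    # so the outcome no longer depends on build timing.
    return "external repo"


print(resolve("App", "ModuleB", dependency_already_built=False))
print(resolve("App", "ModuleB", dependency_already_built=True))
print(resolve("App", "TestSupportModule1", dependency_already_built=True))
```

In this model 'App' gets ModuleB from the external repo whether or not ModuleB has already been built elsewhere in the reactor, while TestSupportModule1 still comes from the reactor because it shares a graph with 'App'.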

 
h1. Reproducer

See [https://github.com/josple/mvn-multibuild-issue] (hopefully the README is clear enough – let me know if I can clarify anything).


> Inconsistent dependency resolution behaviour for concurrent multi-module build can cause failures
> -------------------------------------------------------------------------------------------------
>
>                 Key: MNG-8096
>                 URL: https://issues.apache.org/jira/browse/MNG-8096
>             Project: Maven
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.8.2, 3.9.6
>            Reporter: Joseph Leonard
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)