Posted to dev@maven.apache.org by Harald Wellmann <hw...@gmail.com> on 2015/12/26 11:39:09 UTC

MNG-5896: Speed up builds by parallel POM downloads

When building a project with dependencies not yet available in the local 
repository, I noticed that Maven first downloads the dependency POMs 
sequentially and only then downloads the dependency JARs with up to 5 
threads in parallel. The sequential phase is not optimal when the POMs 
are served by a high-latency repository manager, since each POM request 
waits for the previous one to finish.

There wasn't a lot of feedback on my enhancement request [1] or my 
original StackOverflow question [2], so I started digging into the 
source code and ended up with a patch [3].

The patch only affects Aether, not Maven Core, but since Aether is not 
the top-level project from the end user perspective and doesn't appear 
to be very active, I thought I'd better contact this mailing list first.

The basic idea of the patch is a clean separation of POM downloading 
from POM processing in DefaultDependencyCollector. Once these steps are 
separated, it is possible to download dependency POMs asynchronously and 
in parallel, while still processing the POM models sequentially to build 
the dependency graph in the correct order.
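
As a rough illustration of this idea (this is not the code from the
patch; the class ParallelPomFetchSketch, the methods collect and
slowDownload, and the "pom:" strings are all made up for the example),
the two phases could look like the sketch below. It only covers a single
level of dependencies and assumes the list of coordinates is known up
front, which the real collector cannot assume since it discovers
transitive dependencies while processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelPomFetchSketch {

    // Hypothetical stand-in for a remote POM request: sleeps to simulate latency.
    static String slowDownload(String coordinates) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "pom:" + coordinates;
    }

    // Downloads all POMs in parallel, then processes them one by one
    // in the order in which the dependencies were declared.
    static List<String> collect(List<String> coordinates,
                                Function<String, String> downloader,
                                int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: start every download without waiting for any result.
            List<Future<String>> pending = new ArrayList<>();
            for (String c : coordinates) {
                pending.add(pool.submit(() -> downloader.apply(c)));
            }
            // Phase 2: sequential processing. get() blocks until the POM at
            // position i has arrived, so the output order is deterministic.
            List<String> processed = new ArrayList<>();
            for (int i = 0; i < coordinates.size(); i++) {
                String pom = pending.get(i).get();
                processed.add(coordinates.get(i) + " -> " + pom);
            }
            return processed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> deps =
                Arrays.asList("g:a:1.0", "g:b:2.0", "g:c:3.0", "g:d:4.0", "g:e:5.0");
        long start = System.nanoTime();
        List<String> result = collect(deps, ParallelPomFetchSketch::slowDownload, 5);
        long millis = (System.nanoTime() - start) / 1_000_000;
        result.forEach(System.out::println);
        System.out.println("elapsed ms: " + millis);
    }
}
```

With five simulated downloads and five threads, the elapsed time should
be close to the latency of a single request rather than five times that,
while the processed entries still come out in declaration order.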

Since DefaultDependencyCollector holds a lot of global state and has 
rather long methods, I started by refactoring this class into smaller 
chunks to make the underlying logic more transparent. That's why the 
patch looks a bit large, but essentially it only affects a single 
original class.

I did a local build of Maven 3.4.0-SNAPSHOT using Aether 1.1.0-SNAPSHOT 
with my patch, with all tests passing.

I also ran maven-integration-testing on this patched Maven 
3.4.0-SNAPSHOT, with no new tests failing. (There is just one test which 
is broken since a recent change on trunk, see [4].)

Thanks for reading this far. It would be great if someone could take 
the time to look into the issue and the patch, and advise on how to 
proceed.


[1] https://issues.apache.org/jira/browse/MNG-5896
[2] http://stackoverflow.com/questions/32299902/parallel-downloads-of-maven-artifacts
[3] https://github.com/hwellmann/aether-core/commit/cdab4c40094ccf621370647f83ecda54684066ce
[4] https://builds.apache.org/job/maven-3.3-release-status-test-linux/lastCompletedBuild/testReport/

Regards,
Harald


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: MNG-5896: Speed up builds by parallel POM downloads

Posted by Stephen Connolly <st...@gmail.com>.
This is interesting. I think the plan is to move Aether under the Maven
project (at least I was asked to grant my contributions to aether to allow
dual licensing under ASL on the basis of moving the code here... Still not
sure what the status is) so I suspect until that gets resolved aether will
be a fixed point. Once it gets resolved then this sounds like a good
contribution, but I am on my phone so cannot say for sure.

Keep an eye out for news of aether's code landing here/elsewhere and
re-ping at that point if we have forgotten ;-)

-- 
Sent from my phone

Re: MNG-5896: Speed up builds by parallel POM downloads

Posted by Harald Wellmann <hw...@gmail.com>.
Thank you all for your feedback so far.

So it seems we'll have to wait and see what happens with Aether first...
That's an interesting topic of its own. I'll start a new thread about that.

Regards,
Harald

2015-12-27 22:56 GMT+01:00 Milos Kleint <mk...@gmail.com>:

> Nice work! Looking forward to having this integrated into the Maven
> codebase.
>
> We've solved the same problem internally by developing a service + Maven
> extension that downloads the Maven POM files in bulk (one or more zips). We
> use Bamboo AWS elastic agents a lot, and on a clean local repository
> this can save a minute or two for large builds (when your Nexus proxy is
> also placed in AWS; even more when connecting to Central).
>
> Milos

Re: MNG-5896: Speed up builds by parallel POM downloads

Posted by Milos Kleint <mk...@gmail.com>.
Nice work! Looking forward to having this integrated into the Maven
codebase.

We've solved the same problem internally by developing a service + Maven
extension that downloads the Maven POM files in bulk (one or more zips). We
use Bamboo AWS elastic agents a lot, and on a clean local repository
this can save a minute or two for large builds (when your Nexus proxy is
also placed in AWS; even more when connecting to Central).

Milos
