Posted to dev@maven.apache.org by Kristian Rosenvold <kr...@gmail.com> on 2009/11/20 09:29:30 UTC

MNG-3004/MNG-2802 - Achieving massive parallelity ?

I've been thinking further about parallelity within Maven. The proposed
solution to MNG-3004 achieves parallelity by analyzing inter-module
dependencies and building modules that do not depend on each other in
parallel.
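
A minimal sketch of that scheduling idea, not the MNG-3004 patch itself: each module is started as soon as every module it depends on has finished. The class name, the module names and the `build` callback are all hypothetical, and the dependency map is assumed to list upstream modules before their dependents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical sketch: build reactor modules in parallel, honoring inter-module dependencies. */
class ParallelReactorSketch {

    /**
     * @param deps  module name -> names of its upstream modules; assumed to be in
     *              topological order (upstream modules appear before their dependents)
     * @param build callback that "builds" a single module
     */
    static void buildAll(Map<String, List<String>> deps, Consumer<String> build, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, CompletableFuture<Void>> done = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                // Collect the completion futures of all upstream modules.
                List<CompletableFuture<Void>> upstream = new ArrayList<>();
                for (String d : e.getValue()) {
                    upstream.add(done.get(d));
                }
                // Submit this module to the pool only once all upstream modules are done.
                CompletableFuture<Void> f = CompletableFuture
                        .allOf(upstream.toArray(new CompletableFuture[0]))
                        .thenRunAsync(() -> build.accept(e.getKey()), pool);
                done.put(e.getKey(), f);
            }
            // Wait for the whole reactor to finish.
            CompletableFuture.allOf(done.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("api", List.of());
        deps.put("core", List.of("api"));
        deps.put("cli", List.of("api"));
        deps.put("dist", List.of("core", "cli"));
        buildAll(deps, m -> System.out.println("built " + m), 4);
    }
}
```

In this example "core" and "cli" may build concurrently, while "api" always finishes first and "dist" always starts last.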

A simple further evolution of this would be to collect and download all
external dependencies for all modules immediately.

But this idea has been rummaging in my head while jogging for a week or so:

Would it be possible to achieve super-parallelity by describing
relationships between phases of the build, and even reordering some of the
phases ? I'll try to explain:

Assume that you can add transactional ACID (or maybe just AID) abilities
to the local repo for a full build. Simply put: all writes to the local
repo are done in a per-process instance of the repo that can be rolled back
if the build fails (or pushed to the local repo if the build is OK).

If you do that you can re-order the life-cycle for most builds to be
something like this:

validate
compile
package
install
test
integration-test
deploy

Notice that I just moved all the "test" phases after the "install" phase.
Theoretically you could start any subsequent modules immediately after
"install" is done. Running of tests is really the big killer in most
multi-module projects I see.

Since your commit "push" towards the local repo only happens at the very end
of the build, you will not publish artifacts when tests are failing (at
least not project output artifacts).
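
A minimal sketch of the staging idea, under loud assumptions: the repository is modeled as an in-memory map to keep the example self-contained, and the class and method names (`StagedLocalRepo`, `install`, `commit`, `rollback`) are hypothetical. A real implementation would work on directories and would have to decide how atomic the final publish needs to be.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a per-build staging layer in front of a shared local repository. */
class StagedLocalRepo {
    private final Map<String, byte[]> localRepo;                 // shared between builds
    private final Map<String, byte[]> staged = new HashMap<>();  // private to this build

    StagedLocalRepo(Map<String, byte[]> localRepo) {
        this.localRepo = localRepo;
    }

    /** "install" writes only to the staging layer. */
    void install(String coordinates, byte[] artifact) {
        staged.put(coordinates, artifact);
    }

    /** Modules of the same build see staged artifacts first, then the shared repo. */
    byte[] resolve(String coordinates) {
        byte[] own = staged.get(coordinates);
        return own != null ? own : localRepo.get(coordinates);
    }

    /** The whole build succeeded: publish everything that was staged. */
    void commit() {
        synchronized (localRepo) {
            localRepo.putAll(staged);
        }
        staged.clear();
    }

    /** The build (or its tests) failed: nothing reaches the shared repo. */
    void rollback() {
        staged.clear();
    }

    public static void main(String[] args) {
        Map<String, byte[]> shared = new HashMap<>();
        StagedLocalRepo build = new StagedLocalRepo(shared);
        build.install("org.example:a:1.0", new byte[] {1, 2, 3});
        System.out.println("visible to other builds before commit: " + shared.containsKey("org.example:a:1.0"));
        build.commit();
        System.out.println("visible to other builds after commit: " + shared.containsKey("org.example:a:1.0"));
    }
}
```

Downstream modules in the same build would call `resolve`, so they can start right after "install" even though nothing has been published yet.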

You could actually make this a generic model that describes different kinds
of dependencies between lifecycle phases of different modules. The
dependency I immediately see is "requiredForStarting", which could be
interpreted as meaning that any upstream dependencies must have reached at
least that phase before the phase can be started for this project. I'm not
sure if there's any value in a generic model, but my perspective may be
limited to what I see on a daily basis.
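
One way such a "requiredForStarting" constraint could be expressed, as a sketch only: the `Phase` enum follows the reordered phase list given earlier in this mail, and the class and method names are hypothetical rather than anything that exists in Maven.

```java
import java.util.Collection;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a "requiredForStarting" constraint between lifecycle phases. */
class PhaseConstraintSketch {

    /** Phases in the reordered sequence proposed in this mail. */
    enum Phase { VALIDATE, COMPILE, PACKAGE, INSTALL, TEST, INTEGRATION_TEST, DEPLOY }

    /** Phase of this module -> phase that every upstream module must have reached. */
    private final Map<Phase, Phase> requiredForStarting = new EnumMap<>(Phase.class);

    void require(Phase phase, Phase upstreamPhase) {
        requiredForStarting.put(phase, upstreamPhase);
    }

    /**
     * @param upstreamReached the latest phase reached by each upstream module
     *                        (a null element means that module has not started)
     */
    boolean canStart(Phase phase, Collection<Phase> upstreamReached) {
        Phase required = requiredForStarting.get(phase);
        if (required == null) {
            return true; // no constraint declared for this phase
        }
        for (Phase reached : upstreamReached) {
            if (reached == null || reached.compareTo(required) < 0) {
                return false; // at least one upstream module is not far enough along
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PhaseConstraintSketch constraints = new PhaseConstraintSketch();
        constraints.require(Phase.COMPILE, Phase.INSTALL);
        System.out.println(constraints.canStart(Phase.COMPILE, List.of(Phase.PACKAGE)));
        System.out.println(constraints.canStart(Phase.COMPILE, List.of(Phase.INSTALL, Phase.TEST)));
    }
}
```

With the single rule shown in `main`, a module may compile as soon as all of its upstream modules are installed, regardless of whether their tests have finished.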

Would this be feasible ?

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Brian Fox <br...@infinity.nu>.

> What about users doing multi-machine builds?  Earlier this week I wrote that
> users desiring to do multi-machine parallelism should deploy their builds to
> a remote repository shared between the machines.  Should their tests run
> post-deploy?
>

I think what you meant is "my" tests should run prior to deploy, but
in this case, "my" tests should run after my dependencies deploy so I
pick them up from the repo.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

I'm conceptually using the transactional "local" repository as a
get-out-of-jail card for possible violations of contract:
the full build must be OK before the real repository update
goes through and creates any publicly visible artifacts.

Obviously, you're right, the tests should be pushed
to after "deploy". 

I feel this would be comparable to the out-of-order
execution and pipelining that modern CPUs do; 
they guarantee that the underlying contract is
unchanged and only do these secret optimizations
because they can get away with it without telling you.

The question is really whether it could be pulled off in the
same manner as CPUs do it: without fundamentally
breaking any contracts...?

Kristian




On Fri, 2009-11-20 at 06:29 -0800, Dan Fabulich wrote:
> I've been meaning to reply to your earlier emails (it's been a busy week); 
> to this I'll just say that moving the "test" phase after the "install" 
> phase is a fascinating idea, which I personally like, but it seems like a 
> big violation of the contract for the lifecycle, and I suspect it won't be 
> popular. :-(
> 
> I've long felt that there should be a phase for testing after "install" 
> for similar reasons.  This might be SLIGHTLY more popular since users 
> would need to explicitly cause their tests to run during this phase.
> 
> What about users doing multi-machine builds?  Earlier this week I wrote 
> that users desiring to do multi-machine parallelism should deploy their 
> builds to a remote repository shared between the machines.  Should their 
> tests run post-deploy?
> 
> -Dan
> 
> 
> Kristian Rosenvold wrote:
> 
> > I've been thinking further about parallelity within maven. The proposed
> > solution to MNG-3004
> > achieves parallelity by analyzing inter-module dependencies and scheduling
> > parallel dependencies in parallel.
> >
> > A simple further evolution of this would be to collect and download all
> > external dependencies
> > for all modules immediately.
> >
> > But this idea has been rummaging in my head while jogging for a week or so:
> >
> > Would it be possible to achieve super-parallelity by describing
> > relationships between phases of the build, and even reordering some of the
> > phases ? I'll try to explain:
> >
> > Assume that you can add transactional ACID (or maybe just AID) abilities
> > towards the local
> > repo for a full build. Simply put: All writes to a local repo are done in a
> > per-process-specific instance of the repo, that can be rolled back if the
> > build fails (or pushed to the local repo if
> > the build is ok)
> >
> > If you do that you can re-order the life-cycle for most builds to be
> > something like this:
> >
> > validate
> > compile
> > package
> > install
> > test
> > integration-test
> > deploy
> >
> > Notice that I just moved all the "test" phases after the "install" phase.
> > Theoretically you could start any subsequent modules immediately after
> > "install" is done. Running of tests is really the big killer in most
> > multi-module projects I see.
> >
> > Since your commit "push" towards the local repo only happens at the very end
> > of the build, you
> > will not publish artifacts when tests are failing (at least not project
> > output artifacts)
> >
> > You could actually make this a generic model that describes different kinds
> > of
> > dependencies between lifecycle phases of different modules. The dependency I
> > immediately
> > see is "requiredForStarting" - which could be interpreted as meaning that
> > any upstream
> > dependencies must have reached at least that phase before the phase can be
> > started
> > for this project. I'm not sure if there's any value in a generic model, but
> > my perspective
> > may be limited to what I see on a daily basis.
> >
> > Would this be feasible ?
> >
> 
> 
> 




Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Brett Porter <br...@apache.org>.

On 10/12/2009, at 3:38 AM, Kristian Rosenvold wrote:

> My personal short-list on MNG-3004 now only contains 1 thing, getting
> order on log output. I need some help on this one;
> 
> It seems like there's two sensible ways to handle this
> 
> A) Intercept plexus "Logger" and sort according to calling thread.
> B) Intercept System.out/err and sort according to calling thread.
> 
> I know B is going to work, but I really think A is a nicer option. But
> plexus scares me (it reminds me of a teenager, capable of throwing fits
> for no understandable reason and providing no explanation).
> 
> Assuming I am able to proxy the plexus logger, it should be able to
> capture output of all plugins too, right ? Anyone have any
> examples/explanation of how to proxy the logger ?
> 
> Anyone have any thoughts/preferences on this ?

You'll want something that works across all logging systems, since I'm sure some people use commons-logging, some slf4j, some plexus logging directly, and even some System.out / System.err.

I would start by doing it in MavenLoggerManager and see what else needs to be dragged in. I think it's important to ensure all the logs are captured independently, regardless of how they are output to the user so that, for example, IDE integration could show several panels for the concurrent build output.
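
A minimal sketch of the System.out/System.err variant (option B), capturing output separately per calling thread. It is not the SUREFIRE-555 interceptor or anything in MavenLoggerManager; the class name is hypothetical, and it assumes each concurrently built module runs on a thread with a unique name.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: an OutputStream that keeps a separate buffer per calling thread. */
class ThreadCapturingOutput extends OutputStream {

    // Keyed by thread name; assumes thread names are unique per build module.
    private final Map<String, ByteArrayOutputStream> buffers = new ConcurrentHashMap<>();

    private ByteArrayOutputStream current() {
        return buffers.computeIfAbsent(
                Thread.currentThread().getName(), name -> new ByteArrayOutputStream());
    }

    @Override
    public void write(int b) {
        current().write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        current().write(b, off, len);
    }

    /** Everything the named thread has written so far, or "" if it wrote nothing. */
    String captured(String threadName) {
        ByteArrayOutputStream buffer = buffers.get(threadName);
        return buffer == null ? "" : buffer.toString();
    }
}
```

It would be installed with `System.setOut(new PrintStream(sink, true))` (and likewise `System.setErr`), where `sink` is an instance of the class above. Because each thread's output stays in its own buffer, a console could print the buffers one module at a time, while an IDE could show one panel per buffer.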

> 
> Kristian
> 
> P.S: The last time I tried to run with the integration tests they
> worked. Assuming i ran them correctly, that is. Maybe a good idea to
> update
> http://maven.apache.org/guides/development/guide-building-m2.html with
> descriptions on how to do run them correctly ?
> 

Hmm, there used to be a README in the core-integration-testing module.

"mvn clean install -Prun-its" from the root of core-integration-testing is the simplest way to run them against the currently installed Maven, but there are several different ways depending on your workflow that might be more suitable (e.g., running a subset of tests against an installed Maven, or running the suite against a built-but-not-yet-installed Maven).

Cheers,
Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/


Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Jason van Zyl <ja...@maven.org>.

Correct

On 2009-12-09, at 10:16 AM, Paul Benedict wrote:

> Typo? Nothing will appear *different* to users or plugin developers.?
> 
> On Wed, Dec 9, 2009 at 12:14 PM, Jason van Zyl <ja...@maven.org> wrote:
>> 
>> On 2009-12-09, at 9:15 AM, Kristian Rosenvold wrote:
>> 
>>> There's a lot of fun to be had with modern DI ;) I assume you're
>>> planning to keep the plexus interfaces and just change implementation ?
>>> 
>> 
>> Yes, we have to support all the old plugins with all the old metadata. We drop in Guice+Plexus shim and the user doesn't notice anything. It can't work any other way.
>> 
>>> Do you have any documentation/references on how you're intending to do
>>> this ? Will the plugins be wired up with guice too ?
>>> 
>> 
>> Nothing will appear the same to users or plugin developers.
>> 
>>> On the grand scale of things, this is just a private implementation
>>> detail. I wrote a System.out/System.err based threading interceptor in
>>> SUREFIRE-555 and I'll see if I can get that working. I've been spending
>>> too much time on this "patch" now, and there's just this one thing left
>>> before I ask the community to accept it. As long as I don't have to
>>> solve it with plexus I'm happy.
>>> 
>> 
>> That works for now and we'll push out a copy of the Guice-based Maven as time permits.
>> 
>>> Kristian
>>> 
>>> On Wed, 09.12.2009 at 08.50 -0800, Jason van Zyl wrote:
>>>> On 2009-12-09, at 8:38 AM, Kristian Rosenvold wrote:
>>>> 
>>>>> My personal short-list on MNG-3004 now only contains 1 thing, getting
>>>>> order on log output. I need some help on this one;
>>>>> 
>>>>> It seems like there's two sensible ways to handle this
>>>>> 
>>>>> A) Intercept plexus "Logger" and sort according to calling thread.
>>>>> B) Intercept System.out/err and sort according to calling thread.
>>>>> 
>>>>> I know B is going to work, but I really think A is a nicer option. But
>>>>> plexus scares me (it reminds me of a teenager, capable of throwing fits
>>>>> for no understandable reason and providing no explanation).
>>>>> 
>>>>> Assuming I am able to proxy the plexus logger, it should be able to
>>>>> capture output of all plugins too, right ? Anyone have any
>>>>> examples/explanation of how to proxy the logger ?
>>>>> 
>>>>> Anyone have any thoughts/preferences on this ?
>>>>> 
>>>> 
>>>> You might want to wait a bit then as we have Nexus OSS completely running on Guice with a Plexus shim, almost have Nexus Pro running on the same system and probably about a week of work getting the same to work for Maven itself.
>>>> 
>>>> So if you can figure out a strategy to do this with Guice then that's what I plan to run Maven 3.x on. I will put this in a GIT repository to share when it's finished and then the community can decide if they want to have it be absorbed into Maven at Apache. I'm killing off Plexus as fast as possible and replacing it with Guice and we're close but Sonatype is focusing on Nexus first and then we'll try it with Maven and then we'll ask people here if they would like those changes here as I'm not presuming anything.
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> Thanks,
>> 
>> Jason
>> 
>> ----------------------------------------------------------
>> Jason van Zyl
>> Founder,  Apache Maven
>> http://twitter.com/jvanzyl
>> ----------------------------------------------------------
>> 
>> 
>> 
>> 
> 
> 

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
http://twitter.com/jvanzyl
----------------------------------------------------------



Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Paul Benedict <pb...@apache.org>.

Typo? Nothing will appear *different* to users or plugin developers.?

On Wed, Dec 9, 2009 at 12:14 PM, Jason van Zyl <ja...@maven.org> wrote:
>
> On 2009-12-09, at 9:15 AM, Kristian Rosenvold wrote:
>
>> There's a lot of fun to be had with modern DI ;) I assume you're
>> planning to keep the plexus interfaces and just change implementation ?
>>
>
> Yes, we have to support all the old plugins with all the old metadata. We drop in Guice+Plexus shim and the user doesn't notice anything. It can't work any other way.
>
>> Do you have any documentation/references on how you're intending to do
>> this ? Will the plugins be wired up with guice too ?
>>
>
> Nothing will appear the same to users or plugin developers.
>
>> On the grand scale of things, this is just a private implementation
>> detail. I wrote a System.out/System.err based threading interceptor in
>> SUREFIRE-555 and I'll see if I can get that working. I've been spending
>> too much time on this "patch" now, and there's just this one thing left
>> before I ask the community to accept it. As long as I don't have to
>> solve it with plexus I'm happy.
>>
>
> That works for now and we'll push out a copy of the Guice-based Maven as time permits.
>
>> Kristian
>>
>> On Wed, 09.12.2009 at 08.50 -0800, Jason van Zyl wrote:
>>> On 2009-12-09, at 8:38 AM, Kristian Rosenvold wrote:
>>>
>>>> My personal short-list on MNG-3004 now only contains 1 thing, getting
>>>> order on log output. I need some help on this one;
>>>>
>>>> It seems like there's two sensible ways to handle this
>>>>
>>>> A) Intercept plexus "Logger" and sort according to calling thread.
>>>> B) Intercept System.out/err and sort according to calling thread.
>>>>
>>>> I know B is going to work, but I really think A is a nicer option. But
>>>> plexus scares me (it reminds me of a teenager, capable of throwing fits
>>>> for no understandable reason and providing no explanation).
>>>>
>>>> Assuming I am able to proxy the plexus logger, it should be able to
>>>> capture output of all plugins too, right ? Anyone have any
>>>> examples/explanation of how to proxy the logger ?
>>>>
>>>> Anyone have any thoughts/preferences on this ?
>>>>
>>>
>>> You might want to wait a bit then as we have Nexus OSS completely running on Guice with a Plexus shim, almost have Nexus Pro running on the same system and probably about a week of work getting the same to work for Maven itself.
>>>
>>> So if you can figure out a strategy to do this with Guice then that's what I plan to run Maven 3.x on. I will put this in a GIT repository to share when it's finished and then the community can decide if they want to have it be absorbed into Maven at Apache. I'm killing off Plexus as fast as possible and replacing it with Guice and we're close but Sonatype is focusing on Nexus first and then we'll try it with Maven and then we'll ask people here if they would like those changes here as I'm not presuming anything.
>>
>>
>>
>>
>>
>
> Thanks,
>
> Jason
>
> ----------------------------------------------------------
> Jason van Zyl
> Founder,  Apache Maven
> http://twitter.com/jvanzyl
> ----------------------------------------------------------
>
>
>
>


Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Jason van Zyl <ja...@maven.org>.

On 2009-12-09, at 9:15 AM, Kristian Rosenvold wrote:

> There's a lot of fun to be had with modern DI ;) I assume you're
> planning to keep the plexus interfaces and just change implementation ?
> 

Yes, we have to support all the old plugins with all the old metadata. We drop in Guice+Plexus shim and the user doesn't notice anything. It can't work any other way.

> Do you have any documentation/references on how you're intending to do
> this ? Will the plugins be wired up with guice too ?
> 

Nothing will appear the same to users or plugin developers.

> On the grand scale of things, this is just a private implementation
> detail. I wrote a System.out/System.err based threading interceptor in
> SUREFIRE-555 and I'll see if I can get that working. I've been spending
> too much time on this "patch" now, and there's just this one thing left
> before I ask the community to accept it. As long as I don't have to
> solve it with plexus I'm happy.
> 

That works for now and we'll push out a copy of the Guice-based Maven as time permits.

> Kristian
> 
> On Wed, 09.12.2009 at 08.50 -0800, Jason van Zyl wrote:
>> On 2009-12-09, at 8:38 AM, Kristian Rosenvold wrote:
>> 
>>> My personal short-list on MNG-3004 now only contains 1 thing, getting
>>> order on log output. I need some help on this one;
>>> 
>>> It seems like there's two sensible ways to handle this
>>> 
>>> A) Intercept plexus "Logger" and sort according to calling thread.
>>> B) Intercept System.out/err and sort according to calling thread.
>>> 
>>> I know B is going to work, but I really think A is a nicer option. But
>>> plexus scares me (it reminds me of a teenager, capable of throwing fits
>>> for no understandable reason and providing no explanation).
>>> 
>>> Assuming I am able to proxy the plexus logger, it should be able to
>>> capture output of all plugins too, right ? Anyone have any
>>> examples/explanation of how to proxy the logger ?
>>> 
>>> Anyone have any thoughts/preferences on this ?
>>> 
>> 
>> You might want to wait a bit then as we have Nexus OSS completely running on Guice with a Plexus shim, almost have Nexus Pro running on the same system and probably about a week of work getting the same to work for Maven itself.
>> 
>> So if you can figure out a strategy to do this with Guice then that's what I plan to run Maven 3.x on. I will put this in a GIT repository to share when it's finished and then the community can decide if they want to have it be absorbed into Maven at Apache. I'm killing off Plexus as fast as possible and replacing it with Guice and we're close but Sonatype is focusing on Nexus first and then we'll try it with Maven and then we'll ask people here if they would like those changes here as I'm not presuming anything.
> 
> 
> 
> 
> 

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
http://twitter.com/jvanzyl
----------------------------------------------------------



Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

I have now submitted my work as a patch attached to MNG-3004. It still
only solves MNG-3004 (I am considering MNG-2802 a different patch, or I'll
never finish anything ;)

The last version can be downloaded from 

http://cloud.github.com/downloads/krosenvold/maven3/apache-maven-3.0-SNAPSHOT-bin.tar.gz

(This version is based on M3 trunk as of r890474 - latest as of 10 minutes
ago)

See some docs at http://github.com/krosenvold/maven3.

I would be very happy if someone could review this and hopefully accept
it into the 3.X range.

A few notes to reviewers:

The patch mainly affects DefaultLifecycleExecutor. Since I considered
this class to be way too large I did a large number of "extract class"
operations on it, splitting it into a large number of (plexus)
components. These are mostly mechanical IDE-based operations that do not
change the logic of the code.

The "essence" of the patch can be observed by starting at
DefaultLifecycleExecutor and working into LifecycleWeaveBuilder.

I have also added a significant number of new unit tests, as well as the
existing integration tests that Dan Fabulich wrote. This also includes
tests for some of the old logic that has now become unit-testable. There
are also stubs for most of the components extracted from
DefaultLifecycleExecutor, meaning it is possible to unit-test these
further. I have not written tests for *all* the 2000 lines of code that
used to be in DefaultLifecycleExecutor.

All integration tests run OK - unchanged.

Regarding compatibility:
There MAY be some unforeseen/untested compatibility requirements in
DefaultLifecycleExecutor, LifecycleExecutor or MavenExecutionPlan
that I am not aware of. Re-adding these is trivial if anyone knows
what they are. I have seen some bits and bobs on the wiki, but I am unsure
if these apply to 3.x.

Understanding this work:

I think of this work as building a foundation on which it is possible
to experiment/exploit further concurrency and constraints. There is
actually a declarative representation of concurrency
relationships/constraints (in DefaultLifecycles), and the
LifecycleWeaveBuilder implements this model. With such a model it should
be fairly simple to express further constraints and also work more on
different scheduling plans. I have no problems seeing this evolve
further into selectable "concurrency strategies" that can be applied to
different kinds of projects (if one shoe cannot be made to fit all).

As can be seen from the docs (link), not all kinds of plugins are yet 
represented in the model, so this version will not build "any" project.
I have had feedback from several "enterprise" users that have
successfully used this patch to double their reactor build speeds.


Hope someone can spend the time it takes to review this. 

The patch is at http://jira.codehaus.org/browse/MNG-3004 specifically
http://jira.codehaus.org/secure/attachment/46434/mng3004.patch

Kristian




Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Jason van Zyl <ja...@maven.org>.

On 2009-12-09, at 9:15 AM, Kristian Rosenvold wrote:

> There's a lot of fun to be had with modern DI ;) I assume you're
> planning to keep the plexus interfaces and just change implementation ?
> 
> Do you have any documentation/references on how you're intending to do
> this ? Will the plugins be wired up with guice too ?
> 
> On the grand scale of things, this is just a private implementation
> detail. I wrote a System.out/System.err based threading interceptor in
> SUREFIRE-555 and I'll see if I can get that working. I've been spending
> too much time on this "patch" now, and there's just this one thing left
> before I ask the community to accept it. As long as I don't have to
> solve it with plexus I'm happy.
> 

The loggers are still going to have a Plexus signature but it's going to be injected by Guice. There's nothing else you will need to look at except Guice and controlling the way the loggers are injected into components. I've been looking at what the JClouds guys have been doing:

http://code.google.com/p/jclouds/wiki/LogDesign

The interface for the plexus logger is not going to be removed anytime soon. Older components and plugins rely on that signature even if Plexus is not being used as the runtime. So all the injection will happen around that interface.
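
To illustrate why keeping the interface matters for log capture, here is a sketch of wrapping a logger behind an unchanged interface with a JDK dynamic proxy. The nested `Logger` interface is a hypothetical two-method stand-in, not the real Plexus Logger signature, and no Guice is involved; how the wrapped instance would actually be injected is left open.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch: record log calls per calling thread behind an unchanged interface. */
class LoggerProxySketch {

    /** Stand-in interface; the real Plexus Logger has more methods than this. */
    interface Logger {
        void info(String message);
        void warn(String message);
    }

    private final Map<String, List<String>> byThread = new ConcurrentHashMap<>();

    /** Returns a Logger that records each call per thread, then forwards it to the delegate. */
    Logger wrap(Logger delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Only record the single-String logging calls, not equals/hashCode/toString.
            if (args != null && args.length == 1 && args[0] instanceof String) {
                byThread.computeIfAbsent(
                                Thread.currentThread().getName(), n -> new CopyOnWriteArrayList<>())
                        .add(method.getName() + ": " + args[0]);
            }
            return method.invoke(delegate, args);
        };
        return (Logger) Proxy.newProxyInstance(
                Logger.class.getClassLoader(), new Class<?>[] {Logger.class}, handler);
    }

    /** Messages logged by the named thread, in call order. */
    List<String> messages(String threadName) {
        return new ArrayList<>(byThread.getOrDefault(threadName, List.of()));
    }
}
```

Components that depend on the interface would receive the wrapped instance and need no changes, which is the property that makes interception around the interface attractive.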

> Kristian
> 
> On Wed, 09.12.2009 at 08.50 -0800, Jason van Zyl wrote:
>> On 2009-12-09, at 8:38 AM, Kristian Rosenvold wrote:
>> 
>>> My personal short-list on MNG-3004 now only contains 1 thing, getting
>>> order on log output. I need some help on this one;
>>> 
>>> It seems like there's two sensible ways to handle this
>>> 
>>> A) Intercept plexus "Logger" and sort according to calling thread.
>>> B) Intercept System.out/err and sort according to calling thread.
>>> 
>>> I know B is going to work, but I really think A is a nicer option. But
>>> plexus scares me (it reminds me of a teenager, capable of throwing fits
>>> for no understandable reason and providing no explanation).
>>> 
>>> Assuming I am able to proxy the plexus logger, it should be able to
>>> capture output of all plugins too, right ? Anyone have any
>>> examples/explanation of how to proxy the logger ?
>>> 
>>> Anyone have any thoughts/preferences on this ?
>>> 
>> 
>> You might want to wait a bit then as we have Nexus OSS completely running on Guice with a Plexus shim, almost have Nexus Pro running on the same system and probably about a week of work getting the same to work for Maven itself.
>> 
>> So if you can figure out a strategy to do this with Guice then that's what I plan to run Maven 3.x on. I will put this in a GIT repository to share when it's finished and then the community can decide if they want to have it be absorbed into Maven at Apache. I'm killing off Plexus as fast as possible and replacing it with Guice and we're close but Sonatype is focusing on Nexus first and then we'll try it with Maven and then we'll ask people here if they would like those changes here as I'm not presuming anything.
> 
> 
> 
> 
> 

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
http://twitter.com/jvanzyl
----------------------------------------------------------



Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

There's a lot of fun to be had with modern DI ;) I assume you're
planning to keep the plexus interfaces and just change implementation ?

Do you have any documentation/references on how you're intending to do
this ? Will the plugins be wired up with guice too ?

On the grand scale of things, this is just a private implementation
detail. I wrote a System.out/System.err based threading interceptor in
SUREFIRE-555 and I'll see if I can get that working. I've been spending
too much time on this "patch" now, and there's just this one thing left
before I ask the community to accept it. As long as I don't have to
solve it with plexus I'm happy.

Kristian

On Wed, 09.12.2009 at 08.50 -0800, Jason van Zyl wrote:
> On 2009-12-09, at 8:38 AM, Kristian Rosenvold wrote:
> 
> > My personal short-list on MNG-3004 now only contains 1 thing, getting
> > order on log output. I need some help on this one;
> > 
> > It seems like there's two sensible ways to handle this
> > 
> > A) Intercept plexus "Logger" and sort according to calling thread.
> > B) Intercept System.out/err and sort according to calling thread.
> > 
> > I know B is going to work, but I really think A is a nicer option. But
> > plexus scares me (it reminds me of a teenager, capable of throwing fits
> > for no understandable reason and providing no explanation).
> > 
> > Assuming I am able to proxy the plexus logger, it should be able to
> > capture output of all plugins too, right ? Anyone have any
> > examples/explanation of how to proxy the logger ?
> > 
> > Anyone have any thoughts/preferences on this ?
> > 
> 
> You might want to wait a bit then as we have Nexus OSS completely running on Guice with a Plexus shim, almost have Nexus Pro running on the same system and probably about a week of work getting the same to work for Maven itself.
> 
> So if you can figure out a strategy to do this with Guice then that's what I plan to run Maven 3.x on. I will put this in a GIT repository to share when it's finished and then the community can decide if they want to have it be absorbed into Maven at Apache. I'm killing off Plexus as fast as possible and replacing it with Guice and we're close but Sonatype is focusing on Nexus first and then we'll try it with Maven and then we'll ask people here if they would like those changes here as I'm not presuming anything.





Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Brett Porter <br...@apache.org>.

On 10/12/2009, at 3:50 AM, Jason van Zyl wrote:

> 
> On 2009-12-09, at 8:38 AM, Kristian Rosenvold wrote:
> 
>> My personal short-list on MNG-3004 now only contains 1 thing, getting
>> order on log output. I need some help on this one;
>> 
>> It seems like there's two sensible ways to handle this
>> 
>> A) Intercept plexus "Logger" and sort according to calling thread.
>> B) Intercept System.out/err and sort according to calling thread.
>> 
>> I know B is going to work, but I really think A is a nicer option. But
>> plexus scares me (it reminds me of a teenager, capable of throwing fits
>> for no understandable reason and providing no explanation).
>> 
>> Assuming I am able to proxy the plexus logger, it should be able to
>> capture output of all plugins too, right ? Anyone have any
>> examples/explanation of how to proxy the logger ?
>> 
>> Anyone have any thoughts/preferences on this ?
>> 
> 
> You might want to wait a bit then as we have Nexus OSS completely running on Guice with a Plexus shim, almost have Nexus Pro running on the same system and probably about a week of work getting the same to work for Maven itself.

Will this directly impact the logging, or will it still retain the old logging system for backwards compatibility? Regardless, I don't think Kristian should have to wait until the new year to proceed.

> 
> So if you can figure out a strategy to do this with Guice then that's what I plan to run Maven 3.x on. I will put this in a GIT repository to share when it's finished and then the community can decide if they want to have it be absorbed into Maven at Apache. I'm killing off Plexus as fast as possible and replacing it with Guice and we're close but Sonatype is focusing on Nexus first and then we'll try it with Maven and then we'll ask people here if they would like those changes here as I'm not presuming anything.

+1 killing Plexus (can we move Maven core's use of the plexus-util libs to commons-* as well? :)

- Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/





---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Jason van Zyl <ja...@maven.org>.

On 2009-12-09, at 8:38 AM, Kristian Rosenvold wrote:

> My personal short-list on MNG-3004 now only contains 1 thing, getting
> order on log output. I need some help on this one;
> 
> It seems like there's two sensible ways to handle this
> 
> A) Intercept plexus "Logger" and sort according to calling thread.
> B) Intercept System.out/err and sort according to calling thread.
> 
> I know B is going to work, but I really think A is a nicer option. But
> plexus scares me (it reminds me of a teenager, capable of throwing fits
> for no understandable reason and providing no explanation).
> 
> Assuming I am able to proxy the plexus logger, it should be able to
> capture output of all plugins too, right ? Anyone have any
> examples/explanation of how to proxy the logger ?
> 
> Anyone have any thoughts/preferences on this ?
> 

You might want to wait a bit, then: we have Nexus OSS running completely on Guice with a Plexus shim, we almost have Nexus Pro running on the same system, and it is probably about a week of work to get the same working for Maven itself.

So if you can figure out a strategy to do this with Guice, that's what I plan to run Maven 3.x on. I will put this in a Git repository to share when it's finished, and then the community can decide whether they want it absorbed into Maven at Apache. I'm killing off Plexus as fast as possible and replacing it with Guice, and we're close. Sonatype is focusing on Nexus first; then we'll try it with Maven, and then we'll ask people here whether they would like those changes, as I'm not presuming anything.

> Kristian
> 
> P.S: The last time I tried to run with the integration tests they
> worked. Assuming i ran them correctly, that is. Maybe a good idea to
> update
> http://maven.apache.org/guides/development/guide-building-m2.html with
> descriptions on how to do run them correctly ?
> 
> 
> 
> fr., 04.12.2009 kl. 18.58 -0800, skrev Jason van Zyl:
>> I'm just starting to take a look now.
>> 
>> Have you by chance run what you've built against the integration tests?
>> 
>> http://svn.apache.org/repos/asf/maven/core-integration-testing/trunk/
>> 
>> If not, happy to help, or we can pull it into the grid and try it out there.
>> 
>> On 2009-12-03, at 1:05 AM, Kristian Rosenvold wrote:
>> 
>>> It's getting close enough, as long as you satisfy the
>>> following constraints:
>>> 
>>> - Make sure your build works with "regular" maven3 snapshots first ;)
>>> - You're not too reliant on snapshot artifacts (MNG-2802 is next on my
>>> TODO list now)
>>> - You're not generating source code in generate-sources 
>>> - Running aggregating tasks (javadoc etc) is largely untested as of yet.
>>> 
>>> The last two should be easily fixable but I've been concentrating on
>>> the main concurrency concerns (safe publication, deadlock avoidance etc)
>>> until now, but this seems to be rock solid with my build as of now. This
>>> is really also the stuff I need input on, since this is usually quite
>>> timing sensitive. (I have 2 different builds I run on C2D, i7 and dual
>>> Xeons without hiccups right now)
>>> 
>>> Get the install from 
>>> 
>>> http://cloud.github.com/downloads/krosenvold/maven3/apache-maven-3.0-SNAPSHOT-bin.tar.gz
>>> 
>>> 
>>> mvn -e -Dmaven.threads.experimental=6 clean install
>>> 
>>> On my build, 1 thread per core gives best results. Maybe "3" for Core 2
>>> duo.
>>> 
>>> This version is up-to-date with maven3 trunk as of Thu Dec 3 09:01:44
>>> 2009 +0100. If you run without the -Dmaven.threads.experimental=6
>>> option, you should basically be running regular maven3 trunk.
>>> 
>>> 
>>> Kristian
>>> 
>>> 
>>> 
>>> 
>>> On Thu, 2009-12-03 at 09:21 +0100, Jorg Heymans wrote:
>>>> On Tue, Dec 1, 2009 at 9:49 PM, Kristian Rosenvold
>>>> <kr...@gmail.com> wrote:
>>>>> I am pleased to announce that the "weave" mode now does a
>>>>> "mvn clean install" of a fairly regular project with any number of
>>>>> threads, and at great speed improvement - 2-4x is not uncommon.
>>>>> 
>>>>> There are still issues to be sorted out, and I'd be really grateful
>>>>> for any reports of problems.
>>>>> 
>>>>> See http://github.com/krosenvold/maven3 for a *lot* more details
>>>>> on problems & issues and how to test this out on your builds.
>>>> 
>>>> Looks incredibly promising !
>>>> 
>>>> I would be more than happy to give you test feedback if you could
>>>> supply a binary dist with this feature. Or is it not yet ready to be
>>>> tested by the 'masses' ?
>>>> 
>>>> Jorg
>>>> 
>>> 
>>> 
>>> 
>> 
>> Thanks,
>> 
>> Jason
>> 
>> ----------------------------------------------------------
>> Jason van Zyl
>> Founder,  Apache Maven
>> http://twitter.com/jvanzyl
>> ----------------------------------------------------------
>> 
>> 
> 
> 
> 

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
http://twitter.com/jvanzyl
----------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

My personal short-list on MNG-3004 now contains only one thing: getting
the log output in order. I need some help with this one.

There seem to be two sensible ways to handle this:

A) Intercept plexus "Logger" and sort according to calling thread.
B) Intercept System.out/err and sort according to calling thread.

I know B is going to work, but I really think A is the nicer option. But
Plexus scares me (it reminds me of a teenager, capable of throwing fits
for no understandable reason and providing no explanation).

Assuming I am able to proxy the Plexus logger, it should be able to
capture the output of all plugins too, right? Does anyone have any
examples or an explanation of how to proxy the logger?

Does anyone have any thoughts or preferences on this?
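
For illustration only, option A might look roughly like the sketch below. It
is written under loud assumptions: BuildLogger is a stand-in interface
invented for this example and is NOT the Plexus Logger API, and
ThreadGroupingLogger and releaseCurrentThread are hypothetical names. The
wrapper holds back every message and hands a thread's messages to the real
logger as one uninterrupted block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for the real logger interface; invented for this sketch, NOT the Plexus API. */
interface BuildLogger {
    void info(String message);
    void warn(String message);
}

/** Hypothetical wrapper: defers log calls and replays them grouped by calling thread. */
final class ThreadGroupingLogger implements BuildLogger {
    private final BuildLogger delegate;
    private final Map<Thread, List<Runnable>> pending = new ConcurrentHashMap<>();

    ThreadGroupingLogger(BuildLogger delegate) {
        this.delegate = delegate;
    }

    @Override
    public void info(String message) {
        defer(() -> delegate.info(message));
    }

    @Override
    public void warn(String message) {
        defer(() -> delegate.warn(message));
    }

    /** Stores the call under the calling thread; each list is only touched by its own thread. */
    private void defer(Runnable call) {
        pending.computeIfAbsent(Thread.currentThread(), t -> new ArrayList<>()).add(call);
    }

    /** Replays the calling thread's messages to the real logger as one contiguous block. */
    void releaseCurrentThread() {
        List<Runnable> calls = pending.remove(Thread.currentThread());
        if (calls == null) {
            return;
        }
        synchronized (delegate) {
            calls.forEach(Runnable::run);
        }
    }
}
```

A build thread would call releaseCurrentThread() when it finishes a module.
Note that a wrapper like this only sees messages that actually go through the
logger; anything a plugin prints directly to System.out would bypass it,
which is where option B comes in.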

Kristian
 
P.S.: The last time I tried running with the integration tests, they
worked. Assuming I ran them correctly, that is. Maybe it would be a good idea to
update
http://maven.apache.org/guides/development/guide-building-m2.html with
a description of how to run them correctly?



fr., 04.12.2009 kl. 18.58 -0800, skrev Jason van Zyl:
> I'm just starting to take a look now.
> 
> Have you by chance run what you've built against the integration tests?
> 
> http://svn.apache.org/repos/asf/maven/core-integration-testing/trunk/
> 
> If not, happy to help, or we can pull it into the grid and try it out there.
> 
> On 2009-12-03, at 1:05 AM, Kristian Rosenvold wrote:
> 
> > It's getting close enough, as long as you satisfy the
> > following constraints:
> > 
> > - Make sure your build works with "regular" maven3 snapshots first ;)
> > - You're not too reliant on snapshot artifacts (MNG-2802 is next on my
> > TODO list now)
> > - You're not generating source code in generate-sources 
> > - Running aggregating tasks (javadoc etc) is largely untested as of yet.
> > 
> > The last two should be easily fixable but I've been concentrating on
> > the main concurrency concerns (safe publication, deadlock avoidance etc)
> > until now, but this seems to be rock solid with my build as of now. This
> > is really also the stuff I need input on, since this is usually quite
> > timing sensitive. (I have 2 different builds I run on C2D, i7 and dual
> > Xeons without hiccups right now)
> > 
> > Get the install from 
> > 
> > http://cloud.github.com/downloads/krosenvold/maven3/apache-maven-3.0-SNAPSHOT-bin.tar.gz
> > 
> > 
> > mvn -e -Dmaven.threads.experimental=6 clean install
> > 
> > On my build, 1 thread per core gives best results. Maybe "3" for Core 2
> > duo.
> > 
> > This version is up-to-date with maven3 trunk as of Thu Dec 3 09:01:44
> > 2009 +0100. If you run without the -Dmaven.threads.experimental=6
> > option, you should basically be running regular maven3 trunk.
> > 
> > 
> > Kristian
> > 
> > 
> > 
> > 
> > On Thu, 2009-12-03 at 09:21 +0100, Jorg Heymans wrote:
> >> On Tue, Dec 1, 2009 at 9:49 PM, Kristian Rosenvold
> >> <kr...@gmail.com> wrote:
> >>> I am pleased to announce that the "weave" mode now does a
> >>> "mvn clean install" of a fairly regular project with any number of
> >>> threads, and at great speed improvement - 2-4x is not uncommon.
> >>> 
> >>> There are still issues to be sorted out, and I'd be really grateful
> >>> for any reports of problems.
> >>> 
> >>> See http://github.com/krosenvold/maven3 for a *lot* more details
> >>> on problems & issues and how to test this out on your builds.
> >> 
> >> Looks incredibly promising !
> >> 
> >> I would be more than happy to give you test feedback if you could
> >> supply a binary dist with this feature. Or is it not yet ready to be
> >> tested by the 'masses' ?
> >> 
> >> Jorg
> >> 
> > 
> > 
> > 
> 
> Thanks,
> 
> Jason
> 
> ----------------------------------------------------------
> Jason van Zyl
> Founder,  Apache Maven
> http://twitter.com/jvanzyl
> ----------------------------------------------------------
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Jason van Zyl <ja...@maven.org>.

I'm just starting to take a look now.

Have you by chance run what you've built against the integration tests?

http://svn.apache.org/repos/asf/maven/core-integration-testing/trunk/

If not, happy to help, or we can pull it into the grid and try it out there.

On 2009-12-03, at 1:05 AM, Kristian Rosenvold wrote:

> It's getting close enough, as long as you satisfy the
> following constraints:
> 
> - Make sure your build works with "regular" maven3 snapshots first ;)
> - You're not too reliant on snapshot artifacts (MNG-2802 is next on my
> TODO list now)
> - You're not generating source code in generate-sources 
> - Running aggregating tasks (javadoc etc) is largely untested as of yet.
> 
> The last two should be easily fixable but I've been concentrating on
> the main concurrency concerns (safe publication, deadlock avoidance etc)
> until now, but this seems to be rock solid with my build as of now. This
> is really also the stuff I need input on, since this is usually quite
> timing sensitive. (I have 2 different builds I run on C2D, i7 and dual
> Xeons without hiccups right now)
> 
> Get the install from 
> 
> http://cloud.github.com/downloads/krosenvold/maven3/apache-maven-3.0-SNAPSHOT-bin.tar.gz
> 
> 
> mvn -e -Dmaven.threads.experimental=6 clean install
> 
> On my build, 1 thread per core gives best results. Maybe "3" for Core 2
> duo.
> 
> This version is up-to-date with maven3 trunk as of Thu Dec 3 09:01:44
> 2009 +0100. If you run without the -Dmaven.threads.experimental=6
> option, you should basically be running regular maven3 trunk.
> 
> 
> Kristian
> 
> 
> 
> 
> On Thu, 2009-12-03 at 09:21 +0100, Jorg Heymans wrote:
>> On Tue, Dec 1, 2009 at 9:49 PM, Kristian Rosenvold
>> <kr...@gmail.com> wrote:
>>> I am pleased to announce that the "weave" mode now does a
>>> "mvn clean install" of a fairly regular project with any number of
>>> threads, and at great speed improvement - 2-4x is not uncommon.
>>> 
>>> There are still issues to be sorted out, and I'd be really grateful
>>> for any reports of problems.
>>> 
>>> See http://github.com/krosenvold/maven3 for a *lot* more details
>>> on problems & issues and how to test this out on your builds.
>> 
>> Looks incredibly promising !
>> 
>> I would be more than happy to give you test feedback if you could
>> supply a binary dist with this feature. Or is it not yet ready to be
>> tested by the 'masses' ?
>> 
>> Jorg
>> 
> 
> 
> 

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
http://twitter.com/jvanzyl
----------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

It's getting close enough, as long as you satisfy the
following constraints:

- Make sure your build works with "regular" maven3 snapshots first ;)
- You're not too reliant on snapshot artifacts (MNG-2802 is next on my
TODO list now)
- You're not generating source code in generate-sources 
- Running aggregating tasks (javadoc etc.) is largely untested as yet.

The last two should be easy to fix, but until now I've been concentrating
on the main concurrency concerns (safe publication, deadlock avoidance,
etc.), and these now seem to be rock solid with my build. This is also
the area where I most need input, since such problems are usually quite
timing-sensitive. (I have 2 different builds that I run on a Core 2 Duo,
an i7 and dual Xeons without hiccups right now.)

Get the install from 

http://cloud.github.com/downloads/krosenvold/maven3/apache-maven-3.0-SNAPSHOT-bin.tar.gz

mvn -e -Dmaven.threads.experimental=6 clean install

On my build, 1 thread per core gives the best results. Maybe "3" for a Core 2
Duo.

This version is up to date with maven3 trunk as of Thu Dec 3 09:01:44
2009 +0100. If you run without the -Dmaven.threads.experimental=6
option, you should basically be running regular maven3 trunk.
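
If you want to derive the thread count instead of hard-coding it, a small
helper along these lines could assemble the command line. This is only a
sketch: WeaveBuildCommand and forThreads are made-up names, and the only
thing taken from the build itself is the -Dmaven.threads.experimental
property shown above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the experimental command line for a given thread count. */
final class WeaveBuildCommand {
    /** Returns the same arguments as the command above, with the thread count filled in. */
    static List<String> forThreads(int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        return Arrays.asList(
                "mvn", "-e", "-Dmaven.threads.experimental=" + threads, "clean", "install");
    }

    public static void main(String[] args) {
        // availableProcessors() reports logical processors, which may differ from physical cores.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(String.join(" ", forThreads(cores)));
    }
}
```

Treat the printed value as a starting point and measure on your own build;
the best number may well differ from one thread per logical processor.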

Kristian

On Thu, 2009-12-03 at 09:21 +0100, Jorg Heymans wrote:
> On Tue, Dec 1, 2009 at 9:49 PM, Kristian Rosenvold
> <kr...@gmail.com> wrote:
> > I am pleased to announce that the "weave" mode now does a
> > "mvn clean install" of a fairly regular project with any number of
> > threads, and at great speed improvement - 2-4x is not uncommon.
> >
> > There are still issues to be sorted out, and I'd be really grateful
> > for any reports of problems.
> >
> > See http://github.com/krosenvold/maven3 for a *lot* more details
> > on problems & issues and how to test this out on your builds.
> 
> Looks incredibly promising !
> 
> I would be more than happy to give you test feedback if you could
> supply a binary dist with this feature. Or is it not yet ready to be
> tested by the 'masses' ?
> 
> Jorg
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Jorg Heymans <jo...@gmail.com>.

On Tue, Dec 1, 2009 at 9:49 PM, Kristian Rosenvold
<kr...@gmail.com> wrote:
> I am pleased to announce that the "weave" mode now does a
> "mvn clean install" of a fairly regular project with any number of
> threads, and at great speed improvement - 2-4x is not uncommon.
>
> There are still issues to be sorted out, and I'd be really grateful
> for any reports of problems.
>
> See http://github.com/krosenvold/maven3 for a *lot* more details
> on problems & issues and how to test this out on your builds.

Looks incredibly promising !

I would be more than happy to give you test feedback if you could
supply a binary dist with this feature. Or is it not yet ready to be
tested by the 'masses' ?

Jorg

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

I am pleased to announce that the "weave" mode now does a
"mvn clean install" of a fairly regular project with any number of
threads, and with a great speed improvement: 2-4x is not uncommon.

There are still issues to be sorted out, and I'd be really grateful
for any reports of problems.

See http://github.com/krosenvold/maven3 for a *lot* more details
on problems & issues and how to test this out on your builds.

The patch is *not* finished yet, but achieving a reasonably stable "mvn
clean install" has been an important milestone.
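
For readers who have not followed the whole thread, the kind of scheduling
discussed here can be pictured with the toy sketch below. It is NOT taken
from the patch. It assumes a simplified rule (a module runs a phase only
after its own previous phase and the same phase of all its upstream modules
have finished) and a fixed thread pool; WeaveSketch, build and the
"module:phase" step names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy model of phase-by-phase scheduling across modules; not the actual implementation. */
final class WeaveSketch {
    /**
     * Runs every phase of every module on a fixed pool and returns the order the steps ran in.
     * The map must iterate upstream modules first (for example a LinkedHashMap in build order);
     * each value lists the modules that the key depends on.
     */
    static List<String> build(Map<String, List<String>> upstream, List<String> phases, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> executed = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> done = new HashMap<>();
        List<CompletableFuture<Void>> all = new ArrayList<>();
        try {
            for (String module : upstream.keySet()) {
                CompletableFuture<Void> previous = CompletableFuture.completedFuture(null);
                for (String phase : phases) {
                    // Wait for this module's previous phase and the same phase of every upstream module.
                    List<CompletableFuture<Void>> waitFor = new ArrayList<>();
                    waitFor.add(previous);
                    for (String dependency : upstream.get(module)) {
                        waitFor.add(done.get(dependency + ":" + phase));
                    }
                    String step = module + ":" + phase;
                    // A real build would execute the phase here; the sketch only records the step.
                    previous = CompletableFuture
                            .allOf(waitFor.toArray(new CompletableFuture[0]))
                            .thenRunAsync(() -> executed.add(step), pool);
                    done.put(step, previous);
                    all.add(previous);
                }
            }
            CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(executed);
    }
}
```

With modules A and B, where B depends on A, B:compile can start as soon as
A:compile is done, while A:test is free to run in parallel with it. The exact
interleaving varies from run to run; only the ordering rule is fixed.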

Kristian



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: git was: MNG-3004/MNG-2802

Posted by Kristian Rosenvold <kr...@gmail.com>.

Someone'll probably have to get some antihistamines ;)

Seriously though, we're 20 people using git in my daily job, and we're
stuck on Windows. Most of us are using Cygwin (and that git is just like
a slower version of Linux git). Eclipse has its own Java-based git
implementation (which lacks some bells & whistles but gets the job
done). The IntelliJ v9 betas are getting really good
(http://www.jetbrains.net/confluence/display/IDEADEV/Maia+EAP) and have a
marvellous git integration (which is built on top of Cygwin, but you
wouldn't notice).

I know some people say Cygwin is "not Windows". You /can/ actually add
c:/cygwin/usr/local/bin to your *Windows* path and run "git" from
cmd.exe. The rest is just semantics to me.

The nice thing about git is that it's designed to allow the free flow of
code in any direction the owner of each repository cares for. Take
github as an example: we can all have our separate repositories (forks)
of a given project, and usually someone takes the lead and picks up
changes from the others. This can change over time too. The concept
of "commit rights" does not exist within git: everyone can commit to
their own repo and keep their work safe. Getting someone to accept your
changes may be harder; at least you may have to communicate with others
to make that happen ;)

I'll stop pestering you about git ;) As of this moment it's the
only place I have access to keep my source code anyway, so it's not as
if I have a choice. And there's this rash.....

Kristian

On Mon, 2009-11-23 at 18:08 -0800, Dan Fabulich wrote:
> Kristian Rosenvold wrote:
> 
> > Ever since my git-fu got sufficiently strong I get this allergic rash 
> > every time someone says "subversion". Anyone wishing to join can just 
> > fork and we'll have some git-fun ;)
> 
> I'm on Windows; the best Windows git client (msysgit) is covered in poison 
> oak. ;-)
> 
> -Dan
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

git was: MNG-3004/MNG-2802

Posted by Dan Fabulich <da...@fabulich.com>.

Kristian Rosenvold wrote:

> Ever since my git-fu got sufficiently strong I get this allergic rash 
> every time someone says "subversion". Anyone wishing to join can just 
> fork and we'll have some git-fun ;)

I'm on Windows; the best Windows git client (msysgit) is covered in poison 
oak. ;-)

-Dan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity - Proposal

Posted by Kristian Rosenvold <kr...@gmail.com>.

I have tried to summarize the discussion and results of this topic so far. I
was not able to write to the wiki (user krosenvold), but if someone with
permissions would consider uploading http://gist.github.com/241344
to the wiki (it's in Confluence format), it'd bring the process one step
forward.

I have also started some small initial hacks in my own github repo,
git://github.com/krosenvold/MNG-3004.git, and I will be making
some commits there starting tomorrow. Ever since my git-fu got
sufficiently strong I get this allergic rash every time someone says
"subversion". Anyone wishing to join can just fork and we'll have some
git-fun ;)

Kristian



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Ralph Goers <ra...@dslextreme.com>.

One downside to this kind of parallelism that you should consider is the output. If all these tasks write to stdout or stderr simultaneously, the build output is going to be hard to understand. It might be preferable to pipe the output from each thread into a cache and then write the whole cache at once while holding a lock on stdout.
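
As a rough sketch of that caching idea (the class and method names are
hypothetical and nothing here is part of Maven), an OutputStream could keep
one in-memory cache per thread and copy it to the real stdout in one piece:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not part of Maven: caches output per thread and writes it in one piece. */
final class PerThreadOutputCache extends OutputStream {
    private final Map<Thread, ByteArrayOutputStream> caches = new ConcurrentHashMap<>();
    private final PrintStream stdout;

    PerThreadOutputCache(PrintStream stdout) {
        this.stdout = stdout;
    }

    /** Returns the cache that belongs to the calling thread, creating it on first use. */
    private ByteArrayOutputStream cache() {
        return caches.computeIfAbsent(Thread.currentThread(), t -> new ByteArrayOutputStream());
    }

    @Override
    public void write(int b) {
        cache().write(b);
    }

    @Override
    public void write(byte[] bytes, int offset, int length) {
        cache().write(bytes, offset, length);
    }

    /** Writes the calling thread's whole cache at once while holding a lock on the real stdout. */
    void flushCurrentThread() {
        ByteArrayOutputStream cache = caches.remove(Thread.currentThread());
        if (cache == null) {
            return;
        }
        synchronized (stdout) {
            stdout.write(cache.toByteArray(), 0, cache.size());
            stdout.flush();
        }
    }
}
```

The idea would be to keep a reference to the original System.out, install
new PrintStream(cache, true) via System.setOut, and have each build thread
call flushCurrentThread() when it finishes a module. The trade-off is that
nothing from a module shows up on the console until that module is done.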

Ralph

On Nov 22, 2009, at 12:06 PM, Dan Fabulich wrote:

> I like it!
> 
> Well, except for the "1 thread per module" part; that's clearly too many threads.  You'd want a fixed thread pool.
> 
> But restructuring the multithreading around the individual phases and *scheduling* phases from later projects when earlier project phases are done seems workable.
> 
> We probably would have thought of this earlier (or, at least, *I* would have) if the default reactor behavior worked like that.
> 
> Today, "mvn install" will first compile, test, and install project A, then compile, test, and install project B, and then compile, test and install project C.
> 
> But even without multithreading, you could imagine the reactor compiling A, then compiling B, then compiling C, then testing A, then testing B, then testing C, and finally installing A, installing B, and installing C. This strategy might fail faster than today's project-by-project strategy.
> 
> I propose that we first implement this as an optional reactor strategy (via a special command-line argument) in singlethreaded mode, and work out all the kinks.  Once we're pretty happy with that, we can add support for it in multithreaded mode.  It's especially important that it be at least *possible* to run it in singlethreaded mode, in case it causes problems for some projects independently of multithreading.
> 
> BTW, what would we call this new mode?  Perhaps we'd call it "weave" mode, because we're going across the projects horizontally in lifecycle order. I initially thought we might call it "breadth-first" as opposed to "depth-first," but that's a bad name because it sounds like we're reordering the projects.
> 
> 
> So, where might we find kinks in "weave" mode?
> * Correctly implementing reactor failure behavior (--fail-fast, --fail-at-end, --fail-never) with blacklisting
> * What happens if we specify multiple lifecycles?  "mvn compile test"
> * What happens if we just specify the raw goals? "mvn myPlugin:goal"
> * What if we mix and match? "mvn compile myPlugin:goal test"
> * What if we put clean last?  Would we clean projects while later projects depend on them? "mvn compile clean"
> * What if the reactor is building a plugin that is used later in the reactor?
> * How would users resume "weave" mode?  (Today we allow users to --resume-from a particular project.)  Would "weave" users resume from a particular project+phase?  Would resuming even be reasonable?  If you changed a class, you'd need to recompile it and THEN retest it...
> 
> These are the sort of areas where we'd want to have a good singlethreaded implementation with integration tests BEFORE plowing ahead with a multithreaded implementation.
> 
> -Dan
> 
> Kristian Rosenvold wrote:
> 
>> I've looked over the code and thought a bit further about the
>> constraints involved, and given that:
>> 
>> - Multi module reactor builds are the only interesting targets of
>> multithreading.
>> - Reactor builds do not use the "install" output of their upstream
>> dependencies (I was not aware of that ;)
>> 
>> You do not have to re-order anything at all. An implementation
>> could just:
>> A) Immediately fork 1 thread per module for all modules.
>> B) For the phases compile, install and deploy, a given module can
>> only proceed when all its upstream dependencies have completed the same
>> state
>> There's still a chance of leaking artifacts to local repository if
>> upstream deploy fails after install, and the general idea of a
>> transacted repo would still be nice to stay consistent.
>> 
>> I'm still a bit unsure about B) above; it may be a bit limiting in terms
>> of other usage scenarios. I'm also a bit unsure how that'd fit in with all
>> the other activities in the lifecycle. An alternative would be to
>> make a declarative representation of phase interdependencies that could
>> express multiple types of concurrency interdependencies. (But I
>> consistently only see one dependency type -
>> upstreamMustFinishBeforeThisCanStart...?)
>> 
>> Would it float ?
>> 
>> Kristian
>> 
>> 
>> lø., 21.11.2009 kl. 11.40 +0000, skrev Stephen Connolly:
>>> In m3 (which is what we are talking about) AFAIK we can have a
>>> listener that waits for the end of the start of the deploy phase
>>> and/or the end of execution.
>>> 
>>> With a customized install plugin, we could just install to the
>>> "transaction" repository.  The listener can then block until the
>>> criteria have been met (allowing other modules to progress) That would
>>> achieve what you're after... namely, produce the artifacts for
>>> consumption by the other modules before running test and
>>> integration-test. Once the criteria have been met, we either fail the
>>> module or we move the artifacts from the "transactional" local repo to
>>> the real local repo and allow the lifecycle to continue
>>> 
>>> -Stephen
>>> 
>>> 2009/11/21 Kristian Rosenvold <kr...@gmail.com>:
>>>> I seem to understand that there's room for several different
>>>> types of solution here;
>>>> 
>>>> Starting with the single-machine solution; I now understand that
>>>> you could start forking downstream builds straight after
>>>> compile in a reactor build, maybe after install in other cases.
>>>> 
>>>> In this scenario I think each module is dependent on all upstream
>>>> modules successfully achieving "install" before proceeding to "deploy".
>>>> I really think it's important to avoid leaking artifacts that do not
>>>> have its own (and all upstream) lifecycle requirements fulfilled.
>>>> 
>>>> When it comes to clustering there may be several approaches:
>>>> If you decide to publish artifacts through "deploy" to any kind
>>>> of repo I believe these require to have all lifecycle requirements met,
>>>> which at my current understanding seems orthogonal to local out-of-order
>>>> execution.
>>>> 
>>>> Wouldn't it be feasible to distribute the "local" and perhaps
>>>> "transacted local" repo inside the cluster using network
>>>> file sharing ? One would still have to solve serialization issues
>>>> and using installed artifacts in a reactor build..?
>>>> 
>>>> The clustering case seems like a much harder task than achieving
>>>> full local concurrency. I did some fairly extensive measurements
>>>> with my current build when I set up concurrent spring/junit testing:
>>>> 
>>>> Missing concurrency in classloading is the most important reason
>>>> why unit tests run slowly (classloading is strictly a synchronized
>>>> business until jdk7). By running tests out of order on my local
>>>> unit test-build I am fairly certain I could reduce run-time
>>>> for "mvn clean install" to something much closer to "mvn
>>>> -Dmaven.test.skip=true clean install" (80->25 seconds in my case).
>>>> This is even before I start parallelizing the individual modules.
>>>> 
>>>> I must confess that I've yet to see a build that really needs
>>>> clustering for any other reason than running tests or other individual
>>>> tasks (javadoc, site etc). I think I'd be inclined to just distributing
>>>> those specific tasks in a cluster. If you actually had a decent model of
>>>> inter-lifecycle phase dependencies (requiredForStarting between phases),
>>>> you could probably achieve good results by keeping lifecycle execution
>>>> centralized but distributing plugin execution ?
>>>> 
>>>> I suppose I may be narrow-minded on this last one...
>>>> 
>>>> I will be starting to look at the DefaultLifeCycleExecutor with thoughts
>>>> of out-of-order execution, maybe dabble around a little.
>>>> 
>>>> Kristian
>>>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Dan Fabulich <da...@fabulich.com>.

I like it!

Well, except for the "1 thread per module" part; that's clearly too many 
threads.  You'd want a fixed thread pool.

But restructuring the multithreading around the individual phases and 
*scheduling* phases from later projects when earlier project phases are 
done seems workable.
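
As a rough illustration of that scheduling idea (not code from Maven), the
sketch below runs each module's phases on a fixed thread pool. The class name
PhaseScheduler, the "gated" set of phases and the "module:phase" log format
are all assumptions made for this example. The rule that a gated phase waits
for the same phase of every upstream module is borrowed from the quoted
proposal further down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Maven code: schedules (module, phase) steps on a fixed pool.
final class PhaseScheduler {

    // upstream: module -> modules it depends on. Assumption: the map iterates
    // upstream modules first (e.g. a LinkedHashMap filled in reactor order) and
    // every upstream module is also a key.
    // gated: phases that must wait for the same phase of every upstream module.
    static List<String> run(Map<String, List<String>> upstream, List<String> phases,
                            Set<String> gated, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> done = new HashMap<>();
        try {
            for (int i = 0; i < phases.size(); i++) {
                String phase = phases.get(i);
                for (String module : upstream.keySet()) {
                    List<CompletableFuture<Void>> waitFor = new ArrayList<>();
                    if (i > 0) {
                        // A module always runs its own phases in lifecycle order.
                        waitFor.add(done.get(module + ":" + phases.get(i - 1)));
                    }
                    if (gated.contains(phase)) {
                        // Gated phases also wait for the same phase upstream.
                        for (String up : upstream.get(module)) {
                            waitFor.add(done.get(up + ":" + phase));
                        }
                    }
                    String step = module + ":" + phase;
                    CompletableFuture<Void> future = CompletableFuture
                            .allOf(waitFor.toArray(new CompletableFuture<?>[0]))
                            // Stand-in for actually executing the phase.
                            .thenRunAsync(() -> log.add(step), pool);
                    done.put(step, future);
                }
            }
            // Block until every scheduled step has finished.
            CompletableFuture.allOf(done.values().toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
        return log;
    }
}
```

With "compile" and "install" gated and "test" ungated, a downstream module's
tests can overlap with upstream tests while the pool size caps the number of
threads. Failure handling is left out entirely.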

We probably would have thought of this earlier (or, at least, *I* would 
have) if the default reactor behavior worked like that.

Today, "mvn install" will first compile, test, and install project A, then 
compile, test, and install project B, and then compile, test and install 
project C.

But even without multithreading, you could imagine the reactor compiling 
A, then compiling B, then compiling C, then testing A, then testing B, 
then testing C, and finally installing A, installing B, and installing C. 
This strategy might fail faster than today's project-by-project strategy.
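
The two single-threaded orderings can be compared with a small sketch. It is
only an illustration: ReactorOrdering, projectByProject and weave are
hypothetical names, and each step is just a "project:phase" string rather than
a real plugin execution.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Maven code: lists the steps in each ordering.
final class ReactorOrdering {

    // Today's behaviour: run every phase of A, then every phase of B, and so on.
    static List<String> projectByProject(List<String> projects, List<String> phases) {
        List<String> steps = new ArrayList<>();
        for (String project : projects) {
            for (String phase : phases) {
                steps.add(project + ":" + phase);
            }
        }
        return steps;
    }

    // Proposed ordering: run one phase across all projects before the next phase.
    static List<String> weave(List<String> projects, List<String> phases) {
        List<String> steps = new ArrayList<>();
        for (String phase : phases) {
            for (String project : projects) {
                steps.add(project + ":" + phase);
            }
        }
        return steps;
    }
}
```

For projects A, B, C and phases compile, test, install, the first method puts
C:compile in seventh position and the second puts it third, which is why a
compile error in C would surface earlier.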

I propose that we first implement this as an optional reactor strategy 
(via a special command-line argument) in singlethreaded mode, and work out 
all the kinks.  Once we're pretty happy with that, we can add support for 
it in multithreaded mode.  It's especially important that it be at least 
*possible* to run it in singlethreaded mode, in case it causes problems 
for some projects independently of multithreading.

BTW, what would we call this new mode?  Perhaps we'd call it "weave" mode, 
because we're going across the projects horizontally in lifecycle order. 
I initially thought we might call it "breadth-first" as opposed to 
"depth-first," but that's a bad name because it sounds like we're 
reordering the projects.


So, where might we find kinks in "weave" mode?
* Correctly implementing reactor failure behavior (--fail-fast, 
--fail-at-end, --fail-never) with blacklisting
* What happens if we specify multiple lifecycles?  "mvn compile test"
* What happens if we just specify the raw goals? "mvn myPlugin:goal"
* What if we mix and match? "mvn compile myPlugin:goal test"
* What if we put clean last?  Would we clean projects while later 
projects depend on them? "mvn compile clean"
* What if the reactor is building a plugin that is used later in the 
reactor?
* How would users resume "weave" mode?  (Today we allow users to 
--resume-from a particular project.)  Would "weave" users resume from a 
particular project+phase?  Would resuming even be reasonable?  If you 
changed a class, you'd need to recompile it and THEN retest it...

These are the sort of areas where we'd want to have a good singlethreaded 
implementation with integration tests BEFORE plowing ahead with a 
multithreaded implementation.

-Dan

Kristian Rosenvold wrote:

> I've looked over the code and thought a bit further about the
> constraints involved, and given that:
>
> - Multi module reactor builds are the only interesting targets of
> multithreading.
> - Reactor builds do not use the "install" output of their upstream
> dependencies (I was not aware of that ;)
>
> You do not have to re-order anything at all. An implementation
> could just:
> A) Immediately fork 1 thread per module for all modules.
> B) For the phases compile, install and deploy, a given module can
> only proceed when all its upstream dependencies have completed the same
> phase.
> There's still a chance of leaking artifacts to the local repository if
> upstream deploy fails after install, and the general idea of a
> transacted repo would still be nice to stay consistent.
>
> I'm still a bit unsure about B) above; it may be a bit limiting in terms
> of other usage scenarios. I'm also a bit unsure how that'd fit in with all
> the other activities in the lifecycle. An alternative would be to
> make a declarative representation of phase interdependencies that could
> express multiple types of concurrency interdependencies. (But I
> consistently only see one dependency type -
> upstreamMustFinishBeforeThisCanStart...?)
>
> Would it float ?
>
> Kristian
>
>

Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

I've looked over the code and thought a bit further about the
constraints involved, and given that:

- Multi module reactor builds are the only interesting targets of
multithreading.
- Reactor builds do not use the "install" output of their upstream
dependencies (I was not aware of that ;)

You do not have to re-order anything at all. An implementation 
could just:
A) Immediately fork 1 thread per module for all modules.
B) For the phases compile, install and deploy, a given module can
only proceed when all its upstream dependencies have completed the same
phase.
There's still a chance of leaking artifacts to the local repository if
upstream deploy fails after install, and the general idea of a
transacted repo would still be nice to stay consistent.
 
I'm still a bit unsure about B) above; it may be a bit limiting in terms
of other usage scenarios. I'm also a bit unsure how that'd fit in with all
the other activities in the lifecycle. An alternative would be to
make a declarative representation of phase interdependencies that could
express multiple types of concurrency interdependencies. (But I
consistently only see one dependency type -
upstreamMustFinishBeforeThisCanStart...?)
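
One way such a declarative representation might look, purely as a sketch: a
table mapping phases to a dependency type, plus a check that a scheduler could
consult. PhaseRules, DependencyType and canStart are hypothetical names, and
the lifecycle list is abbreviated for the example. Only the one dependency
type mentioned above is modelled.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Maven code: a declarative table of phase rules.
final class PhaseRules {

    // The single dependency type named in the text; others could be added here.
    enum DependencyType { UPSTREAM_MUST_FINISH_BEFORE_THIS_CAN_START }

    // Abbreviated lifecycle, in execution order (assumption for the example).
    static final List<String> LIFECYCLE =
            List.of("validate", "compile", "test", "package", "install", "deploy");

    // Phases without an entry carry no cross-module constraint.
    static final Map<String, DependencyType> RULES = Map.of(
            "compile", DependencyType.UPSTREAM_MUST_FINISH_BEFORE_THIS_CAN_START,
            "install", DependencyType.UPSTREAM_MUST_FINISH_BEFORE_THIS_CAN_START,
            "deploy", DependencyType.UPSTREAM_MUST_FINISH_BEFORE_THIS_CAN_START);

    // lastCompletedPhase: module -> most recent phase it has finished (absent if none).
    static boolean canStart(String phase, List<String> upstreamModules,
                            Map<String, String> lastCompletedPhase) {
        if (!RULES.containsKey(phase)) {
            return true;
        }
        int required = LIFECYCLE.indexOf(phase);
        for (String module : upstreamModules) {
            String completed = lastCompletedPhase.get(module);
            // Every upstream module must have finished at least this phase.
            if (completed == null || LIFECYCLE.indexOf(completed) < required) {
                return false;
            }
        }
        return true;
    }
}
```

With this table, "test" may start regardless of upstream progress, while
"compile" in a downstream module has to wait until every upstream module has
completed "compile" or a later phase.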

Would it float ?

Kristian


Sat, 21.11.2009 at 11.40 +0000, Stephen Connolly wrote:
> In m3 (which is what we are talking about) AFAIK we can have a
> listener that waits for the start of the deploy phase
> and/or the end of execution.
> 
> With a customized install plugin, we could just install to the
> "transactional" repository.  The listener can then block until the
> criteria have been met (allowing other modules to progress). That would
> achieve what you're after... namely, produce the artifacts for
> consumption by the other modules before running test and
> integration-test. Once the criteria have been met, we either fail the
> module or we move the artifacts from the "transactional" local repo to
> the real local repo and allow the lifecycle to continue.
> 
> -Stephen
> 




Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Stephen Connolly <st...@gmail.com>.

In m3 (which is what we are talking about) AFAIK we can have a
listener that waits for the start of the deploy phase
and/or the end of execution.

With a customized install plugin, we could just install to the
"transactional" repository.  The listener can then block until the
criteria have been met (allowing other modules to progress). That would
achieve what you're after... namely, produce the artifacts for
consumption by the other modules before running test and
integration-test. Once the criteria have been met, we either fail the
module or we move the artifacts from the "transactional" local repo to
the real local repo and allow the lifecycle to continue.
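
A minimal sketch of the staging idea, assuming plain directories stand in for
the two repositories. StagedRepository and its install/commit/rollback methods
are hypothetical names, not an existing Maven or install-plugin API, and the
commit shown here moves files one by one, so it is not atomic across files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Maven code: installs go to a staging directory first.
final class StagedRepository {
    private final Path staging;
    private final Path local;

    StagedRepository(Path staging, Path local) {
        this.staging = staging;
        this.local = local;
    }

    // "install" writes only into the staging area, never into the real local repo.
    Path install(String relativePath, byte[] content) throws IOException {
        Path target = staging.resolve(relativePath);
        Files.createDirectories(target.getParent());
        return Files.write(target, content);
    }

    // Called once the criteria are met: move every staged file into the local repo.
    void commit() throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(staging)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            Path target = local.resolve(staging.relativize(file));
            Files.createDirectories(target.getParent());
            Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Called when the module fails: discard everything that was staged.
    void rollback() throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(staging)) {
            // Reverse order so files are deleted before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            if (!path.equals(staging)) {
                Files.delete(path);
            }
        }
    }
}
```

A listener along the lines described above would call commit() when the
module's criteria are met and rollback() when the module fails. How downstream
modules resolve artifacts from the staging area is not covered by this sketch.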

-Stephen

2009/11/21 Kristian Rosenvold <kr...@gmail.com>:
> I seem to understand that there's room for several different
> types of solution here.
>
> Starting with the single-machine solution: I now understand that
> you could start forking downstream builds straight after
> compile in a reactor build, maybe after install in other cases.
>
> In this scenario I think each module is dependent on all upstream
> modules successfully achieving "install" before proceeding to "deploy".
> I really think it's important to avoid leaking artifacts that do not
> have their own (and all upstream) lifecycle requirements fulfilled.
>
> When it comes to clustering there may be several approaches:
> If you decide to publish artifacts through "deploy" to any kind
> of repo, I believe these must have all lifecycle requirements met,
> which to my current understanding seems orthogonal to local out-of-order
> execution.
>
> Wouldn't it be feasible to distribute the "local" and perhaps
> "transacted local" repo inside the cluster using network
> file sharing? One would still have to solve serialization issues
> and using installed artifacts in a reactor build..?
>
> The clustering case seems like a much harder task than achieving
> full local concurrency. I did some fairly extensive measurements
> with my current build when I set up concurrent spring/junit testing:
>
> Missing concurrency in classloading is the most important reason
> why unit tests run slowly (classloading is strictly a synchronized
> business until JDK 7). By running tests out of order on my local
> unit test-build I am fairly certain I could reduce run-time
> for "mvn clean install" to something much closer to "mvn
> -Dmaven.test.skip=true clean install" (80->25 seconds in my case).
> This is even before I start parallelizing the individual modules.
>
> I must confess that I've yet to see a build that really needs
> clustering for any other reason than running tests or other individual
> tasks (javadoc, site etc). I think I'd be inclined to just distribute
> those specific tasks in a cluster. If you actually had a decent model of
> inter-lifecycle phase dependencies (requiredForStarting between phases),
> you could probably achieve good results by keeping lifecycle execution
> centralized but distributing plugin execution?
>
> I suppose I may be narrow-minded on this last one...
>
> I will be starting to look at the DefaultLifeCycleExecutor with thoughts
> of out-of-order execution, maybe dabble around a little.
>
> Kristian
>


Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Kristian Rosenvold <kr...@gmail.com>.

I seem to understand that there's room for several different
types of solution here;

Starting with the single-machine solution; I now understand that 
you could start forking downstream builds straight after
compile in a reactor build, maybe after install in other cases.

In this scenario I think each module is dependant on all upstream
modules successfully achieving "install" before proceeding to "deploy".
I really think it's important to avoid leaking artifacts that do not
have its own (and all upstream) lifecycle requirements fulfilled.

When it comes to clustering there may be several approaches.
If you decide to publish artifacts through "deploy" to any kind
of repo, I believe they must have all lifecycle requirements met,
which, as far as I currently understand, seems orthogonal to local
out-of-order execution.

Wouldn't it be feasible to distribute the "local" and perhaps
"transacted local" repo inside the cluster using network
file sharing? One would still have to solve serialization issues
and the question of using installed artifacts in a reactor build.

The clustering case seems like a much harder task than achieving
full local concurrency. I did some fairly extensive measurements
with my current build when I set up concurrent spring/junit testing:

Missing concurrency in classloading is the most important reason
why unit tests run slowly (classloading is strictly a synchronized
business until JDK 7). By running tests out of order on my local
unit-test build I am fairly certain I could reduce the run time
for "mvn clean install" to something much closer to that of "mvn
-Dmaven.test.skip=true clean install" (80->25 seconds in my case).
This is even before I start parallelizing the individual modules.

I must confess that I've yet to see a build that really needs
clustering for any other reason than running tests or other individual
tasks (javadoc, site etc). I think I'd be inclined to just distribute
those specific tasks in a cluster. If you actually had a decent model of
inter-lifecycle phase dependencies (requiredForStarting between phases),
you could probably achieve good results by keeping lifecycle execution
centralized but distributing plugin execution?
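
To make the requiredForStarting idea concrete, here is a rough sketch of
what such a model could look like. The phase list, the class name and the
two rules in main are illustrative assumptions, not an existing Maven API:

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical model; the phases and rules below are illustrative assumptions.
public class PhaseRules {

    // Declared in lifecycle order, so compareTo reflects "earlier/later phase".
    enum Phase { VALIDATE, COMPILE, PACKAGE, INSTALL, DEPLOY }

    // requiredForStarting: for a phase of this module, the phase that every
    // upstream module must have completed before that phase may start.
    private final Map<Phase, Phase> requiredForStarting = new EnumMap<>(Phase.class);

    void require(Phase phase, Phase upstreamPhase) {
        requiredForStarting.put(phase, upstreamPhase);
    }

    // upstreamCompleted holds the last phase each upstream module has completed.
    boolean canStart(Phase phase, List<Phase> upstreamCompleted) {
        Phase needed = requiredForStarting.get(phase);
        if (needed == null) {
            return true; // no rule registered for this phase
        }
        for (Phase done : upstreamCompleted) {
            if (done.compareTo(needed) < 0) {
                return false; // an upstream module has not got far enough yet
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PhaseRules rules = new PhaseRules();
        rules.require(Phase.COMPILE, Phase.COMPILE); // reactor build: compile once upstream has compiled
        rules.require(Phase.DEPLOY, Phase.INSTALL);  // deploy only once upstream has installed

        List<Phase> upstream = List.of(Phase.COMPILE, Phase.INSTALL);
        System.out.println("compile may start: " + rules.canStart(Phase.COMPILE, upstream));
        System.out.println("deploy may start:  " + rules.canStart(Phase.DEPLOY, upstream));
    }
}
```

A central executor could consult canStart before handing a plugin
execution to a worker, which keeps the scheduling decision in one place
while the work itself is distributed.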

I suppose I may be narrow-minded on this last one...

I will be starting to look at the DefaultLifecycleExecutor with thoughts
of out-of-order execution, maybe dabble around a little.

Kristian

Fri, 20.11.2009 at 06:29 -0800, Dan Fabulich wrote:
> I've been meaning to reply to your earlier emails (it's been a busy week); 
> to this I'll just say that moving the "test" phase after the "install" 
> phase is a fascinating idea, which I personally like, but it seems like a 
> big violation of the contract for the lifecycle, and I suspect it won't be 
> popular. :-(
> 
> I've long felt that there should be a phase for testing after "install" 
> for similar reasons.  This might be SLIGHTLY more popular since users 
> would need to explicitly cause their tests to run during this phase.
> 
> What about users doing multi-machine builds?  Earlier this week I wrote 
> that users desiring to do multi-machine parallelism should deploy their 
> builds to a remote repository shared between the machines.  Should their 
> tests run post-deploy?
> 
> -Dan

Re: MNG-3004/MNG-2802 - Achieving massive parallelity !

Posted by Kristian Rosenvold <kr...@gmail.com>.

I have now come a long way towards implementing a massively concurrent
maven.  To whet some appetites before ifs/buts/whens/problems come up:

Using 16 threads on my Intel i7, build time for my daytime project went
from 87 seconds to 19.7 seconds. An average of 500% CPU was consumed to
deliver this, as can be seen from this figure:

 http://img403.imageshack.us/img403/6240/screenshotsystemmonitor.png

The bathtub in the middle of the graph is a full build.

A lot of the questions Dan Fabulich and others brought up in earlier
mails seem to be getting answers as the code speaks to me, and I'll try
to update the proposal with my tentative suggested solutions.

Although by no means finished, I invite others interested in the 
"cause" to join. Contributing can be as easy as trying it out on your
project and reporting an issue. Please use this issue tracker
http://github.com/krosenvold/maven3/issues

Currently I'm only at the stage where I can run an "install" over a
pre-existing build; I have not even tried a "clean install".

Howto:

git clone git://github.com/krosenvold/maven3.git

.. build and set up....


mvn -Dmaven.weave.threads=16 install


Twice the number of cores seems like a nice number of threads. If you run
with 1 thread it should be fairly similar to your current build. Failures
with threads=1 are *really* welcome ;)
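
For anyone scripting this, a small sketch of how that rule of thumb could
be computed in Java. Only the maven.weave.threads property name comes from
the command above; the class name and the fallback of twice the core count
are assumptions of this sketch and not necessarily what the branch does:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of the maven3 branch.
public class WeaveThreads {

    // Uses -Dmaven.weave.threads when set; otherwise falls back to
    // twice the number of cores (an assumption of this sketch).
    static int threadCount() {
        int fallback = 2 * Runtime.getRuntime().availableProcessors();
        int configured = Integer.getInteger("maven.weave.threads", fallback);
        return Math.max(1, configured); // a thread pool needs at least one thread
    }

    public static void main(String[] args) {
        int threads = threadCount();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("building with " + threads + " threads");
        pool.shutdown();
    }
}
```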

Those wishing to try to find issues themselves are welcome; it all
starts out at DefaultLifecycleExecutor.java.

Kristian



Re: MNG-3004/MNG-2802 - Achieving massive parallelity ?

Posted by Dan Fabulich <da...@fabulich.com>.

I've been meaning to reply to your earlier emails (it's been a busy week); 
to this I'll just say that moving the "test" phase after the "install" 
phase is a fascinating idea, which I personally like, but it seems like a 
big violation of the contract for the lifecycle, and I suspect it won't be 
popular. :-(

I've long felt that there should be a phase for testing after "install" 
for similar reasons.  This might be SLIGHTLY more popular since users 
would need to explicitly cause their tests to run during this phase.

What about users doing multi-machine builds?  Earlier this week I wrote 
that users desiring to do multi-machine parallelism should deploy their 
builds to a remote repository shared between the machines.  Should their 
tests run post-deploy?

-Dan

