Posted to general@gump.apache.org by Stefan Bodewig <bo...@apache.org> on 2010/06/21 06:34:07 UTC

Parallelism (was Re: A Few Plans)

On 2010-06-20, Sander Temme wrote:

> On Jun 18, 2010, at 10:15 AM, Stefan Bodewig wrote:

>> Hi,

>> I just wanted to share a few plans I have short/midterm.  Feel free to
>> comment, pick tasks or add wishes.

>> Honestly I have no idea how we could deal with the ever increasing build
>> times as Gump grows, apart from some sort of distributed Gump which I
>> wouldn't want to build on top of the current code base (I'd rather think
>> in a tuple spaces architecture like Mnesia and Erlang or JavaSpaces and
>> anything on the VM).

> When I come back from vacation, we'll be pressing into service a dual
> quad core Apple Xserve with 6Gb of memory.

Sounds great - enjoy your vacation.

> This would allow for a little more concurrency than we can do on the
> Zone or VM... of course we'd have to be able to address all of those
> cores.  Wonder whether Python has glue for Grand Central Dispatch?

Right now Gump is a controlling process that spins off new processes, so
doing more in parallel would mean running those processes in parallel -
nothing Python would need to support.

If you log into one of the machines while Gump is running, the system
feels sluggish and any operation that hits the file system takes ages
which makes me fear we are I/O bound rather than CPU bound - making
those cores do more may not help too much in that case.  I can certainly
be wrong.

IIRC Gump's trunk supports parallel SCM checkouts but we've restricted
it to a maximum of one updater because Adam saw problems - it's been a
long time.

Currently we don't support building things in parallel at all.  Starting
several Ant or make builds in parallel would likely do what you expect,
but I don't know how mvn would deal with multiple processes accessing
the same local repository (and writing to it) in parallel.

It may be possible to construct concurrency in a way that is more or
less safe so that long running and self-contained builds like test-ant
could be spun off but all mvn builds that accessed the same local repo
would get serialized.  Of course it would take somebody to write the
code 8-)

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org


Re: Parallelism

Posted by Stefan Bodewig <bo...@apache.org>.
On 2010-06-21, Leo Simons wrote:

> On 6/21/10 5:34 AM, Stefan Bodewig wrote:
>> Currently we don't support building things in parallel at all.  Starting
>> several Ant or make builds in parallel would likely do what you expect,
>> but I don't know how mvn would deal with multiple processes accessing
>> the same local repository (and writing to it) in parallel.

> I don't think there's any special code for it in maven, but
> nevertheless I've never really seen an issue with that. A common
> hudson deployment is to run 4-5 builds in parallel using a 'hudson'
> user that has one local repository.

I thought I had seen people raising issues about that, but it is likely
more a hypothetical race condition.

> PS: no I'm not dead just persistently e-mail overloaded :)

Good to know.

Stefan



Re: Parallelism (was Re: A Few Plans)

Posted by Leo Simons <ma...@leosimons.com>.
On 6/21/10 5:34 AM, Stefan Bodewig wrote:
>> This would allow for a little more concurrency than we can do on the
>> Zone or VM... of course we'd have to be able to address all of those
>> cores.  Wonder whether Python has glue for Grand Central Dispatch?

Most of the weight in Gump runs is inside the Java processes; most of the
remaining latency is svn checkouts/updates.

For the former, you'd need (a) the JVM to hook up GCD (which is for
Apple to do) and (b) Maven to do more stuff in parallel (which is on the
charts for Maven 3, I think).

For the latter, IIRC we have code to run more checkouts in parallel, but
that code is buggy in gump2, and it would also mean a load increase on
the svn server, which may not be a good thing.

> If you log into one of the machines while Gump is running, the system
> feels sluggish and any operation that hits the file system takes ages
> which makes me fear we are I/O bound rather than CPU bound - making
> those cores do more may not help too much in that case.  I can certainly
> be wrong.

You're absolutely right. Builds are almost always I/O bound and you'll
see that a lot of the CPU time is actually iowait -- so the numbers in
top are usually misleading, and most of the apparent CPU busyness is
overhead from waiting for I/O.
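
One rough way to check this on a Linux host is to sample the aggregate
"cpu" line of /proc/stat before and after a run and look at how much of
the elapsed time was iowait. This is only a sketch: it assumes Linux
(it would not work as-is on the Zone), and the function names are made
up for illustration.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, the aggregate 'cpu' counters."""
    with open(path) as stat:
        return stat.readline()


def _ticks(cpu_line):
    # Field order after the "cpu" label:
    # user nice system idle iowait irq softirq steal (guest fields ignored,
    # since guest time is already included in user time).
    return [int(value) for value in cpu_line.split()[1:9]]


def iowait_between(before, after):
    """Fraction of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(_ticks(before), _ticks(after))]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0
```

Calling read_cpu_line() at the start and end of a Gump run and passing
both lines to iowait_between gives a single number to compare against
what top appears to show.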

> IIRC Gump's trunk supports parallel SCM checkouts but we've restricted
> it to a maximum of one updater because Adam saw problems - it's been a
> long time.

Oh, eh, yep, so that's what I remember too :)

In any case, to take advantage of multiple cores it'd be good to
re-implement parallelism using Python's multiprocessing module.
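
Roughly, that could look like the sketch below. It is not existing Gump
code: run_command, run_parallel and PLACEHOLDER_COMMAND are hypothetical
names, and the commands stand in for whatever checkouts or builds would
really be dispatched.

```python
import multiprocessing
import subprocess
import sys

# Hypothetical stand-in for a real checkout or build command.
PLACEHOLDER_COMMAND = [sys.executable, "-c", "pass"]


def run_command(command):
    """Run one external command in a worker process; return its exit code."""
    return subprocess.call(command)


def run_parallel(commands, workers=None):
    """Run commands on a pool of worker processes.

    workers=None lets multiprocessing pick one worker per available core.
    Exit codes come back in the same order as the commands.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(run_command, commands)
```

A caller would use something like
run_parallel([PLACEHOLDER_COMMAND] * 4), invoked from under an
if __name__ == "__main__": guard so that platforms which start workers
by re-importing the main module do not recurse.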

> Currently we don't support building things in parallel at all.  Starting
> several Ant or make builds in parallel would likely do what you expect,
> but I don't know how mvn would deal with multiple processes accessing
> the same local repository (and writing to it) in parallel.

I don't think there's any special code for it in maven, but nevertheless 
I've never really seen an issue with that. A common hudson deployment is 
to run 4-5 builds in parallel using a 'hudson' user that has one local 
repository.


cheers,


Leo

PS: no I'm not dead just persistently e-mail overloaded :)
