Posted to general@gump.apache.org by "Adam R. B. Jack" <aj...@trysybase.com> on 2004/06/30 15:48:29 UTC

Multithreading the updates

I took a break from the studying I ought to be doing, in order to tinker with
multi-threading. Heck, I've suffered all this pain w/ Python (my own doing,
no doubt) so I might as well get some fun out of it. I like the results.

Since cvs|svn|whatever are typically network latency/IO bound, there is
quite a benefit (in terms of length of run) to having a few of these running
simultaneously. Since we update to a different directory per module, and
then sync to their separate working directories, with no overlaps, we have a
perfect chore for multithreading.

Basically the list of modules to update is generated from the list (in
order) of projects to build. We have a pool of worker threads (I made it 5,
but that may be too many, or could be too few on Brutus) and this pool works
in addition to the main thread. The main thread works through the project
list and when it needs a module checked out, it either waits for the workers
to do it, or does it itself in its own thread. [Python seems to have nice
simple lock and semaphore code. Look at gump.threads.tools if you like.]
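
A minimal sketch of the scheme described above, assuming one claim lock and
one completion event per module. The names ModuleUpdate, perform_update and
run_updates are hypothetical and are not taken from the Gump code; the real
cvs/svn call is replaced by a placeholder.

```python
import queue
import threading


class ModuleUpdate:
    """Tracks one module's update (illustrative, not Gump's own class)."""

    def __init__(self, name):
        self.name = name
        self.claim = threading.Lock()   # whoever acquires this does the update
        self.done = threading.Event()   # set once the update has finished
        self.updated_by = None


def perform_update(module):
    # Placeholder for the real cvs/svn update of this module.
    module.updated_by = threading.current_thread().name


def update_if_unclaimed(module):
    """Run the update unless another thread has already claimed it."""
    if module.claim.acquire(blocking=False):
        try:
            perform_update(module)
        finally:
            module.done.set()           # wake anyone waiting on this module
        return True
    return False


def worker(work_queue):
    # Each worker drains the shared queue, then exits.
    while True:
        try:
            module = work_queue.get_nowait()
        except queue.Empty:
            return
        update_if_unclaimed(module)


def run_updates(module_names, updaters=5):
    modules = [ModuleUpdate(name) for name in module_names]
    work_queue = queue.Queue()
    for module in modules:
        work_queue.put(module)

    threads = [
        threading.Thread(target=worker, args=(work_queue,), name="updater-%d" % i)
        for i in range(updaters)
    ]
    for thread in threads:
        thread.start()

    # The main thread walks the modules in build order. If nobody has claimed
    # a module it does the update itself, otherwise it waits for the worker.
    for module in modules:
        if not update_if_unclaimed(module):
            module.done.wait()
        # ...the builds that need this module would start here...

    for thread in threads:
        thread.join()
    return modules
```

With updaters=0 the main thread simply does every update itself, which
matches the "extra threads, besides the main one" reading of the setting.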

I've added the following to the workspace so we can experiment with this:

        <!-- Numbers of *extra* threads, besides the main one. -->
        <threads updaters="5" builders="0" />
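
As a rough illustration of how such a setting could be read, here is a
sketch using the standard library's ElementTree. The enclosing <workspace>
element and the function name read_thread_counts are assumptions made for
the example; only the <threads> element and its two attributes come from
the snippet above.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper element; only <threads .../> is taken from the message.
WORKSPACE_XML = """
<workspace>
    <!-- Numbers of *extra* threads, besides the main one. -->
    <threads updaters="5" builders="0" />
</workspace>
"""


def read_thread_counts(xml_text):
    """Return the extra updater/builder thread counts, defaulting to 0."""
    root = ET.fromstring(xml_text)
    threads = root.find("threads")
    if threads is None:
        return {"updaters": 0, "builders": 0}
    return {
        "updaters": int(threads.get("updaters", "0")),
        "builders": int(threads.get("builders", "0")),
    }


counts = read_thread_counts(WORKSPACE_XML)
```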

The 'risk' with this change is that we are running fair chunks of code
simultaneously & (currently) the few places the code could overlap (the run
lists) are not synchronized (since Python has no such concept on objects,
locks are separate). I think we are pretty lucky with this (considering)
because we have so few points of overlap, but *maybe* also Python is
reasonably thread-safe. I'm not sure yet, and over time I'll work on locking
more.

I'm not quite ready to do the same with building (there is more interaction
there, and thread work detection is a little trickier). I hope to get there,
primarily 'cos Brutus has two CPUs to play on.

I will leave this feature in on gump.try for a while, and give it some runs
in order to get comfortable with how reliable it is.

    http://gump.try.sybase.com/buildLog.html

Feedback welcomed.

regards,

Adam


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org


Re: Multithreading the updates

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> The code doesn't appear to be breaking (despite Python not locking in base
> classes like lists, and the Gump code not locking all that it might). I
> tried doing a Google search for locking in Python, and really found little
> "street wise" information. I found some updates from a Mr Stein (a gent, not
> a gazillion miles from ASF ;-) but little that really told me what the risks
> are w/ running large pieces of Python in separate threads. I guess we suck
> it and see.

Well, we sucked ... and it sucked. ;-) Something was not playing nicely.

I believe the issue is likely to do with the fact that, to launch a new
executable (CVS|SVN or whatever), Python Gump currently makes process-global
CD and process-global ENV modifications, then spawns via system(). Clearly
this is not thread safe. Basically I've always hated this portion of
the code, I just never had the chance/incentive to rewrite it using the more
complex alternatives. This might be the time.
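
One way to avoid the process-global state is to hand the working directory
and environment to the child process directly. The sketch below uses the
standard subprocess module, which only arrived in later Python releases
(2.4 onwards, with the run() call newer still), so it illustrates the idea
rather than what Gump could have used at the time. The helper name
run_in_dir and the GUMP_DEMO variable are hypothetical.

```python
import os
import subprocess
import sys


def run_in_dir(argv, cwd, extra_env=None):
    """Run a command with its own cwd and environment.

    Nothing here calls os.chdir() or mutates os.environ, so concurrent
    calls from different threads do not interfere with each other.
    """
    env = dict(os.environ)          # private copy for this child only
    if extra_env:
        env.update(extra_env)
    return subprocess.run(
        argv, cwd=cwd, env=env, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Harmless demo: the child reports the directory it was started in.
    result = run_in_dir(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=os.getcwd(),
        extra_env={"GUMP_DEMO": "1"},
    )
    print(result.stdout.strip())
```

An updater thread would pass the cvs or svn command line as argv and the
module's own directory as cwd.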

Just giving a heads up as to why I disabled threads on the JDK1.5 and
gump.try servers...

regards,

Adam




Re: Multithreading the updates

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
> this is a good idea, my experience with gump is also that the updates
> are taking ages.

Yeah, once I started down this path it became glaringly obvious this was a
good win. Most of the updates take time to figure out that little or nothing
needs updating, and those are great things to have spawned off. :)

The code doesn't appear to be breaking (despite Python not locking in base
classes like lists, and the Gump code not locking all that it might). I
tried doing a Google search for locking in Python, and really found little
"street wise" information. I found some updates from a Mr Stein (a gent, not
a gazillion miles from ASF ;-) but little that really told me what the risks
are w/ running large pieces of Python in separate threads. I guess we suck
it and see.

Over time I can make the locking totally tight. Right now a few lists are
appended to [work done] and a few variables are set [setting states of
modules], and these could occur simultaneously, but (1) I don't know if that
will really hurt, and (2) the threads (CVS|SVN updates) are so large that the
chances of them hitting at the same time are pretty small.
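
For what it is worth, in CPython a single list.append() is generally
treated as atomic, but a compound step such as "remove from one list, append
to another, then set a state" is not, so guarding it with a lock is cheap
insurance. A minimal sketch, assuming a hypothetical RunState holder (not a
Gump class) and made-up state strings:

```python
import threading


class RunState:
    """Shared bookkeeping for one run, guarded by a single lock."""

    def __init__(self, module_names):
        self._lock = threading.Lock()
        self.todo = list(module_names)
        self.done = []
        self.states = {}

    def mark_done(self, name, state):
        # The three updates happen together or not at all, as seen by
        # any other thread that also takes the lock.
        with self._lock:
            self.todo.remove(name)
            self.done.append(name)
            self.states[name] = state


def demo(module_names):
    run = RunState(module_names)
    threads = [
        threading.Thread(target=run.mark_done, args=(name, "success"))
        for name in module_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return run
```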

All Pythonic insights welcomed...

regards,

Adam




Re: Multithreading the updates

Posted by Antoine Levy-Lambert <an...@gmx.de>.
Hi Adam,

this is a good idea, my experience with gump is also that the updates 
are taking ages.

Cheers,

Antoine







Re: Multithreading the updates

Posted by "Adam R. B. Jack" <aj...@apache.org>.
> could it be that cvs-level locking provides a nice filesystem-based sync?

The various threads all do CVS or SVN into completely separate (peer)
directories, so there really ought to be no overlap at that point. The
'worklist' (where to get chores from) is synchronized [using the Java term]
with a lock. The only real prospects for problems (right now) are (1) if a
module update fails and the code tries to 'propagate the failure state' to
dependees (straying into other modules in the tree, which could themselves be
updating & changing state), and (2) moving the module from the "to-do" to the
"done" list on the run (the largest risk, IMHO, and pretty easy to fix).

We probably need to make 'state propagation across the dependency tree'
synchronized, so we can do multi-threaded builds, but that is a little
trickier to do. A brute force 'run lock' might be fine for this.
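
A brute-force run lock along those lines might look like the sketch below.
It assumes a simple in-memory dependency graph; the Module class, the
propagate_failure function and the state strings are all hypothetical
rather than Gump's own. A re-entrant lock is used so that the recursive
walk can re-acquire it.

```python
import threading

# One coarse lock for the whole run. RLock lets the recursive
# propagation below re-enter while already holding it.
run_lock = threading.RLock()


class Module:
    def __init__(self, name):
        self.name = name
        self.state = "unset"
        self.dependees = []     # modules that depend on this one


def propagate_failure(module, cause=None):
    """Mark a module failed and push that state down to its dependees."""
    with run_lock:
        if module.state in ("failed", "prereq-failed"):
            return              # already handled; also guards against cycles
        module.state = "failed" if cause is None else "prereq-failed"
        for dependee in module.dependees:
            propagate_failure(dependee, cause=module.name)
```

Any thread that sets a module's state directly would need to take the same
run_lock for this to be of use.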

regards,

Adam




Re: Multithreading the updates

Posted by Leo Simons <ls...@jicarilla.org>.
Adam R. B. Jack wrote:
> The 'risk' with this change is that we are running fair chunks of code
> simultaneously & (currently) the few places the code could overlap (the run
> lists) are not synchronized (since Python has no such concept on objects,
> locks are separate). I think we are pretty lucky with this (considering)
> because we have so few points of overlap, but *maybe* also Python is
> reasonably thread-safe. I'm not sure yet, and over time I'll work on locking
> more.

could it be that cvs-level locking provides a nice filesystem-based sync?

- LSD
