Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2010/11/16 03:34:12 UTC

[lucy-dev] Parallel compilation

Greets,

My laptop has two cores, but the Lucy build process is single threaded and
doesn't take advantage of the second processor.

I hacked up the patch below for trunk/perl/buildlib/Lucy/Build.pm to try to
speed things up.  It forks off a max of 4 child processes which compile up to
10 C files each.  Here are before-and-after results for "time ./Build code":

    BEFORE:
    real    2m37.562s
    user    2m18.278s
    sys     0m19.448s

    AFTER:
    real    1m49.056s
    user    2m16.673s
    sys     0m20.221s

A nice gain... However, the patch isn't ready for primetime because it doesn't
handle compilation failure gracefully, check for number of CPU cores, or fall
back to single-threaded mode when fork() isn't available.  
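
One way the fork() fallback could look is sketched below. It is only an illustration, not part of the patch: can_fork() and worker_count() are hypothetical helper names, and the sketch assumes that the d_fork entry in Perl's %Config is a good enough signal. Perls that only offer the Windows fork emulation (d_pseudofork) are deliberately treated as unable to fork here.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true only when this perl was built with a native
# fork().  The Windows emulation (d_pseudofork) is deliberately not counted.
sub can_fork {
    return $Config{d_fork} ? 1 : 0;
}

# Hypothetical helper: use the requested number of compile workers when
# fork() is available, otherwise fall back to a single process.
sub worker_count {
    my ($wanted) = @_;
    return can_fork() ? $wanted : 1;
}

print "workers: ", worker_count(4), "\n";
```

With a check like this, the build could keep the original serial loop as the one-worker case rather than failing outright on platforms without fork().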

Any suggestions about the approach?  If not, I'll pursue this further and fix
the problems when I find another round tuit.

Marvin Humphrey


Index: buildlib/Lucy/Build.pm
===================================================================
--- buildlib/Lucy/Build.pm  (revision 1035418)
+++ buildlib/Lucy/Build.pm  (working copy)
@@ -71,6 +71,8 @@
 use Env qw( @PATH );
 use Fcntl;
 use Carp;
+use POSIX qw( WNOHANG );
+use Time::HiRes qw( sleep );
 
 BEGIN { unshift @PATH, curdir() }
 
@@ -491,7 +493,7 @@
     );
     my @objects;
 
-    # Compile C source files.
+    # Gather C source files, generate list of object files.
     my $c_files = [];
     push @$c_files, @{ $self->rscan_dir( $CORE_SOURCE_DIR,     qr/\.c$/ ) };
     push @$c_files, @{ $self->rscan_dir( $XS_SOURCE_DIR,       qr/\.c$/ ) };
@@ -502,16 +504,50 @@
         my $o_file = $c_file;
         $o_file =~ s/\.c/$Config{_o}/;
         push @objects, $o_file;
-        next if $self->up_to_date( $c_file, $o_file );
         $self->add_to_cleanup($o_file);
-        $cbuilder->compile(
-            source               => $c_file,
-            extra_compiler_flags => $self->extra_ccflags,
-            include_dirs         => \@include_dirs,
-            object_file          => $o_file,
-        );
     }
 
+    # Compile in multiple child processes to take advantage of multi-CPU
+    # machines.
+    my @children;
+    my $MAX_CHILDREN = 4;
+    for ( my $i = 0; $i < scalar @$c_files; $i += 10 ) {
+        my $pid = fork();
+        if ( !defined $pid ) {
+            die "Fork failed: $!\n";
+        }
+        elsif ($pid) {
+            # Parent...
+            push( @children, $pid );
+        }
+        elsif ( $pid == 0 ) {
+            for ( my $j = $i; $j < $i + 10; $j++ ) {
+                # Child...
+                my $c_file = $c_files->[$j];
+                next unless $c_file;
+                my $o_file = $c_file;
+                $o_file =~ s/\.c/$Config{_o}/;
+                next if $self->up_to_date( $c_file, $o_file );
+
+                $cbuilder->compile(
+                    source               => $c_file,
+                    extra_compiler_flags => $self->extra_ccflags,
+                    include_dirs         => \@include_dirs,
+                    object_file          => $o_file,
+                );
+            }
+            exit(0);
+        }
+        while ( _active_kids( \@children ) >= $MAX_CHILDREN ) {
+            sleep(1);
+        }
+    }
+
+    # Wait for the last few compiles to finish.
+    foreach (@children) {
+        waitpid( $_, 0 );
+    }
+
     # .xs => .c
     my $perl_binding_c_file = "lib/Lucy.c";
     $self->add_to_cleanup($perl_binding_c_file);
@@ -571,6 +607,17 @@
     }
 }
 
+sub _active_kids {
+    my $pids = shift;
+    my $active = 0;
+    for my $pid (@$pids) {
+        my $status = waitpid($pid, WNOHANG);
+        next unless $status == 0;
+        $active++;
+    }
+    return $active;
+}
+
 sub ACTION_code {
     my $self = shift;
 


Re: [lucy-dev] Parallel compilation

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 11/18/10 12:24 PM:

> 
> However, I think that instead of applying this patch, we may want to
> prioritize transparency and portability over build speed and work up an
> alternative patch that begins a transition to Makefiles.  It's nice to knock
> 30 seconds off the build time (at the cost of some minor complexity), but IMO
> it's more important to move towards a build system that everyone can use and
> understand.
> 

+1

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [lucy-dev] Parallel compilation

Posted by Nathan Kurz <na...@verse.com>.
On Thu, Nov 18, 2010 at 10:24 AM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> However, I think that instead of applying this patch, we may want to
> prioritize transparency and portability over build speed and work up an
> alternative patch that begins a transition to Makefiles.  It's nice to knock
> 30 seconds off the build time (at the cost of some minor complexity), but IMO
> it's more important to move towards a build system that everyone can use and
> understand.

+1 to transparency.   Build speed is never a bad thing, but is not
currently a limiting factor in the number of contributors to this
project.

--nate

Re: [lucy-dev] Parallel compilation

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Nov 16, 2010 at 07:51:43AM -0500, Robert Muir wrote:
> couldn't you instead just split all the .c files into 4 pieces, and
> only have 4 children up front?

I've reworked the patch to incorporate the improvements that have come out of
this thread, and I've attached it to a new issue:
<https://issues.apache.org/jira/browse/LUCY-127>.

However, I think that instead of applying this patch, we may want to
prioritize transparency and portability over build speed and work up an
alternative patch that begins a transition to Makefiles.  It's nice to knock
30 seconds off the build time (at the cost of some minor complexity), but IMO
it's more important to move towards a build system that everyone can use and
understand.

Marvin Humphrey


Re: [lucy-dev] Slow migration to Makefiles

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Marvin,


On Nov 16, 2010, at 12:16 PM, Marvin Humphrey wrote:

> On Tue, Nov 16, 2010 at 09:39:20AM -0800, Mattmann, Chris A (388J) wrote:
> 
>>> Over time, we should expect to migrate a lot of the build structure to
>>> Makefiles.  I hate make, but it's the lowest common denominator.
>> 
>> Is that strictly true? I mean, the reality is whatever you could do in make,
>> could be done in e.g., Ant, right? 
> 
> Sure -- but we could also do everything in Perl/Ruby/Python/etc.  The primary
> advantage that Make has over Ant and all of those is that it's already there
> on every system.  

It seems to me that Ant is on most *nix systems (just like Make); it's just a question of whether it's configured by default or enabled as an install package.

> 
> The point of migrating to Makefiles would be to share build routines across
> host bindings.  Building Lucy for C or Python shouldn't require Perl, or Java,
> or whatever.  For now, we have to put up with a Perl dependency, but I would
> like to eliminate that, at least for simple building of the library as a user
> would.  (Developers will continue to have to deal with the Perl dependency,
> but after I rewrite Clownfish::Parser to be based on the Lemon parser
> generator rather than Parse::RecDescent, they'll only need core Perl.)

Hmmm, possibly, though I wouldn't liken building Lucy to PLs -- that doesn't seem like a fair comparison. In reality, building Lucy likely requires some "build tool", be it Make, or Ant, or Python buildouts, or setuptools, or whatever, independent of whatever PL Lucy is implemented in.

> 
> We don't want to depend on Make exclusively for the build IMO -- the
> contortions necessary for cross-platform compatibility when solving complex
> problems aren't worth it.  Instead, I think we should keep the Makefiles
> simple, but use scripts to generate input for them.  Probably such dev helper
> scripts will continue to be written in Perl, like the update_snowstem.pl I
> just added last week.

Build scripts generating other build scripts just seems like bloat to me. Of course, like I mentioned, I haven't stepped up to rewrite the Lucy build system, so I'm just pontificating at this point. ;)

> 
> I don't think it's reasonable to expect Simon or Robert to fully grok a
> sophisticated Module::Build subclass like trunk/perl/buildlib/Lucy/Build.pm.
> However, I do think that it's reasonable to expect Lucy committers to
> understand shared Makefiles, and I also think it's reasonable to expect them
> to understand simple Perl scripts like update_snowstem.pl.

Seems to me that the complexity of your two examples (Module::Build and Make) isn't too far apart. Also, Simon *is* a Lucy committer, so we should definitely make sure he gets how to build the project, especially if we want to grow beyond you, Nate, or Peter being the only folks who know Lucy or what's going on with it...

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: [lucy-dev] Slow migration to Makefiles

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Thanks for the feedback, Robert...

Cheers,
Chris

On Nov 17, 2010, at 11:22 AM, Robert Muir wrote:

> On Wed, Nov 17, 2010 at 2:08 PM, Nathan Kurz <na...@verse.com> wrote:
>> On Tue, Nov 16, 2010 at 12:31 PM, Mattmann, Chris A (388J)
>> <ch...@jpl.nasa.gov> wrote:
>>> Hmm, my 2 cents is that it's infinitely simpler to understand a build.xml file (or better yet a Maven pom.xml :) -- just my opinion people no tomatoes!) than it is to understand makefiles, or better yet, programs that generate makefiles on the fly, or that generate other build scripts on the fly etc etc.
>> 
>> I much prefer Make to all alternatives.  Lucy is at base a C project,
>> and Make is the standard for C.  Certainly other things can work, but
>> most anything else causes me about the same amount of alarm as a
>> project that has only a README.doc in Word format.
>> 
> 
> +1. I've worked a lot with ant on the lucene-java project, and it
> sucks. In fact all build systems suck. It's just a matter of what 'sucks
> less' for what you are trying to do.
> 
> For java, ant sucks less than Make. Especially since there are tools
> built around it (including ones distributed with ant) for things like
> junit test integration.
> 
> I think Make is a better path for a project like Lucy that isn't
> java-oriented but more of the unix/C mentality. This is because there
> are tools built around Make (including yes, things like autoconf) for
> the C environment.
> 
> For example: since we have been discussing snowball recently, I had to
> add several new languages to a customized snowball build and I found
> this to be completely painless with snowball's make build.
> 
> I don't think we need to have a "build system war" discussion, since
> it's like editors, everyone has their own opinions. But how many C
> projects do you see using make? Basically all of them.




Re: [lucy-dev] Slow migration to Makefiles

Posted by "Henry C." <he...@cityweb.co.za>.
On Wed, November 17, 2010 21:22, Robert Muir wrote:
> But how many C projects do you see using make? Basically all of them.

or conversely, how many C projects do you see not using make?  Basically none.


Re: [lucy-dev] Slow migration to Makefiles

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Nov 17, 2010 at 2:08 PM, Nathan Kurz <na...@verse.com> wrote:
> On Tue, Nov 16, 2010 at 12:31 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> Hmm, my 2 cents is that it's infinitely simpler to understand a build.xml file (or better yet a Maven pom.xml :) -- just my opinion people no tomatoes!) than it is to understand makefiles, or better yet, programs that generate makefiles on the fly, or that generate other build scripts on the fly etc etc.
>
> I much prefer Make to all alternatives.  Lucy is at base a C project,
> and Make is the standard for C.  Certainly other things can work, but
> most anything else causes me about the same amount of alarm as a
> project that has only a README.doc in Word format.
>

+1. I've worked a lot with ant on the lucene-java project, and it
sucks. In fact all build systems suck. It's just a matter of what 'sucks
less' for what you are trying to do.

For java, ant sucks less than Make. Especially since there are tools
built around it (including ones distributed with ant) for things like
junit test integration.

I think Make is a better path for a project like Lucy that isn't
java-oriented but more of the unix/C mentality. This is because there
are tools built around Make (including yes, things like autoconf) for
the C environment.

For example: since we have been discussing snowball recently, I had to
add several new languages to a customized snowball build and I found
this to be completely painless with snowball's make build.

I don't think we need to have a "build system war" discussion, since
it's like editors, everyone has their own opinions. But how many C
projects do you see using make? Basically all of them.

Re: [lucy-dev] Slow migration to Makefiles

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Thanks Nate, cool, alright, so it sounds like the 3 of you are fine with make and if it's a simple make ; make all ; make install (probably with a sudo in there somewhere ;) ) then even I could get it right (which is saying something!) :)

Would be nice to know what Simon thinks since he's a Lucy committer now too...

Cheers,
Chris

On Nov 17, 2010, at 11:08 AM, Nathan Kurz wrote:

> On Tue, Nov 16, 2010 at 12:31 PM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> Hmm, my 2 cents is that it's infinitely simpler to understand a build.xml file (or better yet a Maven pom.xml :) -- just my opinion people no tomatoes!) than it is to understand makefiles, or better yet, programs that generate makefiles on the fly, or that generate other build scripts on the fly etc etc.
> 
> I much prefer Make to all alternatives.  Lucy is at base a C project,
> and Make is the standard for C.  Certainly other things can work, but
> most anything else causes me about the same amount of alarm as a
> project that has only a README.doc in Word format.
> 
>> Ant is available on nearly every Linux distribution that I've come across in recent years (installed into /usr/bin/ant or some variant).
> 
> I don't recall the details, but I recently tried to install Ant on my
> current desktop (Linux Slamd64) and gave up.  I'll do it from source
> at some point, but think it's silly that I'm not able to make updates
> to the Lucy project page until then.  My initial impressions of Ant
> are hence quite negative.
> 
>> That said, these are just my preferences (as are Marvin's for Make/programs that generate makes and so forth :) ). What do others think? The key question to ask yourselves is:
>> 
>> 1. will Marvin be the *only* RM that this project ever sees?
> 
> Had to look up RM.  No, presumably there will be other Release
> Managers so that Marvin can spend his time on areas more demanding of
> his particular expertise.
> 
>> 2. will Marvin be the *only* person building this project, ever?
> 
> No, I presume that some significant percentage of users will be
> building this.  The bar should be pretty low, roughly equivalent to
> 'make config; make all; make install'.
> 
>> 3. of the 2-3 existing Lucy developers, what are the preferences? I know Marvin's: what about Peter/Nate?
> 
> Make without reliance on autoconf or other impenetrable junk.  The
> general approach Marvin is currently using seems fine, although
> removing reliance on Perl seems good.   I want something short that
> can be clearly understood in its entirety.
> 
> --nate
> 
> ps.  My feelings on Make are reasonably echoed here:
> http://blog.jgc.org/2010/11/things-make-got-right-and-how-to-make.html




Re: [lucy-dev] Slow migration to Makefiles

Posted by Nathan Kurz <na...@verse.com>.
On Tue, Nov 16, 2010 at 12:31 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Hmm, my 2 cents is that it's infinitely simpler to understand a build.xml file (or better yet a Maven pom.xml :) -- just my opinion people no tomatoes!) than it is to understand makefiles, or better yet, programs that generate makefiles on the fly, or that generate other build scripts on the fly etc etc.

I much prefer Make to all alternatives.  Lucy is at base a C project,
and Make is the standard for C.  Certainly other things can work, but
most anything else causes me about the same amount of alarm as a
project that has only a README.doc in Word format.

> Ant is available on nearly every Linux distribution that I've come across in recent years (installed into /usr/bin/ant or some variant).

I don't recall the details, but I recently tried to install Ant on my
current desktop (Linux Slamd64) and gave up.  I'll do it from source
at some point, but think it's silly that I'm not able to make updates
to the Lucy project page until then.  My initial impressions of Ant
are hence quite negative.

> That said, these are just my preferences (as are Marvin's for Make/programs that generate makes and so forth :) ). What do others think? The key question to ask yourselves is:
>
> 1. will Marvin be the *only* RM that this project ever sees?

Had to look up RM.  No, presumably there will be other Release
Managers so that Marvin can spend his time on areas more demanding of
his particular expertise.

> 2. will Marvin be the *only* person building this project, ever?

No, I presume that some significant percentage of users will be
building this.  The bar should be pretty low, roughly equivalent to
'make config; make all; make install'.

> 3. of the 2-3 existing Lucy developers, what are the preferences? I know Marvin's: what about Peter/Nate?

Make without reliance on autoconf or other impenetrable junk.  The
general approach Marvin is currently using seems fine, although
removing reliance on Perl seems good.   I want something short that
can be clearly understood in its entirety.

--nate

ps.  My feelings on Make are reasonably echoed here:
http://blog.jgc.org/2010/11/things-make-got-right-and-how-to-make.html

Re: [lucy-dev] Slow migration to Makefiles

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Peter,


On Nov 16, 2010, at 2:01 PM, Peter Karman wrote:

> Mattmann, Chris A (388J) wrote on 11/16/2010 02:31 PM:
>> Hmm, my 2 cents is that it's infinitely simpler to understand a
>> build.xml file (or better yet a Maven pom.xml :) -- just my opinion
>> people no tomatoes!) than it is to understand makefiles, or better
>> yet, programs that generate makefiles on the fly, or that generate
>> other build scripts on the fly etc etc.
>> 
>> Ant is available on nearly every Linux distribution that I've come
>> across in recent years (installed into /usr/bin/ant or some variant).
>> 
> 
> A quick check of the 2 CentOS dists I have available (5.3 and 5.4)
> reveals that neither has ant installed. Both have make. Ant is, of
> course, available as an installable package. But it's not part of the
> standard build tools, afaik.

Well, whether it's "part of the standard build tools" is up for debate. If it's available as a package that's classified in that category then I would say it's "part of the standard build tools" just not "installed by default". But that's just semantics. I build CentOS systems a lot (have way too many sitting in a closet at home :) ) and I select most of the installable packages including Ant.

> 
>> 
>> That said, these are just my preferences (as are Marvin's for
>> Make/programs that generate makes and so forth :) ). What do others
>> think? The key question to ask yourselves is:
>> 
>> 1. will Marvin be the *only* RM that this project ever sees? 
> 
> No. I have done KS releases; I expect to do Lucy too.

Great.

> 
>> 2. will Marvin be the *only* person building this project, ever?
> 
> No.

+1

> 
>> 3. of the 2-3 existing Lucy developers, what are the preferences? I know
>> Marvin's: what about Peter/Nate?
> 
> Make is my pref. For the reasons Marvin states.
> 
> I don't expect our Makefiles to be complicated. I expect them to
> delegate to more sophisticated, generated scripts, as Marvin suggests.

Okey dok, welp, so long as others are in agreement with this, I'm fine with it, +1...

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Phone: +1 (818) 354-8810
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: [lucy-dev] Slow migration to Makefiles

Posted by Peter Karman <pe...@peknet.com>.
Mattmann, Chris A (388J) wrote on 11/16/2010 02:31 PM:
> Hmm, my 2 cents is that it's infinitely simpler to understand a
> build.xml file (or better yet a Maven pom.xml :) -- just my opinion
> people no tomatoes!) than it is to understand makefiles, or better
> yet, programs that generate makefiles on the fly, or that generate
> other build scripts on the fly etc etc.
> 
> Ant is available on nearly every Linux distribution that I've come
> across in recent years (installed into /usr/bin/ant or some variant).
> 

A quick check of the 2 CentOS dists I have available (5.3 and 5.4)
reveals that neither has ant installed. Both have make. Ant is, of
course, available as an installable package. But it's not part of the
standard build tools, afaik.

> 
> That said, these are just my preferences (as are Marvin's for
> Make/programs that generate makes and so forth :) ). What do others
> think? The key question to ask yourselves is:
> 
> 1. will Marvin be the *only* RM that this project ever sees? 

No. I have done KS releases; I expect to do Lucy too.

> 2. will Marvin be the *only* person building this project, ever?

No.

> 3. of the 2-3 existing Lucy developers, what are the preferences? I know
> Marvin's: what about Peter/Nate?

Make is my pref. For the reasons Marvin states.

I don't expect our Makefiles to be complicated. I expect them to
delegate to more sophisticated, generated scripts, as Marvin suggests.


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

RE: [lucy-dev] Slow migration to Makefiles

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hmm, my 2 cents is that it's infinitely simpler to understand a build.xml file (or better yet a Maven pom.xml :) -- just my opinion people no tomatoes!) than it is to understand makefiles, or better yet, programs that generate makefiles on the fly, or that generate other build scripts on the fly etc etc. 

Ant is available on nearly every Linux distribution that I've come across in recent years (installed into /usr/bin/ant or some variant). 

That said, these are just my preferences (as are Marvin's for Make/programs that generate makes and so forth :) ). What do others think? The key question to ask yourselves is: 

1. will Marvin be the *only* RM that this project ever sees?
2. will Marvin be the *only* person building this project, ever?
3. of the 2-3 existing Lucy developers, what are the preferences? I know Marvin's: what about Peter/Nate?
4. what about the one new Lucy committer who joined as part of Apache Lucy in the Incubator (i.e., Simon)?
5. are the mentors ever going to build and use this system? Or scarier yet, maintain it? My answer on that is that at some point I'd like to build it myself and understand it, but sophisticated Makefiles are not my cup of tea.

The community right now is small so it will be very driven by whoever picks up the shovel and starts to dig the hole, but it would be nice if the tool used to dig that hole is something that not only Marvin can wield...

Cheers,
Chris
________________________________________
From: Marvin Humphrey [marvin@rectangular.com]
Sent: Tuesday, November 16, 2010 12:16 PM
To: lucy-dev@incubator.apache.org
Subject: [lucy-dev] Slow migration to Makefiles

On Tue, Nov 16, 2010 at 09:39:20AM -0800, Mattmann, Chris A (388J) wrote:

> > Over time, we should expect to migrate a lot of the build structure to
> > Makefiles.  I hate make, but it's the lowest common denominator.
>
> Is that strictly true? I mean, the reality is whatever you could do in make,
> could be done in e.g., Ant, right?

Sure -- but we could also do everything in Perl/Ruby/Python/etc.  The primary
advantage that Make has over Ant and all of those is that it's already there
on every system.

The point of migrating to Makefiles would be to share build routines across
host bindings.  Building Lucy for C or Python shouldn't require Perl, or Java,
or whatever.  For now, we have to put up with a Perl dependency, but I would
like to eliminate that, at least for simple building of the library as a user
would.  (Developers will continue to have to deal with the Perl dependency,
but after I rewrite Clownfish::Parser to be based on the Lemon parser
generator rather than Parse::RecDescent, they'll only need core Perl.)

We don't want to depend on Make exclusively for the build IMO -- the
contortions necessary for cross-platform compatibility when solving complex
problems aren't worth it.  Instead, I think we should keep the Makefiles
simple, but use scripts to generate input for them.  Probably such dev helper
scripts will continue to be written in Perl, like the update_snowstem.pl I
just added last week.

I don't think it's reasonable to expect Simon or Robert to fully grok a
sophisticated Module::Build subclass like trunk/perl/buildlib/Lucy/Build.pm.
However, I do think that it's reasonable to expect Lucy committers to
understand shared Makefiles, and I also think it's reasonable to expect them
to understand simple Perl scripts like update_snowstem.pl.

Marvin Humphrey


[lucy-dev] Slow migration to Makefiles

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Nov 16, 2010 at 09:39:20AM -0800, Mattmann, Chris A (388J) wrote:

> > Over time, we should expect to migrate a lot of the build structure to
> > Makefiles.  I hate make, but it's the lowest common denominator.
> 
> Is that strictly true? I mean, the reality is whatever you could do in make,
> could be done in e.g., Ant, right? 

Sure -- but we could also do everything in Perl/Ruby/Python/etc.  The primary
advantage that Make has over Ant and all of those is that it's already there
on every system.  

The point of migrating to Makefiles would be to share build routines across
host bindings.  Building Lucy for C or Python shouldn't require Perl, or Java,
or whatever.  For now, we have to put up with a Perl dependency, but I would
like to eliminate that, at least for simple building of the library as a user
would.  (Developers will continue to have to deal with the Perl dependency,
but after I rewrite Clownfish::Parser to be based on the Lemon parser
generator rather than Parse::RecDescent, they'll only need core Perl.)

We don't want to depend on Make exclusively for the build IMO -- the
contortions necessary for cross-platform compatibility when solving complex
problems aren't worth it.  Instead, I think we should keep the Makefiles
simple, but use scripts to generate input for them.  Probably such dev helper
scripts will continue to be written in Perl, like the update_snowstem.pl I
just added last week.

I don't think it's reasonable to expect Simon or Robert to fully grok a
sophisticated Module::Build subclass like trunk/perl/buildlib/Lucy/Build.pm.
However, I do think that it's reasonable to expect Lucy committers to
understand shared Makefiles, and I also think it's reasonable to expect them
to understand simple Perl scripts like update_snowstem.pl.

Marvin Humphrey


Re: [lucy-dev] Parallel compilation

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Marvin,

On 11/16/10 9:25 AM, "Marvin Humphrey" <ma...@rectangular.com> wrote:

> 
> Over time, we should expect to migrate a lot of the build structure to
> Makefiles.  I hate make, but it's the lowest common denominator.

Is that strictly true? I mean, the reality is whatever you could do in make,
could be done in e.g., Ant, right? Of course not that I have the time to
rewrite the build system in Ant (or the desire), but I just wanted to
clarify...

Cheers,
Chris




Re: [lucy-dev] Parallel compilation

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Nov 16, 2010 at 07:51:43AM -0500, Robert Muir wrote:
> Sorry, I am a perl dummy, so forgive me if I interpreted your
> statements/code wrong :)

Over time, we should expect to migrate a lot of the build structure to
Makefiles.  I hate make, but it's the lowest common denominator.

> why is it 10 C files each?

It was originally 1 C file each. :)  That incurred waaaay too much fork()
overhead and CPU contention -- it was slower than a single-threaded build!

I hacked each fork to compile 10 files, and that was good enough to
effectively eliminate contention as a concern, achieve some nice gains, and
present the results to the list as a draft.  But 10 was an arbitrary number.

> wouldn't this require some wasted fork() overhead respawning many children?
> couldn't you instead just split all the .c files into 4 pieces, and
> only have 4 children up front?

I like it. :)  I think we should do as you propose, and spawn a fixed number
of child processes -- perhaps deriving the number of children from the CPU
count when we can figure it out and falling back to 4 when we can't.
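
A rough sketch of the CPU-count lookup, using only core Perl, might look like the following. cpu_count() is a hypothetical name, and the probes it tries (the NUMBER_OF_PROCESSORS environment variable on Windows, /proc/cpuinfo on Linux, "sysctl -n hw.ncpu" on BSD and Mac OS X, "getconf _NPROCESSORS_ONLN" elsewhere) are assumptions about what the host provides, so every one of them is allowed to fail before the fallback value is returned.

```perl
use strict;
use warnings;

# Hypothetical helper: best-effort CPU count.  Returns $fallback when none
# of the probes below produces a positive number.
sub cpu_count {
    my ($fallback) = @_;

    # Windows exports the count as an environment variable.
    my $env = $ENV{NUMBER_OF_PROCESSORS};
    return $env if defined $env && $env =~ /^\d+$/ && $env > 0;

    # Linux: count the "processor" entries in /proc/cpuinfo.
    if ( open my $fh, '<', '/proc/cpuinfo' ) {
        my $n = grep { /^processor\s*:/ } <$fh>;
        close $fh;
        return $n if $n > 0;
    }

    # BSD and Mac OS X answer to sysctl; many other systems to getconf.
    for my $cmd ( 'sysctl -n hw.ncpu', 'getconf _NPROCESSORS_ONLN' ) {
        my $out = `$cmd 2>/dev/null`;
        next unless defined $out && $out =~ /^(\d+)/;
        return $1 if $1 > 0;
    }

    return $fallback;
}

print "CPUs: ", cpu_count(4), "\n";
```

Calling cpu_count(4) gives the fall-back-to-4 behavior described above whenever detection fails.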

We can then have each child communicate back to the parent process by writing
a status file upon successful exit.  The parent process can monitor the
children once per second or so, and terminate if a child exits without
communicating success.

There are other IPC methods we could use for monitoring how compilation is
proceeding in the child processes, but a combination of fork() and the file
system seems like it would be easiest to grok and maintain.
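
The fork() plus file system idea could be sketched roughly as below. This is an assumption-laden illustration rather than the LUCY-127 patch: run_jobs() is a hypothetical name, each job is a code ref standing in for one batch of compiles, and the parent blocks in waitpid() instead of polling once per second. It also assumes a Unix-like system, since POSIX::_exit would end the whole interpreter under the Windows fork emulation.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use POSIX ();

# Hypothetical helper: run each job in its own child process.  A child
# creates "<index>.ok" only after its job returns normally.  Returns the
# sorted indexes of the jobs that never reported success.
sub run_jobs {
    my @jobs       = @_;
    my $status_dir = tempdir( CLEANUP => 1 );
    my %job_for;    # pid => job index

    for my $i ( 0 .. $#jobs ) {
        my $pid = fork();
        die "Fork failed: $!\n" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: a die() inside the job leaves no status file behind.
            my $ok = eval { $jobs[$i]->(); 1 };
            if ($ok) {
                my $path = File::Spec->catfile( $status_dir, "$i.ok" );
                if   ( open my $fh, '>', $path ) { close $fh }
                else                             { $ok = 0 }
            }
            # _exit skips END blocks, so the child never cleans up the
            # parent's temporary directory.
            POSIX::_exit( $ok ? 0 : 1 );
        }
        $job_for{$pid} = $i;
    }

    my @failed;
    while (%job_for) {
        my $pid = waitpid( -1, 0 );    # block until any child exits
        last if $pid <= 0;
        my $i = delete $job_for{$pid};
        next unless defined $i;
        my $path = File::Spec->catfile( $status_dir, "$i.ok" );
        push @failed, $i unless -e $path;
    }
    my @sorted = sort { $a <=> $b } @failed;
    return @sorted;
}
```

A caller would die() with the list of failed batches if run_jobs() returns anything, which gives the build a clear failure instead of a silent partial compile.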

> p.s. we do a similar thing in Lucene-java with running tests, and a
> problem can be balancing the workload with the children.

Yes, that was what got me thinking.  I watched "top" and confirmed that GCC
wasn't taking advantage of multiple cores on its own.

> one heuristic about how long a test will take to run, or file will
> take to compile, is its length in bytes.
> it might be useful to sort the list of files by their size in bytes,
> and use mod to divide them up.

For now, I think modulus will be good enough.  It also has the advantage of
compiling files in roughly the same order that they are compiled in now --
which is useful, because the files most likely to fail are up front.
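
For illustration, dividing the file list by modulus could be as small as the sketch below. partition_by_modulus() is a hypothetical name and the file names are made up. Each worker's list keeps the original relative order, which is the property mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: deal @$files into $n buckets, so that file 0 goes to
# bucket 0, file 1 to bucket 1, and so on, wrapping around at $n.
sub partition_by_modulus {
    my ( $files, $n ) = @_;
    my @buckets = map { [] } 1 .. $n;
    for my $i ( 0 .. $#$files ) {
        push @{ $buckets[ $i % $n ] }, $files->[$i];
    }
    return \@buckets;
}

my $buckets = partition_by_modulus( [qw( a.c b.c c.c d.c e.c )], 2 );
print join( ' ', @$_ ), "\n" for @$buckets;
# prints "a.c c.c e.c" and then "b.c d.c"
```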

There's only one giganto file, the C file which has all the XS bindings in it.
It takes about 30 seconds to compile on my laptop, and exhausts the memory on
some systems.  Until we break up that file, it should only be run in the
parent process, so that it's not competing for resources with anything else.

Marvin Humphrey


Re: [lucy-dev] Parallel compilation

Posted by Robert Muir <rc...@gmail.com>.
On Mon, Nov 15, 2010 at 9:34 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> Greets,
>
> My laptop has two cores, but the Lucy build process is single threaded and
> doesn't take advantage of the second processor.
>
> I hacked up the patch below for trunk/perl/buildlib/Lucy/Build.pm to try to
> speed things up.  It forks off a max of 4 child processes which compile up to
> 10 C files each.  Here are before-and-after results for "time ./Build code":

Sorry, I am a perl dummy, so forgive me if I interpreted your
statements/code wrong :)
why is it 10 C files each?
wouldn't this require some wasted fork() overhead respawning many children?
couldn't you instead just split all the .c files into 4 pieces, and
only have 4 children up front?

p.s. we do a similar thing in Lucene-java with running tests, and a
problem can be balancing the workload with the children.
one heuristic about how long a test will take to run, or file will
take to compile, is its length in bytes.
it might be useful to sort the list of files by their size in bytes,
and use mod to divide them up.
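
The size heuristic might be sketched like this. split_by_size() is a hypothetical name, and the optional size callback exists only so the example can run without real files; by default it would use the on-disk size from -s. Files are sorted largest first and then dealt out by modulus, so the big files are spread across the children.

```perl
use strict;
use warnings;

# Hypothetical helper: sort files largest-first, then deal them round-robin
# into $n groups.  $size_of is an optional callback returning a size in
# bytes; it defaults to the file's size on disk.
sub split_by_size {
    my ( $files, $n, $size_of ) = @_;
    $size_of ||= sub { ( -s $_[0] ) || 0 };
    my %size   = map { $_ => $size_of->($_) } @$files;
    my @sorted = sort { $size{$b} <=> $size{$a} || $a cmp $b } @$files;
    my @groups = map { [] } 1 .. $n;
    push @{ $groups[ $_ % $n ] }, $sorted[$_] for 0 .. $#sorted;
    return \@groups;
}

# Made-up sizes standing in for real .c files.
my %bytes  = ( 'big.c' => 300, 'mid.c' => 200, 'small.c' => 100, 'tiny.c' => 50 );
my $groups = split_by_size( [ sort keys %bytes ], 2, sub { $bytes{ $_[0] } } );
print join( ' ', @$_ ), "\n" for @$groups;
# prints "big.c small.c" and then "mid.c tiny.c"
```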