Posted to dev@openoffice.apache.org by Andre Fischer <aw...@gmail.com> on 2014/01/29 10:18:45 UTC

Bottom up build

I would like to report some observations that I made when thinking about 
how to make building OpenOffice with one global makefile feasible.  It 
will probably be the last of my build-related mails in the near future.

Traditional make uses a top-down approach.  It starts with a target, 
'all' by default, and looks at its dependencies.  When one of those has 
to be made or is newer than the target, then the target also has to be 
made.  This is done recursively and depth-first.  Every file on which 
'all' has a direct or indirect dependency has to be checked.  If we 
built OpenOffice with one makefile (included makefiles don't count), 
that would be a lot of files to check.  There are about 9800 cxx 
files, 3500 c files, 12000 hxx files, and a lot of external headers.  
Checking the modification time of so many files is one of the reasons 
for the delay in, say, sw/ between starting make and its first action.

As I don't have all global dependencies in a format that would allow 
experimentation, I measured how long it takes to get the mtime (time of 
last modification) of all files (source, generated, compiled, linked), 
about 120000 in total.  I wrote a small Java program for that.  With a 
warm cache it takes about 23s.  When run in 4 threads this is reduced to 
less than 8s.  Could be worse.
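The parallel scan described above can be sketched with a ForkJoinPool in a 
few lines.  This is not the actual program from [3], just a minimal 
illustration; the class name and structure are my own:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class MtimeScan {
    // Each task scans one directory; subdirectories are forked as new
    // tasks, so the stat calls run in parallel on the common pool.
    static class ScanTask extends RecursiveTask<Map<Path, Long>> {
        private final Path dir;
        ScanTask(Path dir) { this.dir = dir; }

        @Override
        protected Map<Path, Long> compute() {
            Map<Path, Long> mtimes = new HashMap<>();
            List<ScanTask> subtasks = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (Files.isDirectory(entry)) {
                        ScanTask sub = new ScanTask(entry);
                        sub.fork();                  // scan subtree in parallel
                        subtasks.add(sub);
                    } else {
                        mtimes.put(entry, Files.getLastModifiedTime(entry).toMillis());
                    }
                }
            } catch (IOException e) {
                // unreadable directory: skip it
            }
            for (ScanTask sub : subtasks) {
                mtimes.putAll(sub.join());           // merge child results
            }
            return mtimes;
        }
    }

    public static Map<Path, Long> scan(Path root) {
        return ForkJoinPool.commonPool().invoke(new ScanTask(root));
    }
}
```

The work-stealing pool keeps all cores busy even though directory sizes 
vary wildly, which is why a handful of threads already cuts the scan time 
by a factor of about three.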

But it also could be better, because in general only a few files are 
modified, usually files that you changed yourself in an editor.  
There is another approach, introduced, as far as I know, by the tup [1] 
build tool, that works bottom-up.  If you had something similar to the 
oracle of complexity theory that gave you the list of files modified 
since the last build, you could find the dependent targets much faster.  
Faster for two reasons.  Firstly, there is only one path in the 
dependency tree up towards the root (while there are many down from the 
root), and only targets on this path are affected by the modified file. 
Secondly, the dependency analysis is comparatively cheap.  The expensive 
part is determining the file modification times.  If they were 
miraculously given, then even the top-down approach would not take 
noticeably longer.
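The bottom-up walk itself is easy to sketch: given a reverse dependency 
map (file to the targets that directly depend on it) and the set of 
modified files, the dirty targets fall out of a simple upward traversal.  
This is only an illustration of the idea, not tup's implementation:

```java
import java.util.*;

public class BottomUp {
    // reverseDeps maps a file to the targets that directly depend on it.
    // Starting from the modified files, walk upward and collect every
    // target that has to be rebuilt; untouched subtrees are never visited.
    public static Set<String> dirtyTargets(Map<String, List<String>> reverseDeps,
                                           Collection<String> modified) {
        Set<String> dirty = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(modified);
        while (!work.isEmpty()) {
            String file = work.pop();
            for (String target : reverseDeps.getOrDefault(file, List.of())) {
                if (dirty.add(target)) {
                    work.push(target);   // propagate further up the tree
                }
            }
        }
        return dirty;
    }
}
```

Note that the cost is proportional to the number of affected targets, not 
to the total number of files, which is exactly the advantage claimed above.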

So I made another experiment to see if such an oracle can be created.  
Java 7 has the java.nio.file.WatchService that lets you monitor file 
modifications in one directory.  I registered it for all directories in 
our source tree (some 16000 directories).  With the WatchService in 
place, every file modification can be recorded and stored for later.  On 
the next build you only have to check the set of modified files, not all 
files.  Registering the directory watchers takes a couple of seconds, but 
after that it does not cause any noticeable CPU activity.  Any file 
modifications are reported almost at once.  I do not have the framework 
in place to start a build with this information, but I would expect it to 
be about as fast as compiling the modified files and linking takes.
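A minimal version of this experiment, registering a WatchService on every 
directory under a root and draining the recorded modifications, could look 
as follows (a sketch, not the code from [3]; the class name is my own):

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.*;

public class TreeWatcher {
    private final WatchService watcher;
    private final Map<WatchKey, Path> keys = new HashMap<>();

    public TreeWatcher(Path root) throws IOException {
        watcher = root.getFileSystem().newWatchService();
        // Register a watch on every directory under root; this one-time
        // cost is the "couple of seconds" mentioned above.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                WatchKey key = dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
                keys.put(key, dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Drain all modifications reported so far; a build tool would persist
    // this set and consult it instead of stat-ing every file.
    public Set<Path> drainModified(long timeout, TimeUnit unit) throws InterruptedException {
        Set<Path> modified = new HashSet<>();
        WatchKey key;
        while ((key = watcher.poll(timeout, unit)) != null) {
            Path dir = keys.get(key);
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() != OVERFLOW) {
                    modified.add(dir.resolve((Path) event.context()));
                }
            }
            key.reset();
            timeout = 0;   // further pending keys are drained without waiting
        }
        return modified;
    }
}
```

One caveat: the underlying mechanism is platform-dependent (inotify on 
Linux, ReadDirectoryChangesW on Windows, a polling fallback elsewhere), so 
how quickly events arrive varies by operating system.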

The tup website references a paper [2] in which the established top-down 
approaches are called alpha algorithms and the new bottom-up approach is 
called the beta algorithm.  Tup has implemented a file modification 
watcher (in C or C++) only for Linux.  On Windows it just scans all files 
(for which it needs a little more time than my Java program; maybe it 
does not use more than one thread).


This is something that we should keep in mind in case we ever get a 
build solution with global dependencies and that build tool turns out to 
be too slow.


You can find the source code of my Java experiments at [3].  If nothing 
else, you can see an application of the ForkJoinPool that allowed me to 
write the parallel file system scan in just a few lines.  There is also 
an alternative implementation that uses the ExecutorService (with a 
fixed thread pool), which needs a few more lines of code.  And there is 
of course the use of the WatchService.


Regards,
Andre


[1] http://gittup.org/tup/
[2] http://gittup.org/tup/build_system_rules_and_algorithms.pdf
[3] http://people.apache.org/~af/test.zip

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
For additional commands, e-mail: dev-help@openoffice.apache.org


Re: Bottom up build

Posted by jan i <ja...@apache.org>.
On 30 January 2014 23:10, Rob Weir <ro...@apache.org> wrote:

> On Wed, Jan 29, 2014 at 4:18 AM, Andre Fischer <aw...@gmail.com> wrote:
> > [quoted original message trimmed]
> >
>
> Has anyone read this book?
>
> http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620
>
> It has been on my list to read for many years.  From what I've seen it
> suggests design approaches to improve build times, things that go beyond
> what you can do by just changing build files: more fundamental changes
> to how interfaces are defined.
>

I have read it; the book goes more into C++ structures and design than
into the actual build process.

If you have a pure C++ project, you can gain a lot of speed by defining
the classes for speed instead of purity.

It's quite a good book, but it has very little for the AOO build system.


>
> Otherwise I wonder if we're trying to optimize a bubble sort?
>

No, we are trying to move away from 3-4 build components that all try to
do the same thing, each sub-optimized.

In other words, our system has grown complex, it was not designed complex.
Just look at how different the single-module makefiles are (they should
have been standardized), while at the same time we have central modules
with a hefty number of "if"s because of the missing standard.

I know it's unpopular to write it, but AOO is by no means a REAL BIG
software package... it just looks that way.

Building a Linux system is a factor of 4-5 compared to AOO; building
Microsoft MFC (for those who have access) is about a factor of 1.5.  Of
course it depends on how you look at it.

rgds
jan I.

>
> -Rob
>

Re: Bottom up build

Posted by Andre Fischer <aw...@gmail.com>.
On 30.01.2014 23:10, Rob Weir wrote:
> On Wed, Jan 29, 2014 at 4:18 AM, Andre Fischer <aw...@gmail.com> wrote:
>> [quoted original message trimmed]
>>
> Has anyone read this book?
>
> http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620
>
> It has been on my list to read for many years.  From what I've seen it
> suggests design approaches to improve build times, things that go beyond
> what you can do by just changing build files: more fundamental changes
> to how interfaces are defined.

I have not but I might, thanks for the hint.

I agree that improving our software design is always a good idea, but I 
would not change our design just to make the build faster.  Many of our 
code-related problems are caused by the design and the limitations of 
the build system.  Examples are the individual building of directories 
(in dmake modules) with the creation of one library per directory, and 
the many ugly tricks used to avoid becoming "incompatible" (which would 
require recompiling files in other modules).

>
> Otherwise I wonder if we're trying to optimize a bubble sort?

I would smile at this if I had not seen a handcrafted sort algorithm 
in our build scripts that is even inferior to bubble sort :-)

Improving the build system for me is more than just an interesting way 
to pass the time.  It is one of my major pain points (not sure how to 
phrase that in proper English).  The Windows build takes about three 
times as long as the Linux build.  The dmake builds in the larger 
modules (sc, sd) take so long that by the time they finish I have 
sometimes forgotten what changes I made.  The gbuild builds are not 
entirely reliable (change an SDI file and make does not recompile a 
single file).

-Andre

>
> -Rob
>




Re: Bottom up build

Posted by Rob Weir <ro...@apache.org>.
On Wed, Jan 29, 2014 at 4:18 AM, Andre Fischer <aw...@gmail.com> wrote:
> [quoted original message trimmed]
>

Has anyone read this book?

http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620

It has been on my list to read for many years.  From what I've seen it
suggests design approaches to improve build times, things that go beyond
what you can do by just changing build files: more fundamental changes
to how interfaces are defined.

Otherwise I wonder if we're trying to optimize a bubble sort?

-Rob




Re: Bottom up build

Posted by Kay Schenk <ka...@gmail.com>.
On Wed, Jan 29, 2014 at 1:54 AM, jan i <ja...@apache.org> wrote:

> On 29 January 2014 10:18, Andre Fischer <aw...@gmail.com> wrote:
>
> > [quoted original message trimmed]
>
It's really interesting to read these observations and test cases... we
have a large and complicated source tree and just seeing what can be
observed about it is fascinating to me.

> Thanks for writing down your observations, which I find highly interesting.
> I hope that your stopping to write about the build does not mean you will
> also stop giving your opinion on my ideas in the future.
>
> For the record the capstone project, and my little hobby project "Build
> R.I.P." follow a third idea:
>
> We have a clear separation of module build and central (total) build.


+1 I would certainly go for this.

A while back someone asked about ye olde "ld" approach: all modules
compiled/built and then linked later down the road.  If we could somehow
get back to that idea in a more friendly, modern way, it would certainly
make working on specific areas more feasible.




> The
> module makefile knows how to build the module, and the central makefile
> knows the relation between modules.
>
> The makefile in each module touches a file, and the central makefile only
> checks that file.
>
> But your idea of watching for changes is very interesting.
>
> rgds
> jan I.
>



-- 
-------------------------------------------------------------------------------------------------
MzK

"Cats do not have to be shown how to have a good time,
 for they are unfailing ingenious in that respect."
                                       -- James Mason

Re: Bottom up build

Posted by jan i <ja...@apache.org>.
On 29 January 2014 10:18, Andre Fischer <aw...@gmail.com> wrote:

> [quoted original message trimmed]
>

Thanks for writing down your observations, which I find highly interesting.
I hope that your stopping to write about the build does not mean you will
also stop giving your opinion on my ideas in the future.

For the record, the capstone project and my little hobby project "Build
R.I.P." follow a third idea:

We have a clear separation of module build and central (total) build.  The
module makefile knows how to build the module, and the central makefile
knows the relations between modules.

The makefile in each module touches a file, and the central makefile only
checks that file.

But your idea of watching for changes is very interesting.

rgds
jan I.


> Andre
>