Posted to dev@ant.apache.org by Stefan Bodewig <bo...@apache.org> on 2008/09/18 11:59:47 UTC

DirectoryScanner performance

Hi all,

I've just committed a build file that uses pathconvert on a big
directory tree to measure DirectoryScanner performance.  Initially I
only wanted to use it to compare current trunk with Ant 1.7.1 but when
I saw that trunk was a tiny bit slower (something I didn't expect at
all) I threw in Ant 1.6.5 and 1.7.0 as well as several trunk revisions
between 1.7.1 and the current HEAD.

The tests are not through yet, but one thing was so surprising to me
that I wanted to show it upfront:

Running the matchall target (of src/etc/performance/dirscanner.xml):

Ant 1.6.5               1 min 30 s      ~ 19 MB
Ant 1.7.0               3 min 53 s      ~ 24 MB
Ant 1.7.1                     10 s      ~ 14 MB

So 1.7.0 took more than twice as long as 1.6.5 to find all files in a
big directory tree without any patterns and 1.7.1 is a whole lot
faster than even 1.6.5.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org


Re: DirectoryScanner performance

Posted by Stefan Bodewig <bo...@apache.org>.
On Thu, 18 Sep 2008, Kevin Jackson <fo...@gmail.com> wrote:

>> The tests are not through yet, but one thing was so surprising to
>> me that I wanted to show it upfront:
>>
>> Running the matchall target (of sr/etc/performance/dirscanner.xml):
>>
>> Ant 1.6.5               1 min 30 s      ~ 19 MB
>> Ant 1.7.0               3 min 53 s      ~ 24 MB
>> Ant 1.7.1                     10 s      ~ 14 MB
>>
>> So 1.7.0 took more than twice as long as 1.6.5 to find all files in
>> a big directory tree without any patterns and 1.7.1 is a whole lot
>> faster than even 1.6.5.
> 
> That's a huge difference - what are we doing now in 1.7.1 that is
> different from before?

Jesse's changes in svn rev 581748 are the major contributor here: he
reduced the number of File.isDirectory/list calls and thus the number
of OS syscalls.
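A minimal sketch of the general idea, not the code from rev 581748: `File.list()` is documented to return `null` when the path does not denote a directory (or on an I/O error), so a scanner can call it once per entry and use the result both as the "is it a directory?" answer and as the directory's contents, instead of calling `isDirectory()` and then `list()`. The class name `ListOnceScanner` is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Ant's DirectoryScanner: walks a tree
// calling File.list() once per entry instead of isDirectory() + list().
class ListOnceScanner {

    /** Returns the relative paths of all non-directory entries below root. */
    static List<String> scan(File root) {
        List<String> files = new ArrayList<>();
        String[] children = root.list();
        if (children != null) {
            walk(root, "", children, files);
        }
        return files;
    }

    private static void walk(File dir, String prefix, String[] children, List<String> files) {
        for (String name : children) {
            File child = new File(dir, name);
            String rel = prefix + name;
            // One call answers both "is it a directory?" and "what is inside?".
            // Caveat: list() also returns null on I/O errors, so an unreadable
            // directory is reported as a file by this sketch.
            String[] grandChildren = child.list();
            if (grandChildren == null) {
                files.add(rel);
            } else {
                walk(child, rel + File.separator, grandChildren, files);
            }
        }
    }
}
```

The trade-off is that errors and plain files become indistinguishable, which a production scanner would have to handle separately.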

I'm currently testing something that will be committed in a few
minutes and that runs the same test in 6 seconds 8-)

Stefan


Re: DirectoryScanner performance

Posted by Stefan Bodewig <bo...@apache.org>.
On Thu, 18 Sep 2008, Kevin Jackson <fo...@gmail.com> wrote:

>> The tests are not through yet, but one thing was so surprising to
>> me that I wanted to show it upfront:
>>
>> Running the matchall target (of sr/etc/performance/dirscanner.xml):
>>
>> Ant 1.6.5               1 min 30 s      ~ 19 MB
>> Ant 1.7.0               3 min 53 s      ~ 24 MB
>> Ant 1.7.1                     10 s      ~ 14 MB
>>
>> So 1.7.0 took more than twice as long as 1.6.5 to find all files in
>> a big directory tree without any patterns and 1.7.1 is a whole lot
>> faster than even 1.6.5.
> 
> That's a huge difference - what are we doing now in 1.7.1 that is
> different from before?

Apart from the change that reduces I/O syscalls, there is another
major change in Union (svn rev 581394) which is similar in effect to
my VectorSet change in trunk's HEAD: both avoid the linear time
complexity of List.contains().
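The idea common to both changes can be sketched by pairing an ordered list with a hash set, so that the membership test is an expected O(1) hash lookup rather than an O(n) scan of the list. This is a hypothetical, much smaller stand-in (`OrderedUniqueList`), not the `VectorSet` class in trunk, which is a `Vector` subclass; only `add` and `contains` are modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a set-backed list: keeps insertion order in a
// List and membership in a HashSet so contains() never scans the list.
class OrderedUniqueList<E> {
    private final List<E> items = new ArrayList<>();
    private final Set<E> seen = new HashSet<>();

    /** Adds e unless it is already present; returns true if it was added. */
    boolean add(E e) {
        if (!seen.add(e)) {      // HashSet.add: expected O(1)
            return false;
        }
        items.add(e);
        return true;
    }

    boolean contains(Object o) {
        return seen.contains(o); // no linear List.contains() scan
    }

    List<E> asList() {
        return Collections.unmodifiableList(items);
    }

    int size() {
        return items.size();
    }
}
```

Building a duplicate-free collection of n resources this way costs O(n) expected time instead of O(n²) with a `List.contains()` check before every add; `java.util.LinkedHashSet` gives the same behavior when list semantics are not needed.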

If I change the scan macrodef in dirscanner.xml to read

  <macrodef name="scan">
    <attribute name="test"/>
    <element name="patterns" optional="true"/>
    <sequential>
      <pathconvert property="@{test}">
          <fileset dir="${test.dir}" followSymlinks="${symlinks}"
                   casesensitive="${casesensitive}">
            <patterns/>
          </fileset>
      </pathconvert>
    </sequential>
  </macrodef>

i.e. remove the <path> around <fileset> that was only there to allow
it to work in 1.6.5, then I avoid Union's code and the results become

Ant 1.7.0                     45 s      ~ 18 MB
Ant 1.7.1                      9 s      ~ 18 MB
svn rev 696674                 4 s      ~ 41 MB

for the matchall target (same setup as for the quoted results).

So directory scanning performance itself hasn't degraded as much from
1.6.5 to 1.7.0 (it may even have become faster) but evaluating a
<fileset> inside a <path> has.

Stefan


Re: Sort Order of DirectoryScanner Results

Posted by Stefan Bodewig <bo...@apache.org>.
Forget what I've written: getIncludedFiles and getIncludedDirectories
both have an Arrays.sort() in them, so the sort order should be
predictable and stable.
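The effect of such a sort can be sketched as follows: whatever order the scan discovers names in, copying them to an array and calling `Arrays.sort()` yields the same order every time. `SortedResults.toSortedArray` is a hypothetical helper that only mirrors the behavior described above.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper: normalizes discovery order by sorting.
class SortedResults {
    /** Returns the names in natural String order, independent of discovery order. */
    static String[] toSortedArray(Collection<String> discovered) {
        String[] result = discovered.toArray(new String[0]);
        Arrays.sort(result);
        return result;
    }
}
```

Natural `String` order compares UTF-16 code units, so for example `"B"` sorts before `"a"`; the order is stable across runs but is not a locale-aware collation.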

Stefan


Sort Order of DirectoryScanner Results (was Re: DirectoryScanner performance)

Posted by Stefan Bodewig <bo...@apache.org>.
On Thu, 18 Sep 2008, Steve Loughran <st...@apache.org> wrote:

> Kevin Jackson wrote:
> 
> > That's a huge difference - what are we doing now in 1.7.1 that is
> > different from before?
> 
> I think it tries to sort stuff less.
>
> This broke hadoop builds as their class structure was wrong
> https://issues.apache.org/jira/browse/HADOOP-3907

Once upon a time DirectoryScanner used to return files in a
predictable order, something like a depth-first search (DFS) where the
files of each directory appear in the order the directories are visited.

The order of files coming from the same directory, as well as of
directories at the same depth in the tree, depended on File.list and
was likely lexicographic.

This is still true if there is a basedir and DirectoryScanner decides
that it needs to scan the full tree, but starting with rev 274819
(i.e. <https://issues.apache.org/bugzilla/show_bug.cgi?id=20103>,
which means Ant 1.6), this is not true if there are only include
patterns that don't start with wildcards.

In the latter case only the directories that could be included will
be scanned, and the order of directories is undefined (they get added
as keys to a map and then scanned in the map's iterator order).
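A minimal sketch of the "only scan what could be included" idea, assuming relative, `/`-separated patterns where `*`, `?` and `**` are the only wildcards: the directory segments of each include pattern before the first wildcard name the directory a scan can start from. The class and method names are hypothetical, and the `HashMap` mirrors the description above rather than Ant's actual code; its unspecified iteration order is exactly why the overall result order is undefined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of deriving scan roots from include patterns.
class PatternRoots {

    /** Directory part of the pattern before the first wildcard ("" = basedir). */
    static String rootOf(String pattern) {
        String[] segments = pattern.split("/");
        StringBuilder root = new StringBuilder();
        // The last segment names files, so it never contributes to the root.
        for (int i = 0; i < segments.length - 1; i++) {
            String seg = segments[i];
            if (seg.indexOf('*') >= 0 || seg.indexOf('?') >= 0) {
                break; // everything below this point has to be scanned
            }
            if (root.length() > 0) {
                root.append('/');
            }
            root.append(seg);
        }
        return root.toString();
    }

    /** Groups include patterns by the directory a scan would start from. */
    static Map<String, List<String>> groupByRoot(List<String> includes) {
        Map<String, List<String>> roots = new HashMap<>(); // iteration order unspecified
        for (String pattern : includes) {
            roots.computeIfAbsent(rootOf(pattern), k -> new ArrayList<>()).add(pattern);
        }
        return roots;
    }
}
```

A pattern that starts with a wildcard, such as `**/*.xml`, maps to the basedir, which is the case where the full tree is scanned.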

Given that this happened such a long time ago, we can safely assume
that in general the order of files/directories returned by
DirectoryScanner is not defined.

Stefan


Re: DirectoryScanner performance

Posted by Stefan Bodewig <bo...@apache.org>.
On Thu, 18 Sep 2008, Steve Loughran <st...@apache.org> wrote:

> Kevin Jackson wrote:
> >> The tests are not through yet, but one thing was so surprising to me
> >> that I wanted to show it upfront:
> >>
> >> Running the matchall target (of sr/etc/performance/dirscanner.xml):
> >>
> >> Ant 1.6.5               1 min 30 s      ~ 19 MB
> >> Ant 1.7.0               3 min 53 s      ~ 24 MB
> >> Ant 1.7.1                     10 s      ~ 14 MB
> >>
> >> So 1.7.0 took more than twice as long as 1.6.5 to find all files in a
> >> big directory tree without any patterns and 1.7.1 is a whole lot
> >> faster than even 1.6.5.
> >
> > That's a huge difference - what are we doing now in 1.7.1 that is
> > different from before?
> >
> 
> I think it tries to sort stuff less.

Not in DirectoryScanner, maybe in between 1.6.5 and 1.7.1.

The pathconvert results of 1.6.5 and 1.7.1 are identical (the only
difference is that my Windows drive letter was lowercase in 1.6.5 and
is uppercase in 1.7.1).

I checked because I couldn't believe the performance difference was real.

Stefan


Re: DirectoryScanner performance

Posted by Steve Loughran <st...@apache.org>.
Kevin Jackson wrote:
>> The tests are not through yet, but one thing was so surprising to me
>> that I wanted to show it upfront:
>>
>> Running the matchall target (of sr/etc/performance/dirscanner.xml):
>>
>> Ant 1.6.5               1 min 30 s      ~ 19 MB
>> Ant 1.7.0               3 min 53 s      ~ 24 MB
>> Ant 1.7.1                     10 s      ~ 14 MB
>>
>> So 1.7.0 took more than twice as long as 1.6.5 to find all files in a
>> big directory tree without any patterns and 1.7.1 is a whole lot
>> faster than even 1.6.5.
> 
> That's a huge difference - what are we doing now in 1.7.1 that is
> different from before?
> 

I think it tries to sort stuff less.  This broke Hadoop builds because
their class structure was wrong
https://issues.apache.org/jira/browse/HADOOP-3907


Re: DirectoryScanner performance

Posted by Kevin Jackson <fo...@gmail.com>.
> The tests are not through yet, but one thing was so surprising to me
> that I wanted to show it upfront:
>
> Running the matchall target (of sr/etc/performance/dirscanner.xml):
>
> Ant 1.6.5               1 min 30 s      ~ 19 MB
> Ant 1.7.0               3 min 53 s      ~ 24 MB
> Ant 1.7.1                     10 s      ~ 14 MB
>
> So 1.7.0 took more than twice as long as 1.6.5 to find all files in a
> big directory tree without any patterns and 1.7.1 is a whole lot
> faster than even 1.6.5.

That's a huge difference - what are we doing now in 1.7.1 that is
different from before?

Kev


Re: DirectoryScanner performance

Posted by Stefan Bodewig <bo...@apache.org>.
On Fri, 19 Sep 2008, Kevin Jackson <fo...@gmail.com> wrote:

> Hi,
>> At the same time memory usage has increased with 1.7.0 and never
>> decreased after that, in fact the current HEAD uses more memory
>> than ever before.  Something between revisions 687768 and 693846
>> has bumped the memory mark without gaining us much in terms of
>> performance, I'll try to isolate and remove that later.
> 
> This is an issue.

Absolutely.

> Looking at your results, memory has significantly increased
> somewhere and for large builds we were already causing OutOfMemory
> errors.

I'll be working on this and now have a self-contained testbed (and I
invite others to use the tests on their systems to see whether they
get different results).  Getting memory consumption at least down to
where it was with 1.7.1 is my goal.

OTOH my clumsy method of measuring memory consumption (watching Task
Manager or top) doesn't tell us the full story, since it only shows the
peak memory consumption of the process, and after a GC things may look
different inside the VM.
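One way to look inside the VM rather than at the process is the standard `java.lang.management` API (Java 5 and later, so not available on the JDK 1.4 used for the Windows runs). A minimal sketch with a hypothetical helper class: request a collection, then read the used heap. `MemoryMXBean.gc()` is effectively `System.gc()`, which the JVM may treat as a hint, so the number is an approximation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical measurement helper, not part of Ant.
class HeapProbe {
    /** Used heap in bytes after requesting a full GC (a request, not a guarantee). */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // effectively System.gc(); the JVM may ignore it
        return memory.getHeapMemoryUsage().getUsed();
    }

    /**
     * Heap the JVM has committed from the OS. This is only part of what top or
     * Task Manager report, since those also include non-heap process memory.
     */
    static long committedHeap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getCommitted();
    }
}
```

Logging `usedHeapAfterGc()` at the end of a scan would separate live data from free heap that the OS-level view counts as used.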

Stefan


Re: DirectoryScanner performance

Posted by Kevin Jackson <fo...@gmail.com>.
Hi,
> At the same time memory usage has increased with 1.7.0 and never
> decreased after that, in fact the current HEAD uses more memory than
> ever before.  Something between revisions 687768 and 693846 has bumped
> the memory mark without gaining us much in terms of performance, I'll
> try to isolate and remove that later.

This is an issue.  Looking at your results, memory has significantly
increased somewhere and for large builds we were already causing
OutOfMemory errors.

Kev


Re: DirectoryScanner performance

Posted by Stefan Bodewig <bo...@apache.org>.
On Fri, 19 Sep 2008, <ad...@yahoo.com> wrote:

> Gilles Scokart wrote:

>> Couldn't it be optimized on windows by simply remove the check ?
>> Windows don't have symbolic link...
> 
> http://en.wikipedia.org/wiki/Symbolic_link#Windows

Not sure whether we'd detect them.

And then there may be a file system mounted from a Unix machine; I'm
not sure whether it would show differences between absolute and
canonical paths on Windows either.

Anyway, followsymlinks is true by default, and I'd expect people to
only ever set it to false if they expected to see symlinks somewhere
in the tree, which most likely means a Unix-only environment.

Stefan


Re: DirectoryScanner performance

Posted by aditsu <ad...@yahoo.com>.

Gilles Scokart wrote:
> 
> Couldn't it be optimized on windows by simply remove the check ?
> Windows don't have symbolic link...
> 

http://en.wikipedia.org/wiki/Symbolic_link#Windows


Re: DirectoryScanner performance

Posted by Gilles Scokart <gs...@gmail.com>.
2008/9/19 Stefan Bodewig <bo...@apache.org>:
>
> Interestingly the effect of followSymlinks=false is far less dramatic
> on Linux than on Windows.
>

Couldn't it be optimized on Windows by simply removing the check?
Windows doesn't have symbolic links...  We could either ignore
followSymlinks on Windows, force it to true there, or use a subclass of
FileUtils with methods specifically tuned for Windows when we are on
Windows.

Note that, given the current definition of a symbolic link [1], I'm
not 100% sure that this wouldn't introduce a regression.  Also, we
would have to take care in case someone manages to port a JVM to
Cygwin.  But AFAIK, no such port is available.



[1] All files/directories whose canonical path differs from their
path are considered symbolic links
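The definition in [1] can be sketched with nothing but `java.io.File`. This is a hypothetical helper, not Ant's `FileUtils`. Canonicalizing the parent first means only the last path element is tested, so a symlinked ancestor does not make every entry below it look like a link. Every call still resolves at least one canonical path, which is the per-entry cost behind `followSymlinks="false"`. On Java 7 and later, `java.nio.file.Files.isSymbolicLink(Path)` asks the file system directly instead.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper implementing the "canonical path differs" definition.
class SymlinkHeuristic {
    /**
     * Treats name (inside parent) as a symbolic link when its canonical path
     * differs from the canonical parent joined with the name. This is a
     * heuristic: mounted shares or odd file systems may also cause differences.
     */
    static boolean looksLikeSymlink(File parent, String name) throws IOException {
        File canonicalParent = parent.getCanonicalFile();
        File candidate = new File(canonicalParent, name);
        return !candidate.getAbsolutePath().equals(candidate.getCanonicalPath());
    }
}
```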
-- 
Gilles Scokart


Re: DirectoryScanner performance

Posted by Stefan Bodewig <bo...@apache.org>.
Tests on Ubuntu 8.04 with Java 1.6.  I only tested two svn revisions
and 1.7.1 (and used top instead of Task Manager in my highly
scientific approach to measuring memory consumption 8-)

The trends are the same.  Even though this machine is a lot slower
than my Windows work machine, the times are comparable.  I guess
Linux's tmpfs is quite a bit faster than a real NTFS-formatted disk.

The pre-tokenization changes have a stronger effect, though.  My guess,
again, is that scanning is more strongly I/O bound on the Windows
machine and CPU bound on tmpfs on Linux.
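Pre-tokenization here means splitting each pattern into path segments once, up front, instead of re-splitting pattern strings for every candidate path during the scan. A minimal sketch, assuming `/` separators and Ant-style wildcards (`*` and `?` within a segment, `**` for zero or more directories). The class name and the naive backtracking matcher are illustrative only, not the classes in trunk.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a pattern tokenized once, then matched many times.
class TokenizedGlob {
    private final String[] tokens;

    TokenizedGlob(String pattern) {
        this.tokens = split(pattern); // done once per pattern, not once per path
    }

    /** Splits a '/'-separated path or pattern into non-empty segments. */
    static String[] split(String path) {
        List<String> parts = new ArrayList<>();
        for (String p : path.split("/")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts.toArray(new String[0]);
    }

    /** Matches an already tokenized path against the tokenized pattern. */
    boolean matches(String[] pathTokens) {
        return matchFrom(0, pathTokens, 0);
    }

    private boolean matchFrom(int t, String[] path, int p) {
        if (t == tokens.length) {
            return p == path.length;
        }
        if (tokens[t].equals("**")) {
            // "**" consumes zero or more whole directory segments.
            for (int skip = p; skip <= path.length; skip++) {
                if (matchFrom(t + 1, path, skip)) {
                    return true;
                }
            }
            return false;
        }
        return p < path.length
            && segmentMatches(tokens[t], 0, path[p], 0)
            && matchFrom(t + 1, path, p + 1);
    }

    /** Matches one segment: '*' is any run of characters, '?' exactly one. */
    private static boolean segmentMatches(String pat, int i, String str, int j) {
        if (i == pat.length()) {
            return j == str.length();
        }
        char c = pat.charAt(i);
        if (c == '*') {
            for (int k = j; k <= str.length(); k++) {
                if (segmentMatches(pat, i + 1, str, k)) {
                    return true;
                }
            }
            return false;
        }
        return j < str.length()
            && (c == '?' || c == str.charAt(j))
            && segmentMatches(pat, i + 1, str, j + 1);
    }
}
```

In a scan, each candidate path would likewise be split once and reused against every pattern, so splitting is paid per path and per pattern rather than per (path, pattern) pair. The backtracking is exponential in the worst case, which is acceptable for a sketch but not for production.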

It also seems that I need to make the test setup even larger if I
want to compare further tweaks.

Interestingly the effect of followSymlinks=false is far less dramatic
on Linux than on Windows.

I removed memoization of File.list and File.getCanonicalPath (only
locally, not yet committed) and it didn't change much, so I'll
probably remove it completely.
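For reference, memoizing these calls amounts to a per-scan cache keyed by path, along the lines of this hypothetical sketch (not the code that was in trunk). Such a cache holds one entry per visited path for the lifetime of the scan, trading memory for repeated syscalls, which only pays off if the same paths are actually queried more than once.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-scan cache for File.list() and getCanonicalPath().
class FsCache {
    private final Map<File, String[]> listings = new HashMap<>();
    private final Map<File, String> canonicalPaths = new HashMap<>();
    private int listCalls;

    /** File.list() result, computed at most once per directory (null is cached too). */
    String[] list(File dir) {
        if (listings.containsKey(dir)) {
            return listings.get(dir);
        }
        listCalls++;
        String[] result = dir.list();
        listings.put(dir, result);
        return result;
    }

    /** getCanonicalPath() result, computed at most once per file. */
    String canonicalPath(File f) {
        String cached = canonicalPaths.get(f);
        if (cached == null) {
            try {
                cached = f.getCanonicalPath();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            canonicalPaths.put(f, cached);
        }
        return cached;
    }

    /** Number of real File.list() calls made so far. */
    int listCalls() {
        return listCalls;
    }
}
```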

I added a new many-roots test (not part of the timings for "all"
below) that should be sensitive to memoization and pre-tokenization.

tests with default settings
===========================

matchall
--------

Ant 1.7.1                     15 s      ~ 20 MB
trunk rev 696355              12 s      ~ 34 MB
trunk rev 696674               2 s      ~ 29 MB
no memoization                 2 s      ~ 20 MB

roots
-----

Ant 1.7.1                      0 s      ~ ?? MB
trunk rev 696355               0 s      ~ ?? MB
trunk rev 696674               0 s      ~ ?? MB
no memoization                 0 s      ~ ?? MB

recursive-excludes
------------------

Ant 1.7.1                      2 s      ~ 14 MB
trunk rev 696355               2 s      ~ 13 MB
trunk rev 696674               1 s      ~ 19 MB
no memoization                 1 s      ~ 21 MB

name-matches
------------

Ant 1.7.1                      3 s      ~ 19 MB
trunk rev 696355               3 s      ~ 31 MB
trunk rev 696674               2 s      ~ 24 MB
no memoization                 1 s      ~ 20 MB

many-patterns
-------------

Ant 1.7.1                      4 s      ~ 21 MB
trunk rev 696355               4 s      ~ 28 MB
trunk rev 696674               1 s      ~ 35 MB
no memoization                 1 s      ~ 22 MB

all
---

Ant 1.7.1                     20 s      ~ 41 MB
trunk rev 696355              20 s      ~ 44 MB
trunk rev 696674               6 s      ~ 43 MB
no memoization                 5 s      ~ 44 MB

many-roots
----------

Ant 1.7.1                     14 s      ~ 27 MB
trunk rev 696674               2 s      ~ 35 MB
no memoization                 2 s      ~ 30 MB

Case-insensitive scan
=====================

all
---

Ant 1.7.1                     21 s      ~ 40 MB
trunk rev 696355              20 s      ~ 44 MB
trunk rev 696674               6 s      ~ 43 MB
no memoization                 5 s      ~ 43 MB

No followSymlinks
=================

all
---

Ant 1.7.1                     25 s      ~ 38 MB
trunk rev 696355              24 s      ~ 45 MB
trunk rev 696674               9 s      ~ 45 MB
no memoization                 8 s      ~ 45 MB


Re: DirectoryScanner performance

Posted by Stefan Bodewig <bo...@apache.org>.
My complete test results are below.  Ant 1.7.1 has consistently been
a lot faster than 1.6.5, which in turn consistently took no more than
half the time of 1.7.0.  svn trunk's HEAD is consistently at least as
fast as 1.7.1, and faster in most tests.

At the same time memory usage has increased with 1.7.0 and never
decreased after that, in fact the current HEAD uses more memory than
ever before.  Something between revisions 687768 and 693846 has bumped
the memory mark without gaining us much in terms of performance, I'll
try to isolate and remove that later.

The symlink loop detection has cost us a bit of performance, but this
has been compensated for by the later tokenization changes (and more
than made up for by VectorSet).
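Symlink loop detection in general can be sketched by remembering the canonical paths of the directories on the current descent and refusing to enter one that is already there. This is a hypothetical illustration of that technique, not the code committed in rev 694254. It costs one `getCanonicalPath()` call per directory and uses `/` in the relative names it returns. It only stops cycles back to an ancestor; two sibling links to the same directory would still be walked twice.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical walker that refuses to re-enter a directory already on the path.
class LoopSafeWalker {

    /** Collects relative directory paths, skipping links back to an ancestor. */
    static List<String> directories(File root) throws IOException {
        List<String> out = new ArrayList<>();
        Set<String> onPath = new HashSet<>();
        onPath.add(root.getCanonicalPath());
        walk(root, "", onPath, out);
        return out;
    }

    private static void walk(File dir, String prefix, Set<String> onPath, List<String> out)
            throws IOException {
        String[] children = dir.list();
        if (children == null) {
            return;
        }
        for (String name : children) {
            File child = new File(dir, name);
            if (!child.isDirectory()) {
                continue;
            }
            String canonical = child.getCanonicalPath();
            if (!onPath.add(canonical)) {
                continue; // a link back to an ancestor: following it would never end
            }
            out.add(prefix + name);
            walk(child, prefix + name + "/", onPath, out);
            onPath.remove(canonical); // leaving this directory again
        }
    }
}
```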

Memoization of File.list and File.getCanonicalPath doesn't seem to win
much, but before I remove it, I want to try it on a different OS.

followsymlinks=false costs a lot of performance, and it seems to use
up memory as well, but this is entirely due to File.getCanonicalPath
and I don't think there is anything we can do about it.

Stefan

Raw test results to follow

Method
======

All tests run on the same machine (WinXP) with the same JDK (1.4).

For all tests < 60s the values are the median values of three
consecutive runs (but to tell the truth, in most cases all three runs
yielded the exact same results).
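The median-of-three rule is simple enough to state as code. Here is a sketch with a hypothetical helper, where run times are in whole seconds as in the tables:

```java
import java.util.Arrays;

// Hypothetical helper: the reported value for a test is the median run time.
class Timing {
    /** Median of an odd number of measured run times, e.g. three runs. */
    static long median(long... runs) {
        if (runs.length == 0 || runs.length % 2 == 0) {
            throw new IllegalArgumentException("need an odd number of runs");
        }
        long[] sorted = runs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }
}
```

Unlike the mean, a single slow outlier run (for example one with a cold file-system cache) does not move the reported value.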

Times are what Ant reports itself when it says "Build finished";
memory numbers are the maximum values seen while watching the Windows
Task Manager, which also means they reflect the total memory of the
Java process, whose heap could itself be largely empty.


Contestants:
============

Ant 1.6.5
Ant 1.7.0
Ant 1.7.1
trunk rev 687768        - PathPattern
trunk rev 693846        - pre loop detection
trunk rev 694254        - with loop detection
trunk rev 695389        - memoization of canonical path
trunk rev 696146        - more pre-tokenization work
trunk rev 696345        - full pre-tokenization, no File.list memo
trunk rev 696355        - bring back memoization of File.list
trunk rev 696674        - introduction of VectorSet

tests with default settings
===========================

matchall
--------

Ant 1.6.5               1 min 30 s      ~ 19 MB
Ant 1.7.0               3 min 53 s      ~ 24 MB
Ant 1.7.1                     10 s      ~ 14 MB
trunk rev 687768               9 s      ~ 19 MB
trunk rev 693846               9 s      ~ 30 MB
trunk rev 694254              10 s      ~ 41 MB
trunk rev 695389              10 s      ~ 39 MB
trunk rev 696146              11 s      ~ 39 MB
trunk rev 696345              10 s      ~ 41 MB
trunk rev 696355              10 s      ~ 43 MB
trunk rev 696674               6 s      ~ 44 MB

roots
-----

Ant 1.6.5                      1 s      ~ 14 MB
Ant 1.7.0                      3 s      ~ 12 MB
Ant 1.7.1                      0 s      ~ ?? MB
trunk rev 687768               0 s      ~ ?? MB
trunk rev 693846               0 s      ~ ?? MB
trunk rev 694254               0 s      ~ ?? MB
trunk rev 695389               0 s      ~ ?? MB
trunk rev 696146               0 s      ~ ?? MB
trunk rev 696345               0 s      ~ ?? MB
trunk rev 696355               0 s      ~ ?? MB
trunk rev 696674               0 s      ~ ?? MB

recursive-excludes
------------------

Ant 1.6.5                      8 s      ~ 19 MB
Ant 1.7.0                     16 s      ~ 16 MB
Ant 1.7.1                      3 s      ~ 14 MB
trunk rev 687768               3 s      ~ 14 MB
trunk rev 693846               3 s      ~ 24 MB
trunk rev 694254               4 s      ~ 29 MB
trunk rev 695389               4 s      ~ 28 MB
trunk rev 696146               4 s      ~ 28 MB
trunk rev 696345               4 s      ~ 27 MB
trunk rev 696355               4 s      ~ 28 MB
trunk rev 696674               3 s      ~ 31 MB

name-matches
------------

Ant 1.6.5                     11 s      ~ 19 MB
Ant 1.7.0                     28 s      ~ 17 MB
Ant 1.7.1                      3 s      ~ 15 MB
trunk rev 687768               3 s      ~ 14 MB
trunk rev 693846               4 s      ~ 31 MB
trunk rev 694254               5 s      ~ 32 MB
trunk rev 695389               5 s      ~ 34 MB
trunk rev 696146               5 s      ~ 38 MB
trunk rev 696345               4 s      ~ 29 MB
trunk rev 696355               5 s      ~ 30 MB
trunk rev 696674               3 s      ~ 35 MB

many-patterns
-------------

Ant 1.6.5                      7 s      ~ 19 MB
Ant 1.7.0                     13 s      ~ 17 MB
Ant 1.7.1                      4 s      ~ 17 MB
trunk rev 687768               3 s      ~ 14 MB
trunk rev 693846               4 s      ~ 20 MB
trunk rev 694254               4 s      ~ 23 MB
trunk rev 695389               4 s      ~ 24 MB
trunk rev 696146               4 s      ~ 25 MB
trunk rev 696345               4 s      ~ 29 MB
trunk rev 696355               4 s      ~ 25 MB
trunk rev 696674               3 s      ~ 26 MB

all
---

Ant 1.6.5               1 min 49 s      ~ 30 MB
Ant 1.7.0               4 min 54 s      ~ 44 MB
Ant 1.7.1                     19 s      ~ 43 MB
trunk rev 687768              18 s      ~ 43 MB
trunk rev 693846              19 s      ~ 42 MB
trunk rev 694254              21 s      ~ 43 MB
trunk rev 695389              21 s      ~ 44 MB
trunk rev 696146              21 s      ~ 47 MB
trunk rev 696345              20 s      ~ 47 MB
trunk rev 696355              20 s      ~ 49 MB
trunk rev 696674              11 s      ~ 52 MB

Case-insensitive scan
=====================

all
---

Ant 1.6.5               1 min 49 s      ~ 30 MB
Ant 1.7.0               4 min 53 s      ~ 44 MB
Ant 1.7.1                     19 s      ~ 44 MB
trunk rev 687768              19 s      ~ 43 MB
trunk rev 693846              20 s      ~ 42 MB
trunk rev 694254              21 s      ~ 47 MB
trunk rev 695389              21 s      ~ 46 MB
trunk rev 696146              21 s      ~ 53 MB
trunk rev 696345              20 s      ~ 47 MB
trunk rev 696355              20 s      ~ 48 MB
trunk rev 696674              11 s      ~ 51 MB

No followSymlinks
=================

all
---

Ant 1.6.5               2 min 20 s      ~ 67 MB
Ant 1.7.0               5 min 30 s      ~ 71 MB
Ant 1.7.1                     45 s      ~ 71 MB
trunk rev 687768              40 s      ~ 70 MB
trunk rev 693846              43 s      ~ 71 MB
trunk rev 694254              43 s      ~ 72 MB
trunk rev 695389              43 s      ~ 71 MB
trunk rev 696146              43 s      ~ 70 MB
trunk rev 696345              41 s      ~ 71 MB
trunk rev 696355              42 s      ~ 73 MB
trunk rev 696674              29 s      ~ 73 MB
