Posted to notifications@ant.apache.org by bu...@apache.org on 2022/05/01 18:22:00 UTC

[Bug 66048] New: Directory Scanner extremely slow

https://bz.apache.org/bugzilla/show_bug.cgi?id=66048

            Bug ID: 66048
           Summary: Directory Scanner extremely slow
           Product: Ant
           Version: 1.10.12
          Hardware: Macintosh
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core
          Assignee: notifications@ant.apache.org
          Reporter: len@winequest.com
  Target Milestone: ---

Directory scans in '/slowdir' have been super slow. '/slowdir' is a small
directory on my workstation’s flash drive. This directory does contain some
symbolic links to large directories on the large external disk drives, so it
does appear to be traversing down the directory tree. Every time I run it, my
large external drives are thrashing. That explains why it’s slow.

Analyzed by Adam Retter:

For your example, I recreated and traced the behaviour of what is happening
through the Apache Ant code. As far as I can see it is not actually accessing
every file in every folder recursively. However, what it is doing, which is
unnecessary, is listing the contents of each sub-folder of your /slowdir. That
occurs because of this line of code in Ant, which lists the contents of the
first level of sub-folders to work out if they are directories or not:
https://github.com/apache/ant/blob/rel/1.10.12/src/main/org/apache/tools/ant/DirectoryScanner.java#L1239

IMHO this is not a "bug" per se in Ant, as the function does what it is meant
to do, however it certainly does more work than it needs to when looking for
patterns in a sub-folder that don't require recursion. I would suggest opening
an issue with the Apache Ant project, and then an eXist-db issue and linking in
that to the Apache Ant issue. If this can get fixed upstream, it would be easy
for us to update to a newer version of Ant to solve this.
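
The cost described in the analysis above can be illustrated with a small,
self-contained sketch. It is not the Ant source; the class and method names
(ListProbeSketch, isDirectoryByListing, namesReadByListing) are hypothetical
and exist only for this illustration. It assumes that an entry is classified
as a directory by calling File.list() on it and checking for null, and shows
that such a probe reads every child name even when only the null/non-null
answer is needed.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ListProbeSketch {

    /** Classifies an entry by listing it: list() returns null if the entry cannot be listed. */
    static boolean isDirectoryByListing(File entry) {
        // The whole array of child names is built, although only null/non-null is used.
        String[] children = entry.list();
        return children != null;
    }

    /** Number of child names a listing-based probe has to read for one entry. */
    static int namesReadByListing(File entry) {
        String[] children = entry.list();
        return children == null ? 0 : children.length;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for /slowdir: one file plus one large sub-directory.
        Path base = Files.createTempDirectory("slowdir");
        Files.createFile(base.resolve("a.txt"));
        Path big = Files.createDirectory(base.resolve("bigdir"));
        for (int i = 0; i < 1000; i++) {
            Files.createFile(big.resolve("img" + i + ".jpg"));
        }

        File[] firstLevel = base.toFile().listFiles();
        if (firstLevel == null) {
            throw new IOException("could not list " + base);
        }
        // Only the first level is of interest, yet probing bigdir reads all of its names.
        for (File entry : firstLevel) {
            System.out.println(entry.getName()
                    + " directory=" + isDirectoryByListing(entry)
                    + " namesRead=" + namesReadByListing(entry));
        }
    }
}
```

With a sub-directory holding millions of entries on a slow drive, the number
of names read for that single probe grows accordingly, which matches the disk
activity reported above.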

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 66048] Directory Scanner extremely slow

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66048

Jaikiran Pai <ja...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #1 from Jaikiran Pai <ja...@apache.org> ---
Hello Len, would you be able to point us to the original discussion/issue where
this was discussed? That will give us a bit more context on what changes (if
any) might be necessary here.


[Bug 66048] Directory Scanner extremely slow

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66048

--- Comment #3 from Len <le...@winequest.com> ---
One more piece of context.

/slowdir has a couple hundred files and is on a flash drive.  /slowdir/bigdir
is a symbolic link to a directory on a magnetic drive which has 1.8M image
files.

As Adam said, the scan is for pattern '*.*' in /slowdir, but Ant is also
querying all files in /slowdir/bigdir, even though that information is not
necessary and is not being used.


[Bug 66048] Directory Scanner extremely slow

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66048

Stefan Bodewig <bo...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |1.10.13

--- Comment #5 from Stefan Bodewig <bo...@apache.org> ---
I believe commit eeacf50 in master should help with your situation. Is there
any chance you can use an Ant build from master and verify the slow directory
is not listed?


[Bug 66048] Directory Scanner extremely slow

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66048

--- Comment #2 from Adam Retter <ad...@googlemail.com> ---
Hi Jaikiran, thanks for responding. I analyzed the initial issue that Len
reported here: https://github.com/eXist-db/exist/issues/4367

We are using Apache Ant 1.10.12

The underlying Apache Ant Java code that is actually getting invoked looks like
this:

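import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tools.ant.DirectoryScanner;

// Scan /slowdir for entries matching the non-recursive pattern "*.*"
Path baseDir = Paths.get("/slowdir");
String pattern = "*.*";
DirectoryScanner directoryScanner = new DirectoryScanner();
directoryScanner.setIncludes(new String[] { pattern });
directoryScanner.setBasedir(baseDir.toFile());
directoryScanner.setCaseSensitive(true);
directoryScanner.scan();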

Kind regards. Adam.


[Bug 66048] Directory Scanner extremely slow

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=66048

--- Comment #4 from Stefan Bodewig <bo...@apache.org> ---
"It's always been that way" - this is not an excuse for not changing anything,
but rather a warning that probably none of us remembers why the call to
list() is there.

One thing I know is that you can't restrict listing to files for which
File.isDirectory returns true - it will return false for symlinks to
directories and there may be other non-Directory list()able Files on strange
file systems (OpenVMS logicals?) I've forgotten.

We want to separate files from directories and at least at one point in time
invoking list() seemed to be the best option.

It is quite possible that using NIO and reading file attributes is more
reliable today than Java 1.2 was at the time the initial code was written.
We'll have to triple-check changes in such a central and heavily used piece of
code, though.
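
As a rough illustration of the NIO idea mentioned above, the following sketch
separates directories, symbolic links to directories, and everything else by
reading file attributes only, without listing any directory contents. It is
an assumption about what such a check could look like, not Ant's code and not
necessarily what any Ant commit does; the class name AttributeProbeSketch, the
Kind enum and the classify method are hypothetical. It also makes no attempt
to handle unusual file systems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class AttributeProbeSketch {

    enum Kind { DIRECTORY, SYMLINK_TO_DIRECTORY, OTHER }

    /** Classifies one entry from its attributes; no directory is listed. */
    static Kind classify(Path entry) throws IOException {
        // Attributes of the entry itself, without following a symbolic link.
        BasicFileAttributes own = Files.readAttributes(
                entry, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        if (own.isDirectory()) {
            return Kind.DIRECTORY;
        }
        // Files.isDirectory follows links by default and is false for broken links.
        if (own.isSymbolicLink() && Files.isDirectory(entry)) {
            return Kind.SYMLINK_TO_DIRECTORY;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("attrprobe");
        Path file = Files.createFile(base.resolve("a.txt"));
        Path dir = Files.createDirectory(base.resolve("bigdir"));
        System.out.println(file.getFileName() + " -> " + classify(file));
        System.out.println(dir.getFileName() + " -> " + classify(dir));
        try {
            // Creating links may be unsupported or need extra privileges on some platforms.
            Path link = Files.createSymbolicLink(base.resolve("link"), dir);
            System.out.println(link.getFileName() + " -> " + classify(link));
        } catch (UnsupportedOperationException | IOException e) {
            System.out.println("symbolic links not available here: " + e);
        }
    }
}
```

Whether a check like this is reliable enough across all platforms Ant
supports is exactly the open question raised in this comment.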
