You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@ant.apache.org by bu...@apache.org on 2014/11/24 16:21:12 UTC

[Bug 57253] New: DirectoryScanner holds on to too many strings

https://issues.apache.org/bugzilla/show_bug.cgi?id=57253

            Bug ID: 57253
           Summary: DirectoryScanner holds on to too many strings
           Product: Ant
           Version: 1.8.4
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core
          Assignee: notifications@ant.apache.org
          Reporter: jglick@apache.org

Found in 1.8.4 but I do not see any relevant changes since then. Upstream issue
is

  https://issues.jenkins-ci.org/browse/JENKINS-25759

When traversing a large directory structure looking for matches, the
filesNotIncluded, dirsNotIncluded, and scannedDirs fields of DirectoryScanner
(and probably also notFollowedSymlinks) grow to be quite large.

In the case of filesNotIncluded this is senseless if you never plan on calling
getNotIncludedFiles() etc. Even if you did, this method forces a slow scan to
be performed! Yet the field is built up during a fast scan too. Same for
dirsNotIncluded. Not sure about notFollowedSymlinks.

scannedDirs is trickier, because it actually seems to be used during the fast
scan, though
https://github.com/apache/ant/commit/9e89cec932a798d958cf6bb310936d9a00c09a9d
does not offer any explanation (or test) other than that it “avoids double
scanning”.

The partial workaround is to subclass DirectoryScanner and periodically clear
the filesNotIncluded and dirsNotIncluded fields. notFollowedSymlinks is private
but does not seem to hold anything in typical cases. scannedDirs is private but
(as above) it is unclear whether it is safe to clear it anyway. clearResults()
can clear all these, but also throws out things which are wanted
(filesIncluded, for example), and setting these fields back to their former
values after calling clearResults() seems risky.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 57253] DirectoryScanner holds on to too many strings

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=57253

--- Comment #1 from Stefan Bodewig <bo...@apache.org> ---
It's not quite fair to ask me to explain a code change I've created more than
eleven years ago ;-)

Looking through my archives this was the first time we tried various options
speeding up DirectoryScanner - this has been long before we introduced
TokenizedPath/Pattern.  It is quite possible later changes rendered the
hasBeenScanned test redundant.  I'll perform some experiments.

Note getScannedDirs is used by a test (and nothing else AFAICT), so if you
wanted to remove the extra check, this will need to be adapted.

I haven't thought through all the reasons we keep the structures you mention,
will do so later today, just wanted to provide an early feedback.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 57253] DirectoryScanner holds on to too many strings

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=57253

--- Comment #2 from Stefan Bodewig <bo...@apache.org> ---
If you look at how slowScan works you'll see the code needs to record
not-included and excluded files and dirs it encounters during a fast scan.  The
slow scan only scans the excluded/not-included dirs and would miss all the
files and directories that have been seen previously.

We could avoid holding on to those files and dirs at the cost of making the
slow scan even slower (it would have to traverse the whole filesystem again
rather than just the parts that haven't been scanned already).

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 57253] DirectoryScanner holds on to too many strings

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=57253

--- Comment #4 from Stefan Bodewig <bo...@apache.org> ---
Yes, I understand that, I was just pointing out why the code is what it is and
what consequences a change would have.

Unfortunately we don't have any proper way to tell DirectoryScanner whether we
intend to ever use the slow scan results.

AFAICT Ant itself never uses the results of a slow-scan, only <delete> needs
the list of not-followed symlinks.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 57253] DirectoryScanner holds on to too many strings

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=57253

Jesse Glick <jg...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jglick@apache.org

--- Comment #3 from Jesse Glick <jg...@apache.org> ---
(In reply to Stefan Bodewig from comment #2)
> We could avoid holding on to those files and dirs at the cost of making the
> slow scan even slower

For a caller which does not intend to use the slow scan mode at all, this would
be fine. The problem is that the current API forces a mode of operation which
uses unbounded heap.

-- 
You are receiving this mail because:
You are the assignee for the bug.