Posted to issues@commons.apache.org by "Rob Spoor (Jira)" <ji...@apache.org> on 2020/10/15 14:30:00 UTC
[jira] [Commented] (IO-597) FileUtils.iterateFiles goes out of
memory when executed for a directory with large number of files
[ https://issues.apache.org/jira/browse/IO-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214739#comment-17214739 ]
Rob Spoor commented on IO-597:
------------------------------
Any solution that uses {{File.listFiles}} will potentially have the same issue. The iterator could be improved for folders whose files are distributed among several subfolders, but as soon as a folder has a large number of files directly in it, the same problem can arise. This is simply a limitation of {{File.listFiles}}: it collects all entries into an array before returning them.
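Any proper solution would need to make use of {{Files.newDirectoryStream}}. The issue here is that any {{DirectoryStream}} needs to be closed. That makes it difficult to implement a single iterator that will recurse into directories: an {{Iterator}} has no close method, so a caller that abandons the iteration early would leave directory streams open.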
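As a rough illustration of the closing requirement, here is a minimal sketch for a single directory level. The class {{LazyDirectoryListing}} and the method {{forEachEntry}} are hypothetical names made up for this example and are not part of Commons IO; only {{Files.newDirectoryStream}} and {{DirectoryStream}} are real JDK API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of Commons IO.
final class LazyDirectoryListing {

    private LazyDirectoryListing() {
    }

    // Passes each direct child of dir to the action, one entry at a time,
    // instead of collecting all entries into an array first.
    // Returns the number of entries visited.
    static long forEachEntry(Path dir, Consumer<Path> action) throws IOException {
        long count = 0;
        // try-with-resources closes the directory stream when the loop ends,
        // including when the action throws.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                action.accept(entry);
                count++;
            }
        }
        return count;
    }
}
```

This works because the whole iteration happens inside one method, so there is an obvious place to close the stream. A recursive iterator handed back to the caller would have to keep one open {{DirectoryStream}} per directory level and has no comparable point at which to release them.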
However, is this really needed? There already is a mechanism that creates a {{Stream}} over a directory, including nested directories: {{Files.walk}}. There could be some more wrapper methods that take the parameters of the current {{listFiles}} and {{iterateFiles}} methods and end with a {{map(Path::toFile)}}, so that they return a {{Stream<File>}} and not a {{Stream<Path>}}.
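A wrapper along those lines might look roughly like the sketch below. The class {{FileStreams}}, the method {{streamFiles}} and its parameter list are assumptions made for illustration. They only loosely mirror the existing {{listFiles}} and {{iterateFiles}} parameters and are not an existing or proposed Commons IO signature.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical wrapper for illustration only; not part of Commons IO.
final class FileStreams {

    private FileStreams() {
    }

    // Lazily streams the regular files below directory that match fileFilter.
    // If recursive is false, only the directory's direct children are considered.
    // The returned stream holds open directory handles and must be closed by the caller.
    static Stream<File> streamFiles(File directory, Predicate<File> fileFilter, boolean recursive)
            throws IOException {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        return Files.walk(directory.toPath(), maxDepth)
                .filter(Files::isRegularFile) // skip directories, including the start directory
                .map(Path::toFile)            // expose File rather than Path
                .filter(fileFilter);
    }
}
```

A caller would typically wrap the result in try-with-resources, for example {{try (Stream<File> files = FileStreams.streamFiles(dir, f -> f.getName().endsWith(".txt"), true)) { files.forEach(System.out::println); }}}. Note that I/O errors encountered while the stream is being traversed surface as {{UncheckedIOException}} rather than {{IOException}}.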
> FileUtils.iterateFiles goes out of memory when executed for a directory with large number of files
> --------------------------------------------------------------------------------------------------
>
> Key: IO-597
> URL: https://issues.apache.org/jira/browse/IO-597
> Project: Commons IO
> Issue Type: Bug
> Components: Utilities
> Reporter: Arvind
> Priority: Major
>
> FileUtils.iterateFiles goes out of memory when executed for a directory with a large number of files, because it uses the listFiles method, which returns an array of java.io.File objects. The iterator itself should not be derived from a list but from a Java Stream, which will have a smaller memory footprint. This feature, however, can be used only with Java 8 or later, because streams were introduced in Java 8.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)