Posted to issues@commons.apache.org by "Rob Spoor (Jira)" <ji...@apache.org> on 2020/10/15 14:30:00 UTC

[jira] [Commented] (IO-597) FileUtils.iterateFiles goes out of memory when executed for a directory with large number of files

    [ https://issues.apache.org/jira/browse/IO-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214739#comment-17214739 ] 

Rob Spoor commented on IO-597:
------------------------------

Any solution that uses {{File.listFiles}} will potentially have the same issue. The iterator could be improved for folders where the files are distributed among several subfolders, but as soon as a single folder contains a large number of files directly, the same problem can arise. This is simply a flaw in {{File.listFiles}}: it collects all entries of a directory into an array before returning them.
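By contrast, the JDK's {{Files.newDirectoryStream}} hands out directory entries one at a time. The following is a minimal sketch that uses only the JDK. The names {{LazyCount}} and {{countEntries}} are hypothetical. It processes a single large directory without ever holding all of its entries in an array, which is what {{File.listFiles}} does:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class LazyCount {
    // Hypothetical helper: counts the direct entries of dir while keeping
    // only one entry in hand at a time (unlike File.listFiles, which
    // builds an array of every entry first).
    public static long countEntries(Path dir) throws IOException {
        long count = 0;
        // try-with-resources closes the DirectoryStream even if an error occurs
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path ignored : entries) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-count");
        for (int i = 0; i < 5; i++) {
            Files.createFile(dir.resolve("file" + i + ".txt"));
        }
        System.out.println(countEntries(dir)); // prints 5
    }
}
```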

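Any proper solution would need to use {{Files.newDirectoryStream}}. The catch is that every {{DirectoryStream}} needs to be closed, and a recursive iterator may hold one open stream per directory level at any given time. That makes it difficult to implement a single {{Iterator}} that recurses into directories, because {{Iterator}} has no {{close}} method for releasing those streams if iteration stops early.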
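To make the closing problem concrete, here is a minimal JDK-only sketch of a recursive iterator built on {{Files.newDirectoryStream}}. The class {{ClosingFileIterator}} is hypothetical and is not a Commons IO API. It keeps a stack of open {{DirectoryStream}}s, one per directory level, so it has to implement {{AutoCloseable}} in addition to {{Iterator}}. Callers must then use it in try-with-resources, and a plain {{java.util.Iterator}} return type cannot express that obligation. As an assumption of this sketch, symbolic links are not followed. A link is reported like any other non-directory entry, which avoids cycles.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical recursive iterator: lazy, but it must itself be closed
// because it may hold one open DirectoryStream per directory level.
public class ClosingFileIterator implements Iterator<File>, AutoCloseable {
    private final Deque<DirectoryStream<Path>> streams = new ArrayDeque<>();
    private final Deque<Iterator<Path>> iterators = new ArrayDeque<>();
    private File next;

    public ClosingFileIterator(Path root) throws IOException {
        push(root);
        advance();
    }

    private void push(Path dir) throws IOException {
        DirectoryStream<Path> stream = Files.newDirectoryStream(dir);
        streams.push(stream);
        iterators.push(stream.iterator());
    }

    // Finds the next non-directory entry, descending into subdirectories
    // and closing each DirectoryStream as soon as it is exhausted.
    private void advance() {
        next = null;
        try {
            while (!iterators.isEmpty()) {
                Iterator<Path> it = iterators.peek();
                if (!it.hasNext()) {
                    iterators.pop();
                    streams.pop().close();
                    continue;
                }
                Path p = it.next();
                if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
                    push(p);
                } else {
                    next = p.toFile();
                    return;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public File next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        File result = next;
        advance();
        return result;
    }

    // Releases any streams still open if iteration is abandoned early.
    @Override
    public void close() throws IOException {
        while (!streams.isEmpty()) {
            streams.pop().close();
        }
        iterators.clear();
        next = null;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("iter");
        Files.createDirectories(root.resolve("a/b"));
        Files.createFile(root.resolve("top.txt"));
        Files.createFile(root.resolve("a/b/deep.txt"));
        try (ClosingFileIterator files = new ClosingFileIterator(root)) {
            while (files.hasNext()) {
                System.out.println(files.next().getName());
            }
        }
    }
}
```

The sketch shows the extra surface area a single recursive iterator needs. Wrapping it in a plain {{Iterator<File>}} would leak file handles whenever a caller stops iterating before the end.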

However, is this really needed? There is already a mechanism that creates a {{Stream}} over a directory, including nested directories: {{Files.walk}}. There could be additional wrapper methods that take the parameters of the current {{listFiles}} and {{iterateFiles}} methods and end with a {{map(Path::toFile)}}, so that they return a {{Stream<File>}} rather than a {{Stream<Path>}}.
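Such a wrapper might look roughly like the sketch below. It is an illustration under assumptions, not an existing Commons IO method: the name {{streamFiles}} and its {{Predicate<Path>}} / {{recursive}} parameters are hypothetical stand-ins for the filter and recursion arguments of the current methods. Like any {{Files.walk}} stream, the result holds open directory handles and should be closed with try-with-resources.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;
import java.util.stream.Stream;

public class StreamFiles {
    // Hypothetical wrapper: a lazy Stream<File> of regular files under
    // directory, backed by Files.walk and ending in map(Path::toFile).
    // The caller must close the returned stream.
    public static Stream<File> streamFiles(File directory, Predicate<Path> filter, boolean recursive)
            throws IOException {
        // maxDepth 1 = only direct children; MAX_VALUE = the whole tree
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        return Files.walk(directory.toPath(), maxDepth)
                .filter(Files::isRegularFile)
                .filter(filter)
                .map(Path::toFile);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("walk");
        Files.createDirectories(root.resolve("sub"));
        Files.createFile(root.resolve("a.txt"));
        Files.createFile(root.resolve("sub/b.txt"));
        Files.createFile(root.resolve("sub/c.log"));

        Predicate<Path> txtOnly = p -> p.getFileName().toString().endsWith(".txt");
        try (Stream<File> files = streamFiles(root.toFile(), txtOnly, true)) {
            System.out.println(files.count()); // prints 2
        }
        try (Stream<File> files = streamFiles(root.toFile(), txtOnly, false)) {
            System.out.println(files.count()); // prints 1
        }
    }
}
```

Note that {{Files.walk}} reports I/O errors encountered during traversal as {{UncheckedIOException}}, so callers of such wrappers need to be prepared for that unchecked exception.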

> FileUtils.iterateFiles goes out of memory when executed for a directory with large number of files
> --------------------------------------------------------------------------------------------------
>
>                 Key: IO-597
>                 URL: https://issues.apache.org/jira/browse/IO-597
>             Project: Commons IO
>          Issue Type: Bug
>          Components: Utilities
>            Reporter: Arvind
>            Priority: Major
>
> FileUtils.iterateFiles goes out of memory when executed for a directory with a large number of files because it uses the listFiles method, which returns an array of java.io.File objects. The iterator itself should not be derived from a list but from a Java Stream, which will have a smaller memory footprint. This feature, however, can be used only with Java 8 or later, because streams were introduced in Java 8.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)