Posted to issues@nifi.apache.org by "Benjamin Wood (JIRA)" <ji...@apache.org> on 2017/09/29 22:22:00 UTC

[jira] [Commented] (NIFI-3423) List based processors don't support source directories with high file count.

    [ https://issues.apache.org/jira/browse/NIFI-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186596#comment-16186596 ] 

Benjamin Wood commented on NIFI-3423:
-------------------------------------

A potential fix would be to use
{{java.nio.file.Files.newDirectoryStream}}
instead of
{{java.io.File.listFiles}}

{{listFiles}} builds the entire listing as an array before it returns, so it copes poorly with very large directories. A {{DirectoryStream}} iterates over the entries lazily, so it has a much smaller memory footprint and should be the more efficient approach.
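
A minimal sketch of the suggestion above, combined with the batching idea from the issue description. It is not NiFi code: the class name {{BatchedDirectoryLister}}, the method {{listInBatches}}, and the callback are hypothetical names used only for illustration, and a real processor would create and commit FlowFiles inside the callback. Only {{Files.newDirectoryStream}} and {{DirectoryStream}} are actual JDK APIs here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedDirectoryLister {

    /**
     * Hypothetical helper (not part of NiFi): walks one directory lazily and
     * hands its entries to the callback in batches of at most batchSize, so
     * the full listing is never held in memory at once.
     *
     * @return the total number of entries seen
     */
    public static long listInBatches(Path dir, int batchSize,
                                     Consumer<List<Path>> onBatch) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<Path> batch = new ArrayList<>(batchSize);

        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {   // entries include subdirectories
                batch.add(entry);
                total++;
                if (batch.size() == batchSize) {
                    onBatch.accept(batch);              // e.g. commit this batch
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
        }
        if (!batch.isEmpty()) {
            onBatch.accept(batch);        // flush the final partial batch
        }
        return total;
    }
}
```

With a batch size of 10,000, as proposed in the issue, at most 10,000 paths are buffered at any time regardless of how many files the directory holds. Iteration order is not guaranteed, so a caller that needs sorted or timestamp-ordered listings would still have to handle that separately.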

> List based processors don't support source directories with high file count.
> ----------------------------------------------------------------------------
>
>                 Key: NIFI-3423
>                 URL: https://issues.apache.org/jira/browse/NIFI-3423
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.1.1
>            Reporter: Matthew Clarke
>
> NiFi FlowFile attributes/metadata live in heap.  The List based processors return a complete listing from the target and then create a FlowFile for each file in that returned listing. The FlowFiles being created are not committed to the list processor's success relationship until all have been created.  As a result, when the returned listing is very large, NiFi can run out of JVM heap memory before that commit happens.
> It would be nice if the List based processors could commit FlowFiles in batches (for example, 10,000 at a time) from the returned listing instead of trying to commit them all at once, to help avoid heap exhaustion.


