Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/12/09 19:43:00 UTC

[jira] [Commented] (NIFI-9382) Improve startup time when loading flow that uses many HDFS related processors

    [ https://issues.apache.org/jira/browse/NIFI-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456697#comment-17456697 ] 

ASF subversion and git services commented on NIFI-9382:
-------------------------------------------------------

Commit 97198e35a04c12e66684d9545ff24156d16c60f6 in nifi's branch refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=97198e3 ]

NIFI-9382: This closes #5584. Added system test that replicates issue in which a closed shared classloader causes issues when used again
NIFI-9382: Fixed issue with SharedInstanceClassLoader where the classloader may get closed but then get used again. When the SharedInstanceClassLoader is closed, we will now ensure that we don't use it anymore and instead create a new one.

Signed-off-by: Joe Witt <jo...@apache.org>
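
The behaviour described in the commit message (never reuse a shared classloader once it has been closed, create a new one instead) can be sketched roughly as follows. This is only an illustration, not NiFi's actual code: SharedLoaderRegistry, TrackedLoader, and getOrCreate are hypothetical names, and the empty URL array stands in for whatever resources a real loader would be given.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a registry that hands out a shared class loader per key
// and replaces it once it has been closed.
public class SharedLoaderRegistry {

    // Hypothetical stand-in for a shared class loader that remembers being closed.
    public static final class TrackedLoader extends URLClassLoader {
        private volatile boolean closed = false;

        TrackedLoader(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        public void close() {
            closed = true; // mark first, so no caller is handed a half-closed loader
            try {
                super.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public boolean isClosed() {
            return closed;
        }
    }

    private final Map<String, TrackedLoader> loaders = new HashMap<>();

    // Returns the shared loader for the key, creating a fresh one if none
    // exists yet or if the previous one has been closed.
    public synchronized TrackedLoader getOrCreate(String key) {
        TrackedLoader existing = loaders.get(key);
        if (existing != null && !existing.isClosed()) {
            return existing;
        }
        // Assumption: no extra URLs; a real loader would receive its resources here.
        TrackedLoader fresh = new TrackedLoader(new URL[0],
                SharedLoaderRegistry.class.getClassLoader());
        loaders.put(key, fresh);
        return fresh;
    }
}
```

With this shape, a caller that asks for a loader after it was closed gets a new, usable instance rather than the stale one.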


> Improve startup time when loading flow that uses many HDFS related processors
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-9382
>                 URL: https://issues.apache.org/jira/browse/NIFI-9382
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.16.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting NiFi, if a flow has many HDFS-related processors (hundreds to thousands), the startup time can be very long. In one case, I have a user flow that has more than 1000 HDFS processors, and it takes 1-2 hours to fully start NiFi.
> This is because the HDFS client makes a lot of assumptions about the environment that it's running in. Unfortunately, these assumptions are not always true when running in NiFi. Because the UserGroupInformation class relies on static methods, interacting with an HDFS cluster using multiple Kerberos principals requires ClassLoader isolation, with a separate, duplicate ClassLoader for each HDFS processor.
> Because of this, the HDFS client components must be initialized once for each processor, and the initialization of the client is very expensive. We need to improve this so that we don't create a separate ClassLoader that loads hundreds or thousands of classes for each instance of the Processor.
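
One way to picture the improvement asked for above is to share a ClassLoader among processors that need the same isolation, instead of creating one per processor. The sketch below is an assumption-laden illustration, not the actual NiFi change: IsolationKeyLoaders, loaderFor, and the idea of keying on the Kerberos principal are hypothetical, and the issue text does not say how sharing is decided.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one class loader per isolation key rather than one per processor.
public class IsolationKeyLoaders {
    private final Map<String, ClassLoader> loadersByKey = new ConcurrentHashMap<>();

    // Assumption: the isolation key is something like the Kerberos principal, so
    // processors that share a principal can share static state safely.
    public ClassLoader loaderFor(String isolationKey) {
        return loadersByKey.computeIfAbsent(isolationKey,
                key -> new URLClassLoader(new URL[0],
                        IsolationKeyLoaders.class.getClassLoader()));
    }

    // Number of distinct class loaders created so far.
    public int loaderCount() {
        return loadersByKey.size();
    }
}
```

Under these assumptions, a flow with 1000 processors spread across three principals would initialize the expensive client classes three times rather than 1000 times.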



--
This message was sent by Atlassian Jira
(v8.20.1#820001)