Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/01/21 20:33:01 UTC

[jira] [Commented] (NIFI-6999) Encrypt Config Toolkit fails on very large flow.xml.gz files

    [ https://issues.apache.org/jira/browse/NIFI-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269618#comment-17269618 ] 

ASF subversion and git services commented on NIFI-6999:
-------------------------------------------------------

Commit 1c361d45ae94f155b6e2def7bd4430b1c9ca8b3b in nifi's branch refs/heads/main from Nathan Gough
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=1c361d4 ]

NIFI-6999 - Made changes to load flow.xml files using streams. Updated tests.

NIFI-6999 - Slight change to test to check for WARN message.

NIFI-6999 - Removed the very large flow file and the test that uses it. The test ran for about 2 minutes, which was excessive to keep in. The other tests, changed to handle streams, prove the functionality. A large file can be used on the command line to manually test large flow files. Some other cleanup.

NIFI-6999 - Removed comments and altered the code a little bit for readability as per code review.

NIFI-6999 - Removed commented code

NIFI-6999 - Renamed variable and removed assert comment.

Signed-off-by: Nathan Gough <th...@gmail.com>

This closes #4715.
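
For readers unfamiliar with what loading the flow "using streams" means in practice, the following is a minimal Java sketch. It is not the code from this commit. The class name StreamingFlowRewrite, the method rewrite, and the per-line transform are hypothetical names chosen for illustration. It assumes the flow XML is UTF-8 and spread over many lines, so that only one line is held in memory at a time rather than the whole uncompressed document.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration, not part of the NiFi toolkit.
final class StreamingFlowRewrite {

    // Copies a gzipped text file to another gzipped file line by line,
    // applying `transform` to each line. The full content is never
    // materialized as a single String or byte array.
    // Returns the number of lines processed.
    static long rewrite(Path in, Path out, UnaryOperator<String> transform) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(Files.newInputStream(in)), StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                     new GZIPOutputStream(Files.newOutputStream(out)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(transform.apply(line));
                // readLine() strips the terminator, so write one back explicitly.
                writer.write('\n');
                lines++;
            }
        }
        return lines;
    }
}
```

A caller would pass the function that re-encrypts sensitive values as `transform`. If the XML were serialized on a single line, this line-based approach would still buffer the whole document, and a streaming XML parser would be the safer choice.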


> Encrypt Config Toolkit fails on very large flow.xml.gz files
> ------------------------------------------------------------
>
>                 Key: NIFI-6999
>                 URL: https://issues.apache.org/jira/browse/NIFI-6999
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Tools and Build
>    Affects Versions: 1.2.0, 1.10.0
>            Reporter: Andy LoPresto
>            Assignee: Nathan Gough
>            Priority: Critical
>              Labels: documentation, encryption, heap, security, streaming, toolkit
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A user reported failure when using the encrypt config toolkit to process (encrypt) a large {{flow.xml.gz}}. The compressed file was 49 MB, but was 687 MB uncompressed. It contained 545 encrypted values, and approximately 90 templates. This caused the toolkit to fail during {{loadFlowXml()}} unless the toolkit invocation set the heap to 8 GB via {{-Xms2g -Xmx8g}}. Even with the expanded heap, the serialization of the newly-encrypted flow XML to the file system fails with the following exception:
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> at java.lang.StringCoding.encode(StringCoding.java:350)
> at java.lang.String.getBytes(String.java:941)
> at org.apache.commons.io.IOUtils.write(IOUtils.java:1857)
> at org.apache.commons.io.IOUtils$write$0.call(Unknown Source)
> at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:141)
> at org.apache.nifi.properties.ConfigEncryptionTool$_writeFlowXmlToFile_closure5$_closure20.doCall(ConfigEncryptionTool.groovy:692)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
> at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
> at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
> at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019)
> at groovy.lang.Closure.call(Closure.java:426)
> at groovy.lang.Closure.call(Closure.java:442)
> at org.codehaus.groovy.runtime.IOGroovyMethods.withCloseable(IOGroovyMethods.java:1622)
> at org.codehaus.groovy.runtime.NioGroovyMethods.withCloseable(NioGroovyMethods.java:1754)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
> at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:56)
> at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
> at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
> at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
> at org.apache.nifi.properties.ConfigEncryptionTool$_writeFlowXmlToFile_closure5.doCall(ConfigEncryptionTool.groovy:691)
> {code}
> The immediate fix was to remove the duplicated template definitions in the flow definition, returning the file to a reasonable size. However, if the toolkit is run as an inline replacement, this failure can cause the {{flow.xml.gz}} to be overwritten with an empty file, potentially leading to data loss. The following steps should be taken:
> # Guard against loading/operating on/serializing large files (log statements, simple conditional checks)
> # Handle large files internally (change from direct {{String}} access to {{BufferedInputStream}}, etc.)
> # Document the internal memory usage of the toolkit in the toolkit guide
> # Document best practices and steps to resolve issue in the toolkit guide
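
The first step in the list above (guarding against large files) and the data-loss risk of an inline replacement could be handled along the following lines. This is a sketch under stated assumptions, not the toolkit's implementation: the class SafeFlowReplace, the FlowWriter interface, and the 10 MB WARN_BYTES threshold are all hypothetical, and the issue itself does not specify a size limit. The idea is to write the new flow to a temporary file in the same directory and move it over the original only after the write has completed, so a failure partway through leaves the existing flow.xml.gz untouched.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration, not part of the NiFi toolkit.
final class SafeFlowReplace {

    // Assumed threshold for logging a warning; the issue does not define one.
    static final long WARN_BYTES = 10L * 1024 * 1024;

    // Callback that serializes the new flow to the given stream.
    interface FlowWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Returns true when the file on disk is larger than the threshold,
    // so the caller can log a warning or refuse to continue.
    static boolean isLarge(Path flow, long thresholdBytes) throws IOException {
        return Files.size(flow) > thresholdBytes;
    }

    // Writes to a sibling temp file, then replaces the target only if the
    // write finished without an exception or error.
    static void replaceSafely(Path target, FlowWriter writer) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "flow", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                writer.writeTo(out);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up after a failed write.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the finally block also runs when an OutOfMemoryError is thrown, the temporary file is removed and the original remains in place in the failure mode described in this issue. Whether the final move is atomic depends on the file system, so this sketch does not request StandardCopyOption.ATOMIC_MOVE.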



--
This message was sent by Atlassian Jira
(v8.3.4#803005)