Posted to issues@nifi.apache.org by "Mark Payne (JIRA)" <ji...@apache.org> on 2018/01/18 21:22:00 UTC
[jira] [Created] (NIFI-4794) Improve Garbage Collection required by Provenance Repository
Mark Payne created NIFI-4794:
--------------------------------
Summary: Improve Garbage Collection required by Provenance Repository
Key: NIFI-4794
URL: https://issues.apache.org/jira/browse/NIFI-4794
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
The EventIdFirstSchemaRecordWriter used by the provenance repository has a writeRecord(ProvenanceEventRecord) method. Within this method, it serializes the given record into a byte array by writing it to a ByteArrayOutputStream (BAOS) wrapped in a DataOutputStream. It then calls toByteArray() on the BAOS so that it can write the resulting byte[] directly to another OutputStream.
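For illustration, the allocation pattern described above can be sketched roughly as follows. This is not the actual EventIdFirstSchemaRecordWriter code: the `Event` class and its two fields are hypothetical stand-ins for ProvenanceEventRecord, and only a long and a string are serialized. The point is that each call allocates a new BAOS, a new DataOutputStream, and a copy of the serialized bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class AllocatingWriterSketch {

    // Hypothetical stand-in for ProvenanceEventRecord; only two fields are serialized here.
    static final class Event {
        final long eventId;
        final String componentId;

        Event(long eventId, String componentId) {
            this.eventId = eventId;
            this.componentId = componentId;
        }
    }

    private final OutputStream out;

    AllocatingWriterSketch(OutputStream out) {
        this.out = out;
    }

    // Every call allocates a new BAOS, a new DataOutputStream, and a copy of the bytes.
    public void writeRecord(Event event) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(256);
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeLong(event.eventId);
        // In OpenJDK, writeUTF uses a scratch byte[] owned by the DataOutputStream,
        // so a fresh DataOutputStream means a fresh scratch array as well.
        dos.writeUTF(event.componentId);
        dos.flush();
        byte[] serialized = baos.toByteArray(); // copies the internal buffer into a new array
        out.write(serialized);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        AllocatingWriterSketch writer = new AllocatingWriterSketch(sink);
        writer.writeRecord(new Event(1L, "abc"));
        // 8 bytes for the long + 2-byte length prefix + 3 bytes for "abc"
        System.out.println(sink.size()); // prints 13
    }
}
```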
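Because a new ByteArrayOutputStream, a new DataOutputStream, and a copy of the serialized bytes are allocated for every record written, this creates a large amount of garbage to be collected. We can improve this significantly: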
# Instead of creating a new ByteArrayOutputStream for each record, maintain a pool of them and reuse them. This avoids constantly allocating them and then garbage collecting them.
# If a BAOS grows beyond a certain size, it should not be returned to the pool, because we don't want a large buffer to remain on the heap indefinitely.
# Instead of wrapping the BAOS in a new DataOutputStream each time, the DataOutputStream should be pooled/recycled as well. Because the DataOutputStream creates an internal byte[] for its writeUTF method, reusing it can save a significant amount of garbage.
# Avoid calling ByteArrayOutputStream.toByteArray(). Instead, use ByteArrayOutputStream.writeTo(OutputStream), which writes the buffer's contents directly to the target stream. This avoids allocating and copying a new array, as well as the resulting GC overhead.
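A minimal sketch of how the four changes could fit together, assuming a hypothetical `Event` type with two serialized fields in place of ProvenanceEventRecord. The issue does not say how NiFi implements the pool: the ArrayBlockingQueue, the `PooledBuffer` pairing, the `maxPooledSize` threshold, and the initial size and pool capacity constants are all illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PooledWriterSketch {

    // Hypothetical stand-in for ProvenanceEventRecord; only two fields are serialized here.
    static final class Event {
        final long eventId;
        final String componentId;

        Event(long eventId, String componentId) {
            this.eventId = eventId;
            this.componentId = componentId;
        }
    }

    // Assumed tuning values; the issue does not specify any.
    static final int INITIAL_SIZE = 256;
    static final int POOL_CAPACITY = 32;

    // Item 3: the DataOutputStream is pooled together with the BAOS it wraps,
    // so its internal writeUTF buffer is reused along with it.
    static final class PooledBuffer {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream(INITIAL_SIZE);
        final DataOutputStream dos = new DataOutputStream(baos);
    }

    private final BlockingQueue<PooledBuffer> pool = new ArrayBlockingQueue<>(POOL_CAPACITY);
    private final OutputStream out;
    private final int maxPooledSize;

    PooledWriterSketch(OutputStream out, int maxPooledSize) {
        this.out = out;
        this.maxPooledSize = maxPooledSize;
    }

    public void writeRecord(Event event) throws IOException {
        // Item 1: take a buffer from the pool, allocating only if the pool is empty.
        PooledBuffer buffer = pool.poll();
        if (buffer == null) {
            buffer = new PooledBuffer();
        }
        try {
            buffer.dos.writeLong(event.eventId);
            buffer.dos.writeUTF(event.componentId);
            buffer.dos.flush();
            // Item 4: write the BAOS contents straight to the target stream, no toByteArray() copy.
            buffer.baos.writeTo(out);
        } finally {
            // Item 2: BAOS does not expose its capacity, so size() (a lower bound on it)
            // decides whether the buffer is small enough to keep.
            if (buffer.baos.size() <= maxPooledSize) {
                buffer.baos.reset(); // keeps the allocated array, discards the contents
                pool.offer(buffer);  // returns false (buffer dropped) if the pool is full
            }
        }
    }

    int pooledCount() {
        return pool.size();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PooledWriterSketch writer = new PooledWriterSketch(sink, 1024 * 1024);
        writer.writeRecord(new Event(1L, "abc"));
        writer.writeRecord(new Event(2L, "de"));
        // prints "25 bytes written, 1 pooled buffer(s)"
        System.out.println(sink.size() + " bytes written, " + writer.pooledCount() + " pooled buffer(s)");
    }
}
```

A buffer that held an oversized record is simply left for the garbage collector, so one unusually large event does not pin memory for the life of the writer. ArrayBlockingQueue is thread-safe, so concurrent callers could share the pool, although writes to the shared `out` stream would still need their own synchronization.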
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)