Posted to issues@nifi.apache.org by "Mark Payne (JIRA)" <ji...@apache.org> on 2018/01/18 21:22:00 UTC

[jira] [Created] (NIFI-4794) Improve Garbage Collection required by Provenance Repository

Mark Payne created NIFI-4794:
--------------------------------

             Summary: Improve Garbage Collection required by Provenance Repository
                 Key: NIFI-4794
                 URL: https://issues.apache.org/jira/browse/NIFI-4794
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne
            Assignee: Mark Payne


The EventIdFirstSchemaRecordWriter that is used by the provenance repository has a writeRecord(ProvenanceEventRecord) method. Within this method, it serializes the given record into a byte array by serializing to a ByteArrayOutputStream (after wrapping the BAOS in a DataOutputStream). Once this is done, it calls toByteArray() on that BAOS so that it can write the byte[] directly to another OutputStream.

This can create a rather large amount of garbage to be collected. We can improve this significantly:
 # Instead of creating a new ByteArrayOutputStream each time, create a pool of them. This avoids constantly having to garbage collect them.
 # If a pooled BAOS grows beyond a certain size, we should not return it to the pool, because we don't want to keep a huge buffer on the heap indefinitely.
 # Instead of wrapping the BAOS in a new DataOutputStream, the DataOutputStream should be pooled/recycled as well. Since a DataOutputStream must create an internal byte[] for the writeUTF method, this can save a significant amount of garbage.
 # Avoid calling ByteArrayOutputStream.toByteArray(). We can instead just use ByteArrayOutputStream.writeTo(OutputStream). This avoids allocating the new array, copying the data into it, and the GC overhead of collecting it afterward.
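
The four points above could be sketched roughly as follows. This is only an illustration, not the actual EventIdFirstSchemaRecordWriter code: the class names (RecordBufferPool, SizedBuffer, PooledBuffer), the initial capacity, the pool size and size threshold parameters, and the two-field record layout are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; names and sizes are assumptions, not NiFi's implementation.
class RecordBufferPool {

    /** ByteArrayOutputStream that exposes the length of its backing array. */
    static final class SizedBuffer extends ByteArrayOutputStream {
        SizedBuffer(int initialCapacity) {
            super(initialCapacity);
        }

        int capacity() {
            return buf.length; // 'buf' is the protected backing array
        }
    }

    /** A buffer plus the DataOutputStream wrapping it, recycled as one unit. */
    static final class PooledBuffer {
        final SizedBuffer bytes;
        final DataOutputStream data;

        PooledBuffer(int initialCapacity) {
            bytes = new SizedBuffer(initialCapacity);
            data = new DataOutputStream(bytes);
        }
    }

    private static final int INITIAL_CAPACITY = 32; // assumed starting size

    private final BlockingQueue<PooledBuffer> pool;
    private final int maxRetainedCapacity;

    RecordBufferPool(int poolSize, int maxRetainedCapacity) {
        this.pool = new ArrayBlockingQueue<>(poolSize);
        this.maxRetainedCapacity = maxRetainedCapacity;
    }

    /** Takes a buffer from the pool, or creates one if the pool is empty. */
    PooledBuffer borrow() {
        PooledBuffer buffer = pool.poll();
        return buffer != null ? buffer : new PooledBuffer(INITIAL_CAPACITY);
    }

    /** Returns a buffer to the pool unless its backing array grew too large. */
    void release(PooledBuffer buffer) {
        if (buffer.bytes.capacity() > maxRetainedCapacity) {
            return; // drop oversized buffers so they can be garbage collected
        }
        buffer.bytes.reset(); // discard contents, keep the backing array
        pool.offer(buffer);   // ignored if the pool is already full
    }

    int pooledCount() {
        return pool.size();
    }

    /** Serializes a (hypothetical) two-field record and copies it to 'out'. */
    void writeRecord(long eventId, String componentId, OutputStream out) throws IOException {
        PooledBuffer buffer = borrow();
        try {
            buffer.data.writeLong(eventId);
            buffer.data.writeUTF(componentId);
            buffer.bytes.writeTo(out); // no toByteArray() copy
        } finally {
            release(buffer);
        }
    }
}
```

A caller would create one pool, for example new RecordBufferPool(10, 1024 * 1024), and call writeRecord for each event. Note that a recycled DataOutputStream keeps counting bytes across records, so its size() method is not meaningful per record in this sketch; the per-record length would come from the underlying buffer's size() instead.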

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)