Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/06/15 14:07:00 UTC

[jira] [Commented] (NIFI-8609) Improve efficiency of converting Record object to Avro GenericRecord object

    [ https://issues.apache.org/jira/browse/NIFI-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363660#comment-17363660 ] 

ASF subversion and git services commented on NIFI-8609:
-------------------------------------------------------

Commit e06afbdd22139a1744e83b46ce1d6e9c78bbdc36 in nifi's branch refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=e06afbd ]

NIFI-8609: Optimized AvroTypeUtil Record creation and conversion

Added a unit test that is ignored so that it can be run manually to test performance before/after changes to AvroTypeUtil. Updated AvroTypeUtil to be more efficient by not using Record.getValue() and instead iterating over the Map of values directly. getValue() is less efficient here because, for any null value, it still has to look the field up by its aliases. We know the RecordFields we are iterating over exist in the schema, since they are retrieved from there directly, so that step can be skipped in this situation. Also avoided looking for fields that exist in the Avro Schema and not in the RecordSchema just to set default values on the GenericRecord: there's no need to set them if they are default values.

This closes #5080

Signed-off-by: David Handermann <ex...@apache.org>
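
The difference between the two access patterns described in the commit message can be illustrated with a minimal, self-contained sketch. It does not use the NiFi or Avro APIs: the Field class, the method names, and the plain Map standing in for both the record's values and the converted output are all hypothetical, and it assumes the value map is keyed by each field's primary name and holds no keys outside the schema.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DirectIterationSketch {

    /** Hypothetical stand-in for a schema field: a primary name plus aliases. */
    static final class Field {
        final String name;
        final List<String> aliases;

        Field(String name, List<String> aliases) {
            this.name = name;
            this.aliases = aliases;
        }
    }

    /** Per-field lookup that retries every alias whenever the primary name yields null. */
    static Object lookupWithAliases(Map<String, Object> values, Field field) {
        Object value = values.get(field.name);
        if (value != null) {
            return value;
        }
        for (String alias : field.aliases) {
            Object aliased = values.get(alias);
            if (aliased != null) {
                return aliased;
            }
        }
        return null;
    }

    /** Slower path: one lookup per schema field, paying the alias scan for each null. */
    static Map<String, Object> convertByLookup(Map<String, Object> values, List<Field> schema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field field : schema) {
            Object value = lookupWithAliases(values, field);
            if (value != null) {
                out.put(field.name, value);
            }
        }
        return out;
    }

    /** Faster path: walk the value map once; nulls and unset fields are simply left out. */
    static Map<String, Object> convertByIteration(Map<String, Object> values) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            if (entry.getValue() != null) {
                out.put(entry.getKey(), entry.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Field> schema = List.of(
                new Field("id", List.of("identifier")),
                new Field("name", List.of("label", "title")));
        Map<String, Object> values = new HashMap<>();
        values.put("id", "42");
        values.put("name", null); // null value: the lookup path scans both aliases here

        // Both paths should agree under the assumptions stated in the lead-in.
        System.out.println(convertByLookup(values, schema).equals(convertByIteration(values)));
    }
}
```

Under those assumptions both methods produce the same output; the iteration path just never pays for alias scans on null values and never visits fields that were not set.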


> Improve efficiency of converting Record object to Avro GenericRecord object
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-8609
>                 URL: https://issues.apache.org/jira/browse/NIFI-8609
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.14.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During some performance tests and profiling, I found that for a given flow, pushing Avro records to Kafka, one of the most expensive parts of the flow was converting our Record (MapRecord) object into a GenericRecord object for the Avro Writer.
> I created a simple unit test to determine a baseline for performance numbers before making any changes. The unit test creates a Record with 100 nullable String fields, half of which have a {{null}} value assigned to them. I then converted the record into an Avro GenericRecord via {{AvroTypeUtil.createAvroRecord(record, avroSchema);}} in a loop of 1,000,000 iterations and output how long it took; this was then repeated 1,000 times in order to allow the JVM to warm up.
> Numbers on my MacBook Pro showed, after the first few iterations, that the amount of time needed to convert 1 million records was on the order of 4.5 seconds.
> After updating the code, the same conversion takes just under 2 seconds, so somewhere on the order of 2x better performance.
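
The measurement approach described in the issue (a timed inner loop, repeated for several rounds so the JVM can warm up) might look roughly like the sketch below. It is not the unit test from the commit: the class and method names are hypothetical, the workload is a placeholder map copy rather than AvroTypeUtil.createAvroRecord, and the round and iteration counts are scaled down.

```java
import java.util.HashMap;
import java.util.Map;

class WarmupTimingSketch {

    /** Runs the task iterationsPerRound times per round; returns elapsed nanoseconds per round. */
    static long[] timeRounds(Runnable task, int rounds, int iterationsPerRound) {
        long[] elapsedNanos = new long[rounds];
        for (int round = 0; round < rounds; round++) {
            long start = System.nanoTime();
            for (int i = 0; i < iterationsPerRound; i++) {
                task.run();
            }
            elapsedNanos[round] = System.nanoTime() - start;
        }
        return elapsedNanos;
    }

    /** Builds 100 String fields where every other field holds null. */
    static Map<String, Object> buildHalfNullValues() {
        Map<String, Object> values = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            values.put("field" + i, i % 2 == 0 ? "value" + i : null);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Object> values = buildHalfNullValues();
        long[] sink = {0}; // accumulates a result so the JIT cannot discard the work

        // Placeholder workload: copy non-null entries. Swap in the real conversion to measure it.
        Runnable task = () -> {
            Map<String, Object> copy = new HashMap<>();
            for (Map.Entry<String, Object> e : values.entrySet()) {
                if (e.getValue() != null) {
                    copy.put(e.getKey(), e.getValue());
                }
            }
            sink[0] += copy.size();
        };

        // Scaled down from the 1,000 rounds of 1,000,000 iterations in the issue text.
        long[] nanos = timeRounds(task, 5, 100_000);
        for (int round = 0; round < nanos.length; round++) {
            System.out.printf("round %d: %d ms%n", round, nanos[round] / 1_000_000);
        }
        System.out.println("sink=" + sink[0]);
    }
}
```

Early rounds typically include interpretation and JIT compilation time, which is why only the later rounds are a fair basis for a before/after comparison. A dedicated harness such as JMH would give more trustworthy numbers than a hand-rolled loop like this one.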



--
This message was sent by Atlassian Jira
(v8.3.4#803005)