Posted to user@avro.apache.org by "Jim Belton (jabelton)" <ja...@cisco.com> on 2018/01/29 19:37:06 UTC

Performance of Avro C Encoding

Hi all:


I've written a single-threaded program that uses the Avro C library to convert a log file from plain text to Avro, following the user documentation. My team is responsible for a high-performance system that produces very large logs: at maximum load it generates about 160000 lines per second, and the logs are rotated into files of roughly 20000 lines each. Each line contains 14 fields and is encoded as an Avro record.
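For reference, the per-record time budget implied by these figures works out as follows. The cycle count uses the 2.10GHz clock quoted later in this post and ignores turbo boost and anything else sharing the core; the budget has to cover parsing the text line, encoding it, and the file I/O.

```latex
\begin{aligned}
t_{\text{record}} &= \frac{1\ \text{s}}{160\,000\ \text{records}} = 6.25\ \mu\text{s per record} \\
t_{\text{field}} &= \frac{6.25\ \mu\text{s}}{14\ \text{fields}} \approx 0.45\ \mu\text{s per field} \\
\text{cycles}_{\text{record}} &= 6.25 \times 10^{-6}\ \text{s} \times 2.10 \times 10^{9}\ \text{Hz} = 13\,125 \\
\text{rotation rate} &= \frac{160\,000\ \text{lines/s}}{20\,000\ \text{lines/file}} = 8\ \text{files/s}
\end{aligned}
```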


I do as little as possible in the inner loop. For each line, I make the following calls:

1. Calls to construct the fields: avro_int64, avro_string, avro_null, avro_union, avro_array.

2. A call to avro_record

3. Calls to avro_record_set to add the fields

4. A call to avro_file_writer_append to write the encoded line

5. Calls to avro_datum_decref to free the fields


The resulting program was very slow. By replacing the call to avro_file_writer_append with a call to avro_write_data with writers_schema set to NULL, which disables validation of the record before writing, I was able to almost triple performance, but it still can't quite keep up with 160000 records/sec. I'm running Debian 8 on an Intel Xeon E5-26230v4 (2.10GHz).


Am I calling the right functions? What should my performance expectations be?


Thanks,

Jim.