Posted to dev@avro.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/01/17 18:31:00 UTC

[jira] [Commented] (AVRO-2300) Enhance encoder to track the total number of bytes written

    [ https://issues.apache.org/jira/browse/AVRO-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745368#comment-16745368 ] 

ASF subversion and git services commented on AVRO-2300:
-------------------------------------------------------

Commit 6d0323cc950f0b05a7e30a81fa5970df312be736 in avro's branch refs/heads/master from Thiruvalluvan M. G.
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=6d0323c ]

Merge pull request #429 from thiru-apache/AVRO-2300

Added byteCount() to Encoder interface in C++

> Enhance encoder to track the total number of bytes written
> ----------------------------------------------------------
>
>                 Key: AVRO-2300
>                 URL: https://issues.apache.org/jira/browse/AVRO-2300
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: c++
>            Reporter: Tory McKeag
>            Priority: Major
>
> I'd like to enhance the Encoder API so that it can track and report the number of bytes actually written since init() was called. My use case is as follows:
> I'm using the Avro C++ library to publish messages to Kafka using librdkafka ([https://github.com/edenhill/librdkafka]). My initial implementation used MemoryOutputStream (via avro::memoryOutputStream()). After some tuning I wrote a couple of custom implementations of avro::OutputStream to improve performance, but like the built-in MemoryOutputStream they all share the same limitation:
> I send a buffer to the Kafka API to be published, but I have to give Kafka the *whole* length of the buffer, because I have no way to track the number of bytes that Avro actually wrote. For example, with a chunk size of 50, if Avro serializes 80 bytes of data, the buffer will be 100 bytes long. Since that is the only size I have, I tell librdkafka to publish 100 bytes. The system works, but we pay the I/O and storage cost of publishing 20 bytes of garbage. That doesn't sound like much, but at our higher message volumes it is significant.
> Ideally I would tell librdkafka to publish only 80 bytes in this example, but to do so I need a way to determine how many bytes Avro actually wrote. My first guess as a user is that this should be available through the Encoder, which means it would have to become part of the API; looking at the code, though, it seems the state already exists in StreamWriter and would just need to be exposed.
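
The arithmetic in the quoted example (chunk size 50, 80 bytes serialized, 100-byte buffer) can be illustrated with a minimal, self-contained sketch. ChunkedBuffer below is a hypothetical stand-in written only for this illustration; it is not Avro's MemoryOutputStream, StreamWriter, or Encoder, and its byteCount() method merely borrows the name mentioned in the commit message. It assumes a single contiguous buffer that grows one whole chunk at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: a buffer that grows in fixed-size chunks
// and remembers how many bytes were really written. Not part of Avro.
class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t chunkSize) : chunkSize_(chunkSize) {
        assert(chunkSize > 0);
    }

    // Append n bytes, allocating whole chunks until they fit.
    void write(const std::uint8_t* data, std::size_t n) {
        while (written_ + n > storage_.size()) {
            storage_.resize(storage_.size() + chunkSize_);
        }
        std::copy(data, data + n,
                  storage_.begin() + static_cast<std::ptrdiff_t>(written_));
        written_ += n;
    }

    // Total allocated size: the only number available without a byte counter.
    std::size_t capacity() const { return storage_.size(); }

    // Bytes actually written: the number a publisher would want to send.
    std::size_t byteCount() const { return written_; }

    const std::uint8_t* data() const { return storage_.data(); }

private:
    std::size_t chunkSize_;
    std::size_t written_ = 0;
    std::vector<std::uint8_t> storage_;
};

// Bytes that would be published as padding if capacity() were used as the
// message length instead of byteCount().
inline std::size_t paddingBytes(const ChunkedBuffer& buf) {
    return buf.capacity() - buf.byteCount();
}
```

With a chunk size of 50 and an 80-byte write, capacity() is 100, byteCount() is 80, and paddingBytes() is 20, matching the figures in the report. A publisher would pass data() together with byteCount() rather than capacity() as the message length.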
