Posted to dev@kafka.apache.org by Jay Kreps <ja...@gmail.com> on 2012/09/24 23:12:08 UTC

byte copies in compression code

Hey folks,

I noticed that the code path for compressed messages does a very large
number of data copies:
1. One to turn the message contents into Messages
2. Then again to write all the messages into a ByteBuffer
3. Then this ByteBuffer is copied into an intermediate buffer and from
there into an unsized ByteArrayOutputStream. Since it is unsized this may
internally resize and copy several times over as the internal buffer grows.
4. Then again to copy the final contents of the ByteArrayOutputStream into
a Message
5. Then again to another ByteBuffer to add the 4-byte size delimiter
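
To make the kind of fix concrete, here is a rough sketch of how steps 3-5
could collapse into a single pass. This is not the existing Kafka code and
not the actual patch for either ticket: the class and method names
(CompressionCopySketch, ExposedByteArrayOutputStream, compressWithSizePrefix)
are made up for illustration, GZIP is assumed as the codec, and the initial
capacity is an arbitrary guess. The idea is to presize the output stream,
reserve the 4 bytes for the size up front, and wrap the stream's internal
array instead of copying it out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPOutputStream;

public class CompressionCopySketch {

    /** Hypothetical helper: exposes the internal array, avoiding the copy made by toByteArray(). */
    static final class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
        ExposedByteArrayOutputStream(int initialCapacity) {
            super(initialCapacity);
        }

        ByteBuffer asByteBuffer() {
            // buf and count are protected fields of ByteArrayOutputStream.
            return ByteBuffer.wrap(buf, 0, count);
        }
    }

    /** Compresses the remaining bytes of {@code messages} and returns [4-byte size][gzip payload]. */
    static ByteBuffer compressWithSizePrefix(ByteBuffer messages) {
        int length = messages.remaining();
        // Presized (assumed heuristic) so the internal buffer rarely has to grow and re-copy.
        ExposedByteArrayOutputStream out =
            new ExposedByteArrayOutputStream(Math.max(32, length / 2 + 4));
        // Reserve room for the size delimiter so no second buffer is needed later.
        out.write(new byte[4], 0, 4);
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            if (messages.hasArray()) {
                // Heap buffer: hand the backing array to the compressor directly.
                gzip.write(messages.array(),
                           messages.arrayOffset() + messages.position(), length);
            } else {
                // Direct buffer: stream through one small reusable chunk.
                byte[] chunk = new byte[Math.min(length, 8192)];
                ByteBuffer view = messages.duplicate();
                while (view.hasRemaining()) {
                    int n = Math.min(chunk.length, view.remaining());
                    view.get(chunk, 0, n);
                    gzip.write(chunk, 0, n);
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        ByteBuffer framed = out.asByteBuffer();
        // Fill in the reserved size field in place.
        framed.putInt(0, framed.remaining() - 4);
        return framed;
    }
}
```

The returned buffer shares the stream's internal array, so it is only safe
to use as long as nothing writes to that stream again. The input buffer's
position is left untouched.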

Since this is really on the core data path, I would like to ask people to
be a little more careful!

A few of these are easy to fix and can be eliminated as part of KAFKA-506.

I filed a bug to optimize this further:
  https://issues.apache.org/jira/browse/KAFKA-527

-Jay