You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/04/20 20:02:01 UTC

[GitHub] nicmichael opened a new pull request #1361: Issue #791: Eliminate byte[] copies in AddEntry code path

nicmichael opened a new pull request #1361: Issue #791: Eliminate byte[] copies in AddEntry code path
URL: https://github.com/apache/bookkeeper/pull/1361
 
 
   The AddEntry code path in Bookkeeper Client performs several copies of the send buffer before sending it out to bookies, which not only costs CPU cycles for array copying, but also drives up object allocation rate and thus increases young GC frequency.
   
   Currently, the send buffer is prepared by DigestManager.computeDigestAndPackageForSending(), which wraps the netty ByteBuf containing the data together with a (direct) header buffer into a ByteBufList. So far, we haven't yet allocated or copied any new buffers. But in order to send the buffer to the bookies, PendingAddOp.safeRun() will invoke sendWriteRequest() for each bookie in the write set, which will eventually invoke PerChannelBookieClient.addEntry() for each bookie. In here, we are now allocating two new byte[] and copying the buffer twice per bookie for V3 protocol with protobuf:
       byte[] toSendArray = toSend.toArray();      // 1st byte[] allocation and copy
       .setBody(ByteString.copyFrom(toSendArray)); // 2nd byte[] allocation and copy
   If for example we're using a write set size of 3, then we allocate 6 byte arrays and copy the same buffer 6 times.
   
   All 6 copies can be eliminated if DigestManager.computeDigestAndPackageForSending() would return a ByteBuf (or ByteBufList) that is backed by an array, which we can access and use without further copying it. PerChannelBookieClient.addEntry() could then wrap this array into a protobuf ByteString without performing any additional copies.
   
   This change eliminates all byte[] allocations and copies in PCBC by:
   1. allocating a new (pooled) array-backed heap ByteBuf for header, digest, and data in DigestManager.computeDigestAndPackageForSending()
   2. copying the data payload into this newly allocated ByteBuf and returning it wrapped into a ByteBufList of size 1 (instead of header + body separately)
   3. enhancing ByteBufList to give access to the underlying array if it consists of only a single ByteBuf that's backed by an array
   4. extracting the underlying array in PerChannelBookieClient.addEntry() from ByteBufList and wrapping it into protobuf ByteString without copying it.
   
   With this change, no more object allocations are required in this code path. DigestManager uses a pooled array (1). We're adding one arraycopy in (2) compared to the original code path, trading this against the allocations and copies we save later on.. Later steps, in particular step (4), is free of object allocation and array copies.
   As a result, the object allocation rate in the AddEntry code path is reduced significantly. In case of a write set size of 3, it is reduced from 6 to 0. This translates into much less frequent young generation garbage collections (in my experiments with write-heavy workload by ~4x overall).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services