Posted to commits@kafka.apache.org by jg...@apache.org on 2017/06/21 21:04:24 UTC

kafka git commit: MINOR: Detail message/batch size implications for conversion between old and new formats

Repository: kafka
Updated Branches:
  refs/heads/trunk f848e2cd6 -> e6e263174


MINOR: Detail message/batch size implications for conversion between old and new formats

Author: Jason Gustafson <ja...@confluent.io>

Reviewers: Ismael Juma <is...@juma.me.uk>

Closes #3373 from hachikuji/fetch-size-upgrade-notes


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/e6e26317
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/e6e26317
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/e6e26317

Branch: refs/heads/trunk
Commit: e6e263174300ffab05676790f2a6c963ba24e5c9
Parents: f848e2c
Author: Jason Gustafson <ja...@confluent.io>
Authored: Wed Jun 21 14:04:19 2017 -0700
Committer: Jason Gustafson <ja...@confluent.io>
Committed: Wed Jun 21 14:04:19 2017 -0700

----------------------------------------------------------------------
 docs/upgrade.html | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/e6e26317/docs/upgrade.html
----------------------------------------------------------------------
diff --git a/docs/upgrade.html b/docs/upgrade.html
index 3b65fec..98c749c 100644
--- a/docs/upgrade.html
+++ b/docs/upgrade.html
@@ -80,10 +80,12 @@
     <li> Similarly, when compressing data with gzip, the producer and broker will use 8 KB instead of 1 KB as the buffer size. The default
          for gzip is excessively low (512 bytes). </li>
     <li>The broker configuration <code>max.message.bytes</code> now applies to the total size of a batch of messages.
-        Previously the setting applied to batches of compressed messages, or to non-compressed messages individually. In practice,
-        the change is minor since a message batch may consist of only a single message, so the limitation on the size of
-        individual messages is only reduced by the overhead of the batch format. This similarly affects the
-        producer's <code>batch.size</code> configuration.</li>
+        Previously the setting applied to batches of compressed messages, or to non-compressed messages individually.
+        A message batch may consist of only a single message, so in most cases the limit on the size of
+        individual messages is only reduced by the overhead of the batch format. However, there are some subtle implications
+        for message format conversion (see <a href="#upgrade_11_message_format">below</a> for more detail). Note also
+        that while the broker previously ensured that at least one message was returned in each fetch request (regardless of the
+        total and partition-level fetch sizes), this guarantee now applies to a single message batch.</li>
     <li>GC log rotation is enabled by default, see KAFKA-3754 for details.</li>
     <li>Deprecated constructors of RecordMetadata, MetricName and Cluster classes have been removed.</li>
     <li>Added user headers support through a new Headers interface providing user headers read and write access.</li>
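
A minimal sketch of the batch-size point above, assuming the broker's max.message.bytes is left near its default of roughly 1 MB and using a placeholder bootstrap address and topic name: a Java producer can keep batch.size within the broker's per-batch limit like this.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class BatchSizeExample {
        public static void main(String[] args) {
            // Assumed broker-side batch limit; align this with your actual
            // max.message.bytes setting (the broker default is roughly 1 MB).
            int maxMessageBytes = 1000012;

            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            // Since max.message.bytes now bounds the whole batch, keep the
            // producer's batch.size at or below that limit.
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(maxMessageBytes));

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("example-topic", "hello".getBytes()));
            }
        }
    }
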
@@ -149,6 +151,18 @@
   initial performance analysis of the new message format. You can also find more detail on the message format in the
   <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging#KIP-98-ExactlyOnceDeliveryandTransactionalMessaging-MessageFormat">KIP-98</a> proposal.
 </p>
+<p>One of the notable differences in the new message format is that even uncompressed messages are stored together as a single batch.
+  This has a few implications for the broker configuration <code>max.message.bytes</code>, which limits the size of a single batch. First,
+  if an older client produces messages to a topic partition using the old format, and the messages are individually smaller than
+  <code>max.message.bytes</code>, the broker may still reject them after they are merged into a single batch during the up-conversion process.
+  Typically this happens when the aggregate size of the individual messages exceeds <code>max.message.bytes</code>. There is a similar
+  effect for older consumers reading messages down-converted from the new format: if the fetch size is not set at least as large as
+  <code>max.message.bytes</code>, the consumer may not be able to make progress even if the individual uncompressed messages are smaller
+  than the configured fetch size. This behavior does not impact the Java client for 0.10.1.0 and later since it uses an updated fetch protocol
+  which ensures that at least one message can be returned even if it exceeds the fetch size. To avoid these problems, ensure
+  1) that the producer's batch size is not set larger than <code>max.message.bytes</code>, and 2) that the consumer's fetch size is set at
+  least as large as <code>max.message.bytes</code>.
+</p>
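
The fetch-size advice in the paragraph above can be sketched with the current Java consumer, assuming a placeholder bootstrap address, group id, and topic, and assuming max.message.bytes is near its default of roughly 1 MB; older clients would use their own equivalents (e.g. fetch.message.max.bytes for the old Scala consumer).

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class FetchSizeExample {
        public static void main(String[] args) {
            // Assumed broker-side batch limit; keep in sync with max.message.bytes.
            int maxMessageBytes = 1000012;

            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "fetch-size-example");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            // Set both the per-partition and the total fetch limits at least as
            // large as max.message.bytes so that a down-converted batch fits.
            props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, Integer.toString(maxMessageBytes));
            props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, Integer.toString(maxMessageBytes));

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("example-topic"));
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                System.out.println("Fetched " + records.count() + " records");
            }
        }
    }

As noted above, Java clients from 0.10.1.0 onward do not strictly need this because the updated fetch protocol always returns at least one (possibly oversized) batch, but aligning the fetch limits with max.message.bytes avoids the down-conversion pitfall for older consumers.
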
 <p>Most of the discussion on the performance impact of <a href="#upgrade_10_performance_impact">upgrading to the 0.10.0 message format</a>
   remains pertinent to the 0.11.0 upgrade. This mainly affects clusters that are not secured with TLS since "zero-copy" transfer
   is already not possible in that case. In order to avoid the cost of down-conversion, you should ensure that consumer applications