Posted to commits@kafka.apache.org by jk...@apache.org on 2014/03/06 04:52:35 UTC

svn commit: r1574759 - in /kafka/site/081: configuration.html design.html ops.html quickstart.html

Author: jkreps
Date: Thu Mar  6 03:52:35 2014
New Revision: 1574759

URL: http://svn.apache.org/r1574759
Log:
Misc. fixes suggested by Jun.


Modified:
    kafka/site/081/configuration.html
    kafka/site/081/design.html
    kafka/site/081/ops.html
    kafka/site/081/quickstart.html

Modified: kafka/site/081/configuration.html
URL: http://svn.apache.org/viewvc/kafka/site/081/configuration.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/configuration.html (original)
+++ kafka/site/081/configuration.html Thu Mar  6 03:52:35 2014
@@ -43,7 +43,7 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>message.max.bytes</td>
       <td>1000000</td>
-      <td>The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly consumer will be able to publish messages too large for consumers to consume.</td>
+      <td>The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly producer will be able to publish messages too large for consumers to consume.</td>
     </tr>
     <tr>
       <td>num.network.threads</td>
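
As a concrete sketch of keeping message.max.bytes in sync with the consumers' fetch size discussed above (property names follow the 0.8.1 docs; the values are illustrative and not part of this commit):
<pre>
# broker server.properties: the largest message the server will accept
message.max.bytes=1000000

# consumer config: keep this at least as large as message.max.bytes,
# or consumers will be unable to fetch the largest allowed messages
fetch.message.max.bytes=1048576
</pre>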
@@ -107,12 +107,6 @@ Zookeeper also allows you to add a "chro
       <td>The default number of partitions per topic if a partition count isn't given at topic creation time.</td>
     </tr>
     <tr>
-      <td>max.message.bytes</td>
-      <td>1,000,000</td>
-      <td>message.max.bytes</td>
-      <td>This is largest message size Kafka will allow to be appended to this topic. Note that if you increase this size you must also increase your consumer's fetch size so they can fetch messages this large. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
-    </tr>
-    <tr>
       <td>log.segment.bytes</td>
       <td nowrap>1024 * 1024 * 1024</td>
       <td>The log for a topic partition is stored as a directory of segment files. This setting controls the size to which a segment file will grow before a new segment is rolled over in the log. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
@@ -135,7 +129,7 @@ Zookeeper also allows you to add a "chro
     <tr>
       <td>log.retention.bytes</td>
       <td>-1</td>
-      <td>The amount of data to retain in the log for each topic-partitions. Note that this is the limit per-partition so multiple by the number of partitions to get the total data retained for the topic. Also note that if both log.retention.hours and log.retention.bytes are both set we delete a segment when either limit is exceeded. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
+      <td>The amount of data to retain in the log for each topic-partition. Note that this is the limit per-partition, so multiply by the number of partitions to get the total data retained for the topic. Also note that if both log.retention.hours and log.retention.bytes are set we delete a segment when either limit is exceeded. This setting can be overridden on a per-topic basis (see <a href="#topic-config">the per-topic configuration section</a>).</td>
     </tr>
     <tr>
       <td>log.retention.check.interval.ms</td>
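
To make the per-partition arithmetic for log.retention.bytes concrete (an illustrative example, not part of this commit):
<pre>
# Retention is enforced per partition, so with
log.retention.bytes=1073741824   # 1 GiB per partition
# a topic with 8 partitions retains up to 8 * 1 GiB = 8 GiB in total (per replica).
</pre>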
@@ -366,6 +360,12 @@ Overrides can also be changed or set lat
 <b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic 
 	--config max.message.bytes=128000</b>
 </pre>
+
+To remove an override you can do
+<pre>
+<b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic 
+	--deleteConfig max.message.bytes</b>
+</pre>
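
To check which overrides are currently in effect you can describe the topic (a sketch, not part of this commit; the exact output varies by version):
<pre>
<b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic</b>
</pre>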
 	
 The following are the topic-level configurations. The server's default configuration for this property is given under the Server Default Property heading; setting this default in the server config allows you to change the default given to topics that have no override specified.
 <table class="data-table">

Modified: kafka/site/081/design.html
URL: http://svn.apache.org/viewvc/kafka/site/081/design.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/design.html (original)
+++ kafka/site/081/design.html Thu Mar  6 03:52:35 2014
@@ -239,7 +239,7 @@ So far we have described only the simple
 <p>
 Let's start with a few examples of use cases that log updates, then we'll talk about how Kafka's log compaction supports these use cases.
 <ol>
-<li><i>Database change subscription</i>. It is often necessary to have a data set in multiple data systems, and often one of these systems is a database of some kind (either a RDBMS or perhaps a newfangled key-value store). For example you might have a database, a cache, a search cluster, and a Hadoop cluster. Each change to the database will need to be reflected in the cache, the search cluster, and eventually in Hadoop. In the case that one is only handling the real-time updates 
+<li><i>Database change subscription</i>. It is often necessary to have a data set in multiple data systems, and often one of these systems is a database of some kind (either an RDBMS or perhaps a new-fangled key-value store). For example you might have a database, a cache, a search cluster, and a Hadoop cluster. Each change to the database will need to be reflected in the cache, the search cluster, and eventually in Hadoop. If one is only handling the real-time updates, only the recent log is needed. But if you want to be able to reload the cache or restore a failed search node, you may need a complete data set.
 <li><i>Event sourcing</i>. This is a style of application design which co-locates query processing with application state and uses a log of changes as the primary store for the application.
 <li><i>Journaling for high-availability</i>. A process that does local computation can be made fault-tolerant by logging out changes that it makes to its local state so another process can reload these changes and carry on if it should fail. A concrete example of this is handling counts, aggregations, and other "group by"-like processing in a stream query system. Samza, a real-time stream-processing framework, <a href="http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html">uses this feature</a> for exactly this purpose.
 </ol>
@@ -262,7 +262,7 @@ Here is a high-level picture that shows 
 <p>
 The head of the log is identical to a traditional Kafka log. It has dense, sequential offsets and retains all messages. Log compaction adds an option for handling the tail of the log. The picture above shows a log with a compacted tail. Note that the messages in the tail of the log retain the original offset assigned when they were first written&mdash;that never changes. Note also that all offsets remain valid positions in the log, even if the message with that offset has been compacted away; in this case this position is indistinguishable from the next highest offset that does appear in the log. For example, in the picture above the offsets 36, 37, and 38 are all equivalent positions and a read beginning at any of these offsets would return a message set beginning with 38.
 <p>
-Compaction also allows for deletes. A message with a key and a null payload will be treated as a delete from the log. This delete marker will cause any prior message with that key to be removed (as would any new message with that key), but delete markers are special in they will themselves be cleaned out of the log after a period of time to free up space. The point in time at which deletes are no longer retained is marked as the "delete retention point" in the above diagram.
+Compaction also allows for deletes. A message with a key and a null payload will be treated as a delete from the log. This delete marker will cause any prior message with that key to be removed (as would any new message with that key), but delete markers are special in that they will themselves be cleaned out of the log after a period of time to free up space. The point in time at which deletes are no longer retained is marked as the "delete retention point" in the above diagram.
 <p>
 The compaction is done in the background by periodically recopying log segments. Cleaning does not block reads and can be throttled to use no more than a configurable amount of I/O throughput to avoid impacting producers and consumers. The actual process of compacting a log segment looks something like this:
 <img src="/images/log_compaction.png">
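
A sketch of how a compacted topic might be configured (settings as documented for 0.8.1, not part of this commit; verify against your version):
<pre>
# broker server.properties: the cleaner must be enabled for compaction to run
log.cleaner.enable=true
</pre>
<pre>
<b> &gt; bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-compacted-topic 
	--partitions 1 --replication-factor 1 --config cleanup.policy=compact</b>
</pre>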

Modified: kafka/site/081/ops.html
URL: http://svn.apache.org/viewvc/kafka/site/081/ops.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/ops.html (original)
+++ kafka/site/081/ops.html Thu Mar  6 03:52:35 2014
@@ -3,7 +3,7 @@ Here is some information on actually run
 <h3><a id="datacenters">6.1 Datacenters</a></h3>
 Some deployments will need to manage a data pipeline that spans multiple datacenters. Our approach to this is to deploy a local Kafka cluster in each datacenter, with machines in each location interacting only with their local cluster.
 <p>
-For applications that need a global view of all data we use the <a href="/08/tools.html">mirror maker tool</a> to provide clusters which have aggregate data mirrored from all datacenters. These aggregator clusters are used for reads by applications that require this.
+For applications that need a global view of all data we use the <a href="#tools">mirror maker tool</a> to provide clusters which have aggregate data mirrored from all datacenters. These aggregator clusters are used for reads by applications that require the full data set.
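
A minimal mirror-maker invocation might look like this (a sketch, not part of this commit; the file names are placeholders, with the consumer config pointing at the source cluster and the producer config at the aggregate cluster):
<pre>
<b> &gt; bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config sourceCluster.consumer.properties 
	--producer.config aggregateCluster.producer.properties --whitelist=".*"</b>
</pre>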
 <p>
 Likewise, in order to support data load into Hadoop, which resides in separate facilities, we provide local read-only clusters that mirror the production data centers in the facilities where this data load occurs.
 <p>
@@ -16,11 +16,6 @@ This is not the only possible deployment
 It is generally not advisable to run a single Kafka cluster that spans multiple datacenters, as this will incur very high replication latency for both Kafka and Zookeeper writes, and neither Kafka nor Zookeeper will remain available if the network partitions.
 
 <h3><a id="config">6.2 Kafka Configuration</a></h3>
-Kafka 0.8 is the version we currently run. We are currently running with replication but with producers acks = 1. 
-<P>
-<h4><a id="serverconfig">Important Server Configurations</a></h4>
-
-The most important server configurations for performance are those that control the disk flush rate. The more often data is flushed to disk, the more "seek-bound" Kafka will be and the lower the throughput. However very low application flush rates can lead to high latency when the flush finally does occur (because of the volume of data that must be flushed). See the section below on application versus OS flush.
 
 <h4><a id="clientconfig">Important Client Configurations</a></h4>
 The most important producer configurations control

Modified: kafka/site/081/quickstart.html
URL: http://svn.apache.org/viewvc/kafka/site/081/quickstart.html?rev=1574759&r1=1574758&r2=1574759&view=diff
==============================================================================
--- kafka/site/081/quickstart.html (original)
+++ kafka/site/081/quickstart.html Thu Mar  6 03:52:35 2014
@@ -2,7 +2,7 @@
 
 <h4> Step 1: Download the code </h4>
 
-<a href="../downloads.html" title="Kafka downloads">Download</a> the 0.8 release.
+<a href="../downloads.html" title="Kafka downloads">Download</a> the 0.8.1 release.
 
 <pre>
 &gt; <b>tar xzf kafka-&lt;VERSION&gt;.tgz</b>