Posted to commits@kafka.apache.org by gu...@apache.org on 2017/11/15 22:32:06 UTC

kafka git commit: MINOR: kafka-site introduction section improvements

Repository: kafka
Updated Branches:
  refs/heads/trunk 54371e63d -> 48f5f048b


MINOR: kafka-site introduction section improvements

*Clarify multi-tenant support, geo-replication, and some grammar fixes.*

Author: Joel Hamill <jo...@users.noreply.github.com>

Reviewers: Guozhang Wang

Closes #4212 from joel-hamill/intro-cleanup


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/48f5f048
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/48f5f048
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/48f5f048

Branch: refs/heads/trunk
Commit: 48f5f048bc6fd5e059cd1311eb8428f0c1f088e8
Parents: 54371e6
Author: Joel Hamill <jo...@users.noreply.github.com>
Authored: Wed Nov 15 14:32:00 2017 -0800
Committer: Guozhang Wang <wa...@gmail.com>
Committed: Wed Nov 15 14:32:00 2017 -0800

----------------------------------------------------------------------
 docs/introduction.html | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/48f5f048/docs/introduction.html
----------------------------------------------------------------------
diff --git a/docs/introduction.html b/docs/introduction.html
index 5b3bb4a..7f4c3e2 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -19,22 +19,21 @@
 
 <script id="introduction-template" type="text/x-handlebars-template">
   <h3> Apache Kafka&reg; is <i>a distributed streaming platform</i>. What exactly does that mean?</h3>
-  <p>We think of a streaming platform as having three key capabilities:</p>
-  <ol>
-    <li>It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
-    <li>It lets you store streams of records in a fault-tolerant way.
-    <li>It lets you process streams of records as they occur.
-  </ol>
-  <p>What is Kafka good for?</p>
-  <p>It gets used for two broad classes of application:</p>
-  <ol>
+  <p>A streaming platform has three key capabilities:</p>
+  <ul>
+    <li>Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
+    <li>Store streams of records in a fault-tolerant, durable way.
+    <li>Process streams of records as they occur.
+  </ul>
+  <p>Kafka is generally used for two broad classes of applications:</p>
+  <ul>
     <li>Building real-time streaming data pipelines that reliably get data between systems or applications
     <li>Building real-time streaming applications that transform or react to the streams of data
-  </ol>
+  </ul>
   <p>To understand how Kafka does these things, let's dive in and explore Kafka's capabilities from the bottom up.</p>
   <p>First a few concepts:</p>
   <ul>
-    <li>Kafka is run as a cluster on one or more servers.
+    <li>Kafka is run as a cluster on one or more servers that can span multiple datacenters.
       <li>The Kafka cluster stores streams of <i>records</i> in categories called <i>topics</i>.
     <li>Each record consists of a key, a value, and a timestamp.
   </ul>
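The concepts introduced in the hunk above (records as key/value/timestamp triples, stored in topic partitions) can be sketched with a small toy model. This is illustrative only; `Partition` and its methods are hypothetical names, not part of any Kafka API:

```python
import time

class Partition:
    """Toy model of a Kafka partition: an ordered, immutable,
    append-only sequence of records."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        # Each record carries a key, a value, and a timestamp; its
        # offset is simply its position in the log.
        offset = len(self._records)
        self._records.append((key, value, time.time()))
        return offset

    def read(self, offset):
        key, value, timestamp = self._records[offset]
        return key, value, timestamp

p = Partition()
p.append("user-1", "clicked")   # offset 0
p.append("user-2", "scrolled")  # offset 1
```

Appends never rewrite earlier entries, which is why offsets are stable identifiers for records within a partition.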
@@ -60,7 +59,7 @@
   <p> Each partition is an ordered, immutable sequence of records that is continually appended to&mdash;a structured commit log. The records in the partitions are each assigned a sequential id number called the <i>offset</i> that uniquely identifies each record within the partition.
   </p>
   <p>
-  The Kafka cluster retains all published records&mdash;whether or not they have been consumed&mdash;using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
+  The Kafka cluster durably persists all published records&mdash;whether or not they have been consumed&mdash;using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so storing data for a long time is not a problem.
   </p>
   <img class="centered" src="/{{version}}/images/log_consumer.png" style="width:400px">
   <p>
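The retention behavior described in the hunk above can be expressed as a one-line predicate: a record becomes eligible for deletion once it is older than the configured retention period, regardless of whether any consumer has read it. A minimal sketch (`expired` is a hypothetical helper, not Kafka code):

```python
def expired(record_ts, now, retention_seconds):
    # Eligible for deletion once older than the retention period,
    # whether or not it has been consumed.
    return now - record_ts > retention_seconds

TWO_DAYS = 2 * 24 * 60 * 60
expired(record_ts=0, now=TWO_DAYS + 1, retention_seconds=TWO_DAYS)  # True
expired(record_ts=0, now=TWO_DAYS - 1, retention_seconds=TWO_DAYS)  # False
```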
@@ -82,6 +81,10 @@
   Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
   </p>
 
+  <h4><a id="intro_geo-replication" href="#intro_geo-replication">Geo-Replication</a></h4>
+
+  <p>Kafka MirrorMaker provides geo-replication support for your clusters. With MirrorMaker, messages are replicated across multiple datacenters or cloud regions. You can use this in active/passive scenarios for backup and recovery, or in active/active scenarios to place data closer to your users or to satisfy data-locality requirements.</p>
+
   <h4><a id="intro_producers" href="#intro_producers">Producers</a></h4>
   <p>
   Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the record). More on the use of partitioning in a second!
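The producer's partition choice described above can be sketched as follows. The real Java client hashes keys with murmur2; the md5-based hash here is just a deterministic stand-in, and `choose_partition` is a hypothetical name:

```python
import hashlib
import itertools

_round_robin = itertools.count()

def choose_partition(key, num_partitions):
    if key is None:
        # Keyless records: round-robin simply to balance load.
        return next(_round_robin) % num_partitions
    # Keyed records: hash the key so the same key always lands on the
    # same partition (stand-in hash; Kafka's client uses murmur2).
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key, same partition -- this is what preserves per-key ordering.
choose_partition("user-42", 6) == choose_partition("user-42", 6)  # True
```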
@@ -111,6 +114,8 @@
   <p>
   Kafka only provides a total order over records <i>within</i> a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over records this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
   </p>
+  <h4><a id="intro_multi-tenancy" href="#intro_multi-tenancy">Multi-tenancy</a></h4>
+  <p>You can deploy Kafka as a multi-tenant solution. Multi-tenancy is enabled by configuring which topics can produce or consume data. There is also operations support for quotas: administrators can define and enforce quotas on requests to control the broker resources used by clients. For more information, see the <a href="https://kafka.apache.org/documentation/#security">security documentation</a>.</p>
   <h4><a id="intro_guarantees" href="#intro_guarantees">Guarantees</a></h4>
   <p>
   At a high-level Kafka gives the following guarantees:
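The per-partition ordering guarantee in the hunk above can be demonstrated with a small simulation: keyed records route to a fixed partition, and each partition preserves send order, so every key's events come out totally ordered. The `partition_for` hash is an illustrative stand-in for the client's partitioner, not Kafka's actual algorithm:

```python
def partition_for(key, num_partitions):
    # Stand-in key hash; the real client uses murmur2.
    return sum(key.encode()) % num_partitions

partitions = {0: [], 1: []}
events = [("user-a", 1), ("user-b", 1), ("user-a", 2),
          ("user-b", 2), ("user-a", 3)]
for key, seq in events:
    partitions[partition_for(key, 2)].append((key, seq))

# Within each partition, records appear in send order, so any single
# key's sequence numbers are monotonically increasing.
for log in partitions.values():
    for key in {k for k, _ in log}:
        seqs = [s for k, s in log if k == key]
        assert seqs == sorted(seqs)
```

There is no ordering guarantee *across* partitions; a consumer reading two partitions may interleave "user-a" and "user-b" events arbitrarily, which is why a strict total order requires a single-partition topic.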