Posted to commits@kafka.apache.org by gw...@apache.org on 2016/01/09 03:46:42 UTC

kafka-site git commit: Minor updates to api, design, security and upgrade pages

Repository: kafka-site
Updated Branches:
  refs/heads/asf-site 97a7d564b -> d0ddbb47b


Minor updates to api, design, security and upgrade pages

Changes copied from 0.9.0 branch of the kafka repo.


Project: http://git-wip-us.apache.org/repos/asf/kafka-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka-site/commit/d0ddbb47
Tree: http://git-wip-us.apache.org/repos/asf/kafka-site/tree/d0ddbb47
Diff: http://git-wip-us.apache.org/repos/asf/kafka-site/diff/d0ddbb47

Branch: refs/heads/asf-site
Commit: d0ddbb47b4fb1515f5d6933d7b753dc22077ded8
Parents: 97a7d56
Author: Ismael Juma <is...@juma.me.uk>
Authored: Sat Jan 9 02:26:23 2016 +0000
Committer: Ismael Juma <is...@juma.me.uk>
Committed: Sat Jan 9 02:31:21 2016 +0000

----------------------------------------------------------------------
 090/api.html      |  2 +-
 090/design.html   |  8 +++---
 090/security.html | 67 +++++++++++++++++++++++++-------------------------
 090/upgrade.html  | 11 ++++++---
 4 files changed, 47 insertions(+), 41 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka-site/blob/d0ddbb47/090/api.html
----------------------------------------------------------------------
diff --git a/090/api.html b/090/api.html
index 87ba559..861eec3 100644
--- a/090/api.html
+++ b/090/api.html
@@ -39,7 +39,7 @@ here</a>.
 <h3><a id="consumerapi" href="#consumerapi">2.2 Consumer API</a></h3>
 
 As of the 0.9.0 release we have added a new Java consumer to replace our existing high-level ZooKeeper-based consumer
-and low-level consumer APIs. This client is considered beta quality. To ensure a smooth upgrade paths
+and low-level consumer APIs. This client is considered beta quality. To ensure a smooth upgrade path
 for users, we still maintain the old 0.8 consumer clients that continue to work on a 0.9 Kafka cluster.
 
 In the following sections we introduce both the old 0.8 consumer APIs (both high-level ConsumerConnector and low-level SimpleConsumer)

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/d0ddbb47/090/design.html
----------------------------------------------------------------------
diff --git a/090/design.html b/090/design.html
index 5d3090c..439e087 100644
--- a/090/design.html
+++ b/090/design.html
@@ -27,9 +27,9 @@ It also meant the system would have to handle low-latency delivery to handle mor
 <p>
 We wanted to support partitioned, distributed, real-time processing of these feeds to create new, derived feeds. This motivated our partitioning and consumer model.
 <p>
-Finally in cases where the stream is fed into other data systems for serving we knew the system would have to be able to guarantee fault-tolerance in the presence of machine failures.
+Finally in cases where the stream is fed into other data systems for serving, we knew the system would have to be able to guarantee fault-tolerance in the presence of machine failures.
 <p>
-Supporting these uses led use to a design with a number of unique elements, more akin to a database log then a traditional messaging system. We will outline some elements of the design in the following sections.
+Supporting these uses led us to a design with a number of unique elements, more akin to a database log than a traditional messaging system. We will outline some elements of the design in the following sections.
 
 <h3><a id="persistence" href="#persistence">4.2 Persistence</a></h3>
 <h4><a id="design_filesystem" href="#design_filesystem">Don't fear the filesystem!</a></h4>
@@ -210,7 +210,7 @@ Kafka will remain available in the presence of node failures after a short fail-
 
 At its heart a Kafka partition is a replicated log. The replicated log is one of the most basic primitives in distributed data systems, and there are many approaches for implementing one. A replicated log can be used by other systems as a primitive for implementing other distributed systems in the <a href="http://en.wikipedia.org/wiki/State_machine_replication">state-machine style</a>.
 <p>
-A replicated log models the process of coming into consensus on the order of a series of values (generally numbering the log entries 0, 1, 2, ...). There are many ways to implement this, but the simplest and fastest is with a leader who chooses the ordering of values provided to it. As long as the leader remains alive, all followers need to only copy the values and ordering, the leader chooses.
+A replicated log models the process of coming into consensus on the order of a series of values (generally numbering the log entries 0, 1, 2, ...). There are many ways to implement this, but the simplest and fastest is with a leader who chooses the ordering of values provided to it. As long as the leader remains alive, all followers need to only copy the values and ordering the leader chooses.
 <p>
 Of course if leaders didn't fail we wouldn't need followers! When the leader does die we need to choose a new leader from among the followers. But followers themselves may fall behind or crash so we must ensure we choose an up-to-date follower. The fundamental guarantee a log replication algorithm must provide is that if we tell the client a message is committed, and the leader fails, the new leader we elect must also have that message. This yields a tradeoff: if the leader waits for more followers to acknowledge a message before declaring it committed then there will be more potentially electable leaders.
 <p>
@@ -232,7 +232,7 @@ Another important design distinction is that Kafka does not require that crashed
 
 <h4><a id="design_uncleanleader" href="#design_uncleanleader">Unclean leader election: What if they all die?</a></h4>
 
-Note that Kafka's guarantee with respect to data loss is predicated on at least on replica remaining in sync. If all the nodes replicating a partition die, this guarantee no longer holds.
+Note that Kafka's guarantee with respect to data loss is predicated on at least one replica remaining in sync. If all the nodes replicating a partition die, this guarantee no longer holds.
 <p>
 However a practical system needs to do something reasonable when all the replicas die. If you are unlucky enough to have this occur, it is important to consider what will happen. There are two behaviors that could be implemented:
 <ol>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/d0ddbb47/090/security.html
----------------------------------------------------------------------
diff --git a/090/security.html b/090/security.html
index 848031b..3acbbac 100644
--- a/090/security.html
+++ b/090/security.html
@@ -113,23 +113,23 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but
 
         The following SSL configs are needed on the broker side:
         <pre>
-        ssl.keystore.location = /var/private/ssl/kafka.server.keystore.jks
-        ssl.keystore.password = test1234
-        ssl.key.password = test1234
-        ssl.truststore.location = /var/private/ssl/kafka.server.truststore.jks
-        ssl.truststore.password = test1234</pre>
+        ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
+        ssl.keystore.password=test1234
+        ssl.key.password=test1234
+        ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
+        ssl.truststore.password=test1234</pre>
 
         Optional settings that are worth considering:
         <ol>
-            <li>ssl.client.auth = none ("required" => client authentication is required, "requested" => client authentication is requested and client without certs can still connect. The usage of "requested" is discouraged as it provides a false sense of security and misconfigured clients will still connect successfully.)</li>
-            <li>ssl.cipher.suites = A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. (Default is an empty list)</li>
-            <li>ssl.enabled.protocols = TLSv1.2,TLSv1.1,TLSv1 (list out the SSL protocols that you are going to accept from clients. Do note that SSL is deprecated in favor of TLS and using SSL in production is not recommended)</li>
-            <li>ssl.keystore.type = JKS</li>
-            <li>ssl.truststore.type = JKS</li>
+            <li>ssl.client.auth=none ("required" => client authentication is required, "requested" => client authentication is requested and clients without certs can still connect. The usage of "requested" is discouraged as it provides a false sense of security and misconfigured clients will still connect successfully.)</li>
+            <li>ssl.cipher.suites (Optional). A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. (Default is an empty list)</li>
+            <li>ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1 (list out the SSL protocols that you are going to accept from clients. Do note that SSL is deprecated in favor of TLS and using SSL in production is not recommended)</li>
+            <li>ssl.keystore.type=JKS</li>
+            <li>ssl.truststore.type=JKS</li>
         </ol>
         If you want to enable SSL for inter-broker communication, add the following to the broker properties file (it defaults to PLAINTEXT)
         <pre>
-        security.inter.broker.protocol = SSL</pre>
+        security.inter.broker.protocol=SSL</pre>
 
         <p>
         Due to import regulations in some countries, the Oracle implementation limits the strength of cryptographic algorithms available by default. If stronger algorithms are needed (for example, AES with 256-bit keys), the <a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html">JCE Unlimited Strength Jurisdiction Policy Files</a> must be obtained and installed in the JDK/JRE. See the
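As a companion to the broker-side settings above, a keystore and truststore like the ones referenced can be generated with the JDK's keytool. This is only an illustrative sketch: the paths, alias, validity, CA file name and passwords are assumptions, not values mandated by Kafka, and in practice the broker certificate would be signed by your CA before use.

```shell
# Generate a broker key pair in a JKS keystore (illustrative values).
keytool -keystore /var/private/ssl/kafka.server.keystore.jks \
        -alias localhost -validity 365 -genkey -keyalg RSA \
        -storepass test1234 -keypass test1234 \
        -dname "CN=kafka1.hostname.com"

# Import the CA certificate that signs the broker certs into the
# truststore, so brokers and clients can verify each other.
keytool -keystore /var/private/ssl/kafka.server.truststore.jks \
        -alias CARoot -import -file ca-cert \
        -storepass test1234 -noprompt
```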
@@ -155,22 +155,22 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but
         SSL is supported only for the new Kafka Producer and Consumer; the older APIs are not supported. The configs for SSL will be the same for both producer and consumer.<br>
         If client authentication is not required in the broker, then the following is a minimal configuration example:
         <pre>
-        security.protocol = SSL
-        ssl.truststore.location = "/var/private/ssl/kafka.client.truststore.jks"
-        ssl.truststore.password = "test1234"</pre>
+        security.protocol=SSL
+        ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks
+        ssl.truststore.password=test1234</pre>
 
         If client authentication is required, then a keystore must be created like in step 1 and the following must also be configured:
         <pre>
-        ssl.keystore.location = "/var/private/ssl/kafka.client.keystore.jks"
-        ssl.keystore.password = "test1234"
-        ssl.key.password = "test1234"</pre>
+        ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks
+        ssl.keystore.password=test1234
+        ssl.key.password=test1234</pre>
         Other configuration settings that may also be needed depending on your requirements and the broker configuration:
             <ol>
                 <li>ssl.provider (Optional). The name of the security provider used for SSL connections. Default value is the default security provider of the JVM.</li>
                 <li>ssl.cipher.suites (Optional). A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol.</li>
                 <li>ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1. It should list at least one of the protocols configured on the broker side</li>
-                <li>ssl.truststore.type = "JKS"</li>
-                <li>ssl.keystore.type = "JKS"</li>
+                <li>ssl.truststore.type=JKS</li>
+                <li>ssl.keystore.type=JKS</li>
             </ol>
 <br>
         Examples using console-producer and console-consumer:
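One plausible invocation of the console tools with the client settings above looks like the following sketch. The file name client-ssl.properties and port 9093 are assumptions for illustration; the broker must actually expose an SSL listener on whatever port you use.

```shell
# client-ssl.properties is assumed to contain security.protocol=SSL and
# the ssl.truststore.* settings shown above.
kafka-console-producer.sh --broker-list localhost:9093 --topic test \
    --producer.config client-ssl.properties

kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic test \
    --new-consumer --consumer.config client-ssl.properties
```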
@@ -231,7 +231,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but
 
         We must also configure the service name in server.properties, which should match the principal name of the kafka brokers. In the above example, the principal is "kafka/kafka1.hostname.com@EXAMPLE.com", so:
         <pre>
-    sasl.kerberos.service.name="kafka"</pre>
+    sasl.kerberos.service.name=kafka</pre>
 
         <u>Important notes:</u>
         <ol>
@@ -270,7 +270,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but
             <li>Configure the following properties in producer.properties or consumer.properties:
                 <pre>
     security.protocol=SASL_PLAINTEXT (or SASL_SSL)
-    sasl.kerberos.service.name="kafka"</pre>
+    sasl.kerberos.service.name=kafka</pre>
             </li>
         </ol></li>
 </ol>
@@ -336,7 +336,7 @@ Kafka Authorization management CLI can be found under bin directory with all the
         <td>Resource</td>
     </tr>
     <tr>
-        <td>--consumer-group [group-name]</td>
+        <td>--group [group-name]</td>
         <td>Specifies the consumer-group as resource.</td>
         <td></td>
         <td>Resource</td>
@@ -355,13 +355,13 @@ Kafka Authorization management CLI can be found under bin directory with all the
     </tr>
     <tr>
         <td>--allow-host</td>
-        <td>Host from which principals listed in --allow-principal will have access.</td>
+        <td>IP address from which principals listed in --allow-principal will have access.</td>
         <td>If --allow-principal is specified, defaults to * which translates to "all hosts"</td>
         <td>Host</td>
     </tr>
     <tr>
         <td>--deny-host</td>
-        <td>Host from which principals listed in --deny-principal will be denied access.</td>
+        <td>IP address from which principals listed in --deny-principal will be denied access.</td>
         <td>If --deny-principal is specified, defaults to * which translates to "all hosts"</td>
         <td>Host</td>
     </tr>
@@ -390,25 +390,26 @@ Kafka Authorization management CLI can be found under bin directory with all the
 <h4><a id="security_authz_examples" href="#security_authz_examples">Examples</a></h4>
 <ul>
     <li><b>Adding Acls</b><br>
-Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed to perform Operation Read and Write on Topic Test-Topic from Host1 and Host2". You can do that by executing the CLI with following options:
-        <pre>bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --allow-principal User:Alice --allow-host Host1 --allow-host Host2 --operation Read --operation Write --topic Test-topic</pre>
-        By default all principals that don't have an explicit acl that allows access for an operation to a resource are denied. In rare cases where an allow acl is defined that allows access to all but some principal we will have to use the --deny-principal and --deny-host option. For example, if we want to allow all users to Read from Test-topic but only deny User:BadBob from host bad-host we can do so using following commands:
-        <pre>bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:* --allow-host * --deny-principal User:BadBob --deny-host bad-host --operation Read --topic Test-topic</pre>
-        Above examples add acls to a topic by specifying --topic [topic-name] as the resource option. Similarly user can add acls to cluster by specifying --cluster and to a consumer group by specifying --consumer-group [group-name].</li>
+Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed to perform Operation Read and Write on Topic Test-Topic from IP 198.51.100.0 and IP 198.51.100.1". You can do that by executing the CLI with the following options:
+        <pre>bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --allow-principal User:Alice --allow-host 198.51.100.0 --allow-host 198.51.100.1 --operation Read --operation Write --topic Test-topic</pre>
+        By default all principals that don't have an explicit acl that allows access for an operation to a resource are denied. In rare cases where an allow acl is defined that allows access to all but some principal, we will have to use the --deny-principal and --deny-host options. For example, if we want to allow all users to Read from Test-topic but deny User:BadBob from IP 198.51.100.3, we can do so using the following command:
+        <pre>bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:* --allow-host * --deny-principal User:BadBob --deny-host 198.51.100.3 --operation Read --topic Test-topic</pre>
+        Note that --allow-host and --deny-host only support IP addresses (hostnames are not supported).
+        The above examples add acls to a topic by specifying --topic [topic-name] as the resource option. Similarly, users can add acls to a cluster by specifying --cluster and to a consumer group by specifying --group [group-name].</li>
 
     <li><b>Removing Acls</b><br>
             Removing acls is pretty much the same. The only difference is that instead of the --add option users will have to specify the --remove option. To remove the acls added by the first example above, we can execute the CLI with the following options:
-           <pre> bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --remove --allow-principal User:Bob --allow-principal User:Alice --allow-host Host1 --allow-host Host2 --operation Read --operation Write --topic Test-topic </pre></li>
+           <pre> bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --remove --allow-principal User:Bob --allow-principal User:Alice --allow-host 198.51.100.0 --allow-host 198.51.100.1 --operation Read --operation Write --topic Test-topic </pre></li>
 
     <li><b>List Acls</b><br>
             We can list acls for any resource by specifying the --list option with the resource. To list all acls for Test-topic, we can execute the CLI with the following options:
-            <pre>bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --list --topic Test-topic</pre></li>
+            <pre>bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --list --topic Test-topic</pre></li>
 
     <li><b>Adding or removing a principal as producer or consumer</b><br>
             The most common use case for acl management is adding/removing a principal as a producer or consumer, so we added convenience options to handle these cases. In order to add User:Bob as a producer of Test-topic we can execute the following command:
-           <pre> bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --producer --topic Test-topic</pre>
+           <pre> bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --producer --topic Test-topic</pre>
             Similarly, to add Alice as a consumer of Test-topic with consumer group Group-1 we just have to pass the --consumer option:
-           <pre> bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --consumer --topic test-topic --consumer-group Group-1 </pre>
+           <pre> bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Alice --consumer --topic Test-topic --group Group-1 </pre>
             Note that for the consumer option we must also specify the consumer group.
             In order to remove a principal from a producer or consumer role we just need to pass the --remove option. </li>
     </ul>
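The last point above can be sketched as follows, reusing the principal and topic from the earlier producer example; the command is illustrative and follows the same --remove pattern shown for plain acls:

```shell
# Drop the acls that the --producer convenience option originally created
# for User:Bob on Test-topic.
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
    --remove --allow-principal User:Bob --producer --topic Test-topic
```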

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/d0ddbb47/090/upgrade.html
----------------------------------------------------------------------
diff --git a/090/upgrade.html b/090/upgrade.html
index 704ec4f..98ac570 100644
--- a/090/upgrade.html
+++ b/090/upgrade.html
@@ -19,7 +19,7 @@
 
 <h4><a id="upgrade_9" href="#upgrade_9">Upgrading from 0.8.0, 0.8.1.X or 0.8.2.X to 0.9.0.0</a></h4>
 
-0.9.0.0 has an inter-broker protocol change from previous versions. For a rolling upgrade:
+0.9.0.0 has <a href="#upgrade_9_breaking">potential breaking changes</a> (please review before upgrading) and an inter-broker protocol change from previous versions. For a rolling upgrade:
 <ol>
 	<li> Update server.properties file on all brokers and add the following property: inter.broker.protocol.version=0.8.2.X </li>
 	<li> Upgrade the brokers. This can be done a broker at a time by simply bringing it down, updating the code, and restarting it. </li>
@@ -27,15 +27,20 @@
 	<li> Restart the brokers one by one for the new protocol version to take effect </li>
 </ol>
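Sketched per broker, the first two steps of the rolling upgrade might look like this. The install path, symlink layout and systemd service name are assumptions about a particular deployment, not part of Kafka; replace 0.8.2.X with your actual current version.

```shell
# Step 1: pin the wire protocol to the old version before upgrading the code.
echo "inter.broker.protocol.version=0.8.2.X" >> /opt/kafka/config/server.properties

# Step 2: upgrade one broker at a time -- stop it, swap the code, restart it.
systemctl stop kafka
ln -sfn /opt/kafka_2.11-0.9.0.0 /opt/kafka/current   # point at the new release
systemctl start kafka
```

Only after every broker is running the new code would you flip the property to 0.9.0.0 and do a second rolling restart, per steps 3 and 4 above.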
 
-Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.
+<p><b>Note:</b> If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.
 
-Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.
+<p><b>Note:</b> Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.
 
 <h5><a id="upgrade_9_breaking" href="#upgrade_9_breaking">Potential breaking changes in 0.9.0.0</a></h5>
 
 <ul>
     <li> Java 1.6 is no longer supported. </li>
     <li> Scala 2.9 is no longer supported. </li>
+    <li> Broker IDs above 1000 are now reserved by default for automatically assigned broker IDs. If your cluster has existing broker IDs above that threshold, make sure to increase the reserved.broker.max.id broker configuration property accordingly. </li>
+    <li> Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync. </li>
+    <li> Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since the last fetch request from the replica, but also to the time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages within replica.lag.time.max.ms will be considered out of sync. </li>
+    <li> Configuration parameter log.cleaner.enable is now true by default. This means topics with cleanup.policy=compact will now be compacted by default, and 128 MB of heap will be allocated to the cleaner process via log.cleaner.dedupe.buffer.size. You may want to review log.cleaner.dedupe.buffer.size and the other log.cleaner configuration values based on your usage of compacted topics. </li>
+    <li> MirrorMaker no longer supports multiple target clusters. As a result, it will only accept a single --consumer.config parameter. To mirror multiple source clusters, you will need at least one MirrorMaker instance per source cluster, each with its own consumer configuration. </li>
     <li> Tools packaged under <em>org.apache.kafka.clients.tools.*</em> have been moved to <em>org.apache.kafka.tools.*</em>. All included scripts will still function as usual, only custom code directly importing these classes will be affected. </li>
     <li> The default Kafka JVM performance options (KAFKA_JVM_PERFORMANCE_OPTS) have been changed in kafka-run-class.sh. </li>
     <li> The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with non-zero exit code on failure. </li>