Posted to commits@kafka.apache.org by vv...@apache.org on 2019/12/07 05:18:12 UTC

[kafka] branch 2.0 updated: MINOR: clarify node grouping of input topics using pattern subscription (#7793)

This is an automated email from the ASF dual-hosted git repository.

vvcephei pushed a commit to branch 2.0
in repository https://gitbox.apache.org/repos/asf/kafka.git


The following commit(s) were added to refs/heads/2.0 by this push:
     new 6c8ad27  MINOR: clarify node grouping of input topics using pattern subscription (#7793)
6c8ad27 is described below

commit 6c8ad2706060cb374a55316131c56d5002c58bbc
Author: A. Sophie Blee-Goldman <so...@confluent.io>
AuthorDate: Fri Dec 6 14:03:42 2019 -0800

    MINOR: clarify node grouping of input topics using pattern subscription (#7793)
    
    Updates the HTML docs and the javadoc.
    
    Reviewers: John Roesler <vv...@apache.org>
---
 docs/streams/developer-guide/dsl-api.html                         | 2 +-
 .../src/main/java/org/apache/kafka/streams/StreamsBuilder.java    | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/docs/streams/developer-guide/dsl-api.html b/docs/streams/developer-guide/dsl-api.html
index 49af92a..700fd2a 100644
--- a/docs/streams/developer-guide/dsl-api.html
+++ b/docs/streams/developer-guide/dsl-api.html
@@ -256,7 +256,7 @@
                         <p>You <strong>must specify SerDes explicitly</strong> if the key or value types of the records in the Kafka input
                             topics do not match the configured default SerDes. For information about configuring default SerDes, available
                             SerDes, and implementing your own custom SerDes see <a class="reference internal" href="datatypes.html#streams-developer-guide-serdes"><span class="std std-ref">Data Types and Serialization</span></a>.</p>
-                        <p class="last">Several variants of <code class="docutils literal"><span class="pre">stream</span></code> exist, for example to specify a regex pattern for input topics to read from).</p>
+                        <p class="last">Several variants of <code class="docutils literal"><span class="pre">stream</span></code> exist. For example, you can specify a regex pattern for the input topics to read from (note that all matching topics become part of the same input topic group, so the work is not parallelized across different topics when they are subscribed to in this way).</p>
                     </td>
                 </tr>
                 <tr class="row-odd"><td><p class="first"><strong>Table</strong></p>
diff --git a/streams/src/main/java/org/apache/kafka/streams/StreamsBuilder.java b/streams/src/main/java/org/apache/kafka/streams/StreamsBuilder.java
index ae6d44c..7ea298b 100644
--- a/streams/src/main/java/org/apache/kafka/streams/StreamsBuilder.java
+++ b/streams/src/main/java/org/apache/kafka/streams/StreamsBuilder.java
@@ -144,7 +144,9 @@ public class StreamsBuilder {
      * deserializers as specified in the {@link StreamsConfig config} are used.
      * <p>
      * If multiple topics are matched by the specified pattern, the created {@link KStream} will read data from all of
-     * them and there is no ordering guarantee between records from different topics.
+     * them and there is no ordering guarantee between records from different topics. This also means that the work
+     * will not be parallelized for multiple topics, and the number of tasks will scale with the maximum partition
+     * count of any matching topic rather than the total number of partitions across all topics.
      * <p>
      * Note that the specified input topics must be partitioned by key.
      * If this is not the case it is the user's responsibility to repartition the data before any key based operation
@@ -163,7 +165,9 @@ public class StreamsBuilder {
      * are defined by the options in {@link Consumed} are used.
      * <p>
      * If multiple topics are matched by the specified pattern, the created {@link KStream} will read data from all of
-     * them and there is no ordering guarantee between records from different topics.
+     * them and there is no ordering guarantee between records from different topics. This also means that the work
+     * will not be parallelized for multiple topics, and the number of tasks will scale with the maximum partition
+     * count of any matching topic rather than the total number of partitions across all topics.
      * <p>
      * Note that the specified input topics must be partitioned by key.
      * If this is not the case it is the user's responsibility to repartition the data before any key based operation
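
To make the task-scaling note in the javadoc above concrete (topic names and partition counts here are hypothetical): if the pattern matches orders-eu with 3 partitions and orders-us with 5 partitions, the subtopology runs max(3, 5) = 5 tasks rather than 3 + 5 = 8. A small sketch of that rule, assuming the partition counts are already known:

    import java.util.Map;

    public class TaskCountSketch {
        // Scaling rule from the javadoc above: the task count follows the
        // maximum partition count among the matched topics, not their sum.
        static int taskCount(Map<String, Integer> partitionsByMatchedTopic) {
            return partitionsByMatchedTopic.values().stream()
                    .mapToInt(Integer::intValue)
                    .max()
                    .orElse(0);
        }

        public static void main(String[] args) {
            // Hypothetical topics matched by the same pattern subscription.
            int tasks = taskCount(Map.of("orders-eu", 3, "orders-us", 5));
            System.out.println(tasks); // prints 5, not 8
        }
    }
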