Posted to dev@kafka.apache.org by "Matthias J. Sax (JIRA)" <ji...@apache.org> on 2017/02/01 01:16:51 UTC

[jira] [Created] (KAFKA-4718) Revisit DSL partitioning assumption for KStream source topics

Matthias J. Sax created KAFKA-4718:
--------------------------------------

             Summary: Revisit DSL partitioning assumption for KStream source topics
                 Key: KAFKA-4718
                 URL: https://issues.apache.org/jira/browse/KAFKA-4718
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Matthias J. Sax
            Priority: Minor


Currently, when reading one or multiple topics via a single call to {{KStreamBuilder#stream()}}, it is assumed that the data is correctly partitioned by key.

For a "single topic" {{KStream}}, this is a fair assumption. For a multi-topic {{KStream}}, however, the assumption is most likely not true if the input topics have different numbers of partitions, because producers use hash partitioning by default. Thus, to get correct partitioning, all producers for those input topics need to use the same (or at least a compatible) custom partitioner.
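
To illustrate the mismatch, here is a minimal, self-contained sketch. It does not use the Kafka client: {{partitionFor}} is a hypothetical stand-in that uses {{String#hashCode()}} instead of the murmur2 hash of the serialized key that the default partitioner applies, and the partition counts (4 and 6) are made up. The point it shows is only that "hash modulo partition count" can send the same key to different partition numbers when the counts differ.

```java
public class PartitionSketch {

    // Hypothetical stand-in for hash partitioning: a non-negative hash of the key,
    // modulo the number of partitions. The real default partitioner hashes the
    // serialized key bytes with murmur2; String#hashCode() is used here only to
    // keep the sketch dependency-free.
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitionsTopicA = 4; // assumed partition count of the first input topic
        int partitionsTopicB = 6; // assumed partition count of the second input topic

        String[] keys = {"a", "d", "user-42"};
        for (String key : keys) {
            int a = partitionFor(key, partitionsTopicA);
            int b = partitionFor(key, partitionsTopicB);
            // A key is "co-partitioned" in this sketch only if both topics
            // place it in the same partition number.
            System.out.println(key + ": topicA partition " + a
                    + ", topicB partition " + b
                    + (a == b ? "" : "  <-- mismatch"));
        }
    }
}
```

In this sketch, the key "d" lands in partition 0 of the 4-partition topic but in partition 4 of the 6-partition topic, while "a" happens to land in partition 1 of both. Records with the same key can therefore end up in different partition numbers across the input topics unless the producers agree on a compatible partitioner.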

Making this the default assumption seems rather risky, and we might want to revisit it, or at least update the docs with corresponding hints.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)