You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Lee Dongjin (JIRA)" <ji...@apache.org> on 2019/02/05 02:56:00 UTC
[jira] [Assigned] (KAFKA-7293) Merge followed by groupByKey/join
might violate co-partioning
[ https://issues.apache.org/jira/browse/KAFKA-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lee Dongjin reassigned KAFKA-7293:
----------------------------------
Assignee: Lee Dongjin
> Merge followed by groupByKey/join might violate co-partioning
> -------------------------------------------------------------
>
> Key: KAFKA-7293
> URL: https://issues.apache.org/jira/browse/KAFKA-7293
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Lee Dongjin
> Priority: Major
>
> The merge() operations can be applied to input KStreams that have a different number of tasks (ie, input topic partitions). For this case, the input topics are not co-partitioned and thus the result KStream is not partitioned even if each input KStream is partitioned by its own.
> Because, no "repartitionRequired" flag is set on the input KStreams, the flag is also not set on the output KStream. Hence, if a groupByKey() or join() operation is applied the output KStream, we don't insert a repartition topic. However, repartitioning would be required because the KStream is not partitioned.
> We cannot detect this during compile time, because the number or partitions is unknown, and thus, we cannot decide if repartitioning is required or not. However, we can add a runtime check similar to joins() that checks if data is correctly (co-)partitioned and if not, we can raise a runtime exception.
> Note, for merge() in contrast to join(), we should only check for co-partitioning, if the merge() is followed by a groupByKey() or join() operations.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)