Posted to dev@kafka.apache.org by "Levani Kokhreidze (Jira)" <ji...@apache.org> on 2019/11/16 11:59:00 UTC

[jira] [Created] (KAFKA-9197) Consider introducing numberOfPartitions configuration field to Grouped configuration class

Levani Kokhreidze created KAFKA-9197:
----------------------------------------

             Summary: Consider introducing numberOfPartitions configuration field to Grouped configuration class
                 Key: KAFKA-9197
                 URL: https://issues.apache.org/jira/browse/KAFKA-9197
             Project: Kafka
          Issue Type: New Feature
          Components: streams
            Reporter: Levani Kokhreidze


In [KIP-221|https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+DSL+with+Connecting+Topic+Creation+and+Repartition+Hint] there was an idea to introduce a number-of-partitions field to the Grouped config class. During the discussion on the mailing list, a couple of valid concerns were raised against this approach.

The main argument against it was that whenever a user specifies the number of partitions for internal repartition topics, they really care that this configuration is applied. With group by, however, repartitioning does not happen if a key-changing operation isn't performed. In that case, the number of partitions specified by the user is never applied. Alternatively, if the user cares about manual repartitioning, they can do the following to scale sub-topologies up or down:

 
{code:java}
// Change the key first, then repartition explicitly into 5 partitions.
builder
  .stream("topic")
  .selectKey((key, value) -> value.newKey())
  .repartition(Repartitioned.numberOfPartitions(5))
  .groupByKey()
  .count();
{code}
 

On the other hand, there were also valid arguments for adding a numberOfPartitions field to the Grouped config class. It was suggested on the mailing list that we should treat the `numberOfPartitions` field as the "desired" number of partitions specified by the user, so that _if repartitioning is required_, Kafka Streams must use the value specified there.
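
To make the "desired" semantics concrete, here is a minimal, self-contained sketch of the decision rule. It is not Kafka Streams code: the class `DesiredPartitions`, the method `resolvePartitions` and its parameters are hypothetical names used only for illustration, and falling back to the source topic's partition count when no hint is given is an assumption of this sketch.

```java
import java.util.OptionalInt;

public class DesiredPartitions {

    /**
     * Hypothetical model of the "desired number of partitions" rule.
     *
     * @param keyChanged       whether a key-changing operation precedes the grouping
     * @param desired          the user's hint, if any
     * @param sourcePartitions partition count of the upstream topic (assumed fallback)
     * @return the partition count of the repartition topic, or empty if no
     *         repartition topic is needed and the hint is therefore ignored
     */
    public static OptionalInt resolvePartitions(boolean keyChanged,
                                                OptionalInt desired,
                                                int sourcePartitions) {
        if (!keyChanged) {
            // No repartitioning happens, so the hint has no effect.
            return OptionalInt.empty();
        }
        // Repartitioning is required: honor the hint if present.
        return OptionalInt.of(desired.orElse(sourcePartitions));
    }

    public static void main(String[] args) {
        // Key changed and a hint of 5 is given: the hint is used.
        System.out.println(resolvePartitions(true, OptionalInt.of(5), 3));
        // Key unchanged: no repartition topic, hint ignored.
        System.out.println(resolvePartitions(false, OptionalInt.of(5), 3));
        // Key changed but no hint: fall back to the source partition count.
        System.out.println(resolvePartitions(true, OptionalInt.empty(), 3));
    }
}
```

The second case is the one behind the concern raised above: the user supplies a value, yet it silently has no effect because no repartition topic is created.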

 

The idea of this ticket is to follow up on this discussion and implement the feature if there is an actual need from Kafka Streams users.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)