Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2022/12/05 23:30:00 UTC

[jira] [Created] (KAFKA-14444) Simplify user experience of customizing partitioning strategy in Streams

A. Sophie Blee-Goldman created KAFKA-14444:
----------------------------------------------

             Summary: Simplify user experience of customizing partitioning strategy in Streams
                 Key: KAFKA-14444
                 URL: https://issues.apache.org/jira/browse/KAFKA-14444
             Project: Kafka
          Issue Type: New Feature
          Components: streams
            Reporter: A. Sophie Blee-Goldman


The current process of plugging a custom partitioning scheme into a Streams application is labor-intensive and extremely error prone. While defining their topology, users must pay close attention to every place where an operator/node writes to an existing topic or creates a topic that will be produced to, or else print out the topology description and try to locate all sink nodes that way. If they fail to pass their custom partitioner to even one such location in the topology, everything downstream of it will be affected by the inconsistent/unintended partitioning scheme.

It is also easy for users to miss this process entirely and instead try to customize the partitioning scheme via the producer config. This does not work, and unfortunately results in a runtime exception that is difficult for users to interpret. Ideally we would provide a similar config for Streams where users could define a default implementation of the StreamPartitioner interface.

Unfortunately, this is not so straightforward. Unlike the case of the Producer config, where there is a clearly defined key and value type, there's no guarantee that each sink node requiring the custom partitioner deals with the same key/value types as the others.

We could utilize the default.key.serde/default.value.serde configs for this, and only require users to plug in their partitioner where the key/value types differ from the default, but this would likely limit the usefulness of a default partitioner significantly. Alternatively, we could push this to the user and have them write a generic implementation class with type checking and handling, but this would be pretty awkward and error prone as well.
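
To illustrate why the generic-implementation option is awkward, here is a minimal sketch of what such a class might look like. Everything in it is an assumption made for illustration: SketchPartitioner is a local stand-in loosely modeled on StreamPartitioner (so the example compiles without Kafka on the classpath; the real interface's method signature may differ by Kafka version), and TypeCheckingPartitioner and its hashing rules are hypothetical, not part of any proposal.

```java
import java.util.Arrays;

public class TypeCheckingPartitionerSketch {

    /** Hypothetical stand-in for Kafka Streams' StreamPartitioner (assumption, not the real API). */
    interface SketchPartitioner<K, V> {
        // Returns a partition in [0, numPartitions), or null to mean
        // "no opinion, fall back to default partitioning".
        Integer partition(String topic, K key, V value, int numPartitions);
    }

    /**
     * One untyped partitioner shared by every sink node. Because sinks may see
     * different key types, it has to inspect the runtime type of each key.
     */
    static final class TypeCheckingPartitioner implements SketchPartitioner<Object, Object> {
        @Override
        public Integer partition(String topic, Object key, Object value, int numPartitions) {
            if (key instanceof String) {
                return Math.floorMod(((String) key).hashCode(), numPartitions);
            }
            if (key instanceof Long) {
                return Math.floorMod(Long.hashCode((Long) key), numPartitions);
            }
            if (key instanceof byte[]) {
                return Math.floorMod(Arrays.hashCode((byte[]) key), numPartitions);
            }
            // Unknown key type: nothing sensible to do here, so the custom
            // scheme is silently skipped for this sink.
            return null;
        }
    }

    public static void main(String[] args) {
        TypeCheckingPartitioner partitioner = new TypeCheckingPartitioner();
        System.out.println(partitioner.partition("orders", "user-1", null, 4));
        System.out.println(partitioner.partition("counts", 42L, null, 4));
        System.out.println(partitioner.partition("ratios", 3.14, null, 4));
    }
}
```

The sketch gives up compile-time type safety, needs a new branch for every key type that appears anywhere in the topology, and quietly falls back to default partitioning for any type the author forgot, which is the same inconsistent-partitioning failure mode described above.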

Either way, this will take some thought, which is why the idea was pulled from the proposal in KIP-878 and left for a follow-up KIP.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)