You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/08/06 18:21:00 UTC

[jira] [Commented] (KAFKA-8611) Add KStream#repartition operation

    [ https://issues.apache.org/jira/browse/KAFKA-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901351#comment-16901351 ] 

ASF GitHub Bot commented on KAFKA-8611:
---------------------------------------

lkokhreidze commented on pull request #7170: KAFKA-8611 / Add KStream#repartition operation
URL: https://github.com/apache/kafka/pull/7170
 
 
   # KIP-221: Enhance DSL with Connecting Topic Creation and Repartition Hint
   
   ## Description
   This is first PR for [KIP-221](https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+DSL+with+Connecting+Topic+Creation+and+Repartition+Hint). Goal of this PR is to introduce new `KStream#repartition` operator that can be used to for triggering repartitioning on `KStream` instance.
   
   ## Notable Changes
   - Introduced `org.apache.kafka.streams.processor.internals.InternalTopicProperties` class that can be used for capturing repartition topic configurations passed via DSL operations
   - Enhanced `OptimizableRepartitionNode` with `StreamPartitioner<K,V> partitioner` and `InternalTopicProperties internalTopicProperties` fields
   - Added `org.apache.kafka.streams.processor.internals.InternalTopologyBuilder#internalTopicNamesWithProperties` map for storing mapping between internal topics and their corresponding configuration. If configuration is present `RepartitionTopicConfig` is enriched with configurations passed via DSL operations (In this case via `org.apache.kafka.streams.kstream.Repartitioned` class).
   - Added `KStreamRepartitionIntegrationTest` for testing different scenarios of `KStream#repartition`
   - - Repartition topic shouldn't be created when key changing operation wasn't performed and `Repartitioned#numberOfPartitions` was not specified
   - - Repartition topic should be created when key changing operation was performed
   - - Repartition topic should be created when key changing operation wasn't performed, but `Repartitioned#numberOfPartitions` was specified
   - - Repartition topic name should be picked up from `Repartitioned#name` configuration
   - - Repartition topic name should be generated automatically if `Repartitioned#name` is not specified.
   - - `KStream#repartition(KeyValueMapper, Repartitioned)` works as expected with `KStream#groupByKey` (There should be only one repartition topic).
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Add KStream#repartition operation
> ---------------------------------
>
>                 Key: KAFKA-8611
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8611
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Levani Kokhreidze
>            Assignee: Levani Kokhreidze
>            Priority: Minor
>              Labels: kip
>
> When using DSL in Kafka Streams, data re-partition happens only when key-changing operation is followed by stateful operation. On the other hand, in DSL, stateful computation can happen using _transform()_ operation as well. Problem with this approach is that, even if any upstream operation was key-changing before calling _transform()_, no auto-repartition is triggered. If repartitioning is required, a call to _through(String)_ should be performed before _transform()_. With the current implementation, burden of managing and creating the topic falls on user and introduces extra complexity of managing Kafka Streams application.
> KIP-221: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+DSL+with+Connecting+Topic+Creation+and+Repartition+Hint]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)