Posted to issues@flink.apache.org by "Yuan Mei (Jira)" <ji...@apache.org> on 2020/05/16 00:31:00 UTC

[jira] [Comment Edited] (FLINK-15670) Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's KeyGroups

    [ https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108783#comment-17108783 ] 

Yuan Mei edited comment on FLINK-15670 at 5/16/20, 12:30 AM:
-------------------------------------------------------------

Things to follow up on and discuss (listed here so that I don't forget them):

 
 # Address cases where the number of partitions != the number of consumer tasks
 # Batch emitting of Kafka fetcher records (similar to FLINK-17307, "Add collector to deserialize in KafkaDeserializationSchema")
 # Whether to separate the sink (producer) and the source (consumer) into different jobs.
 ** Although they recover independently under regional failover, they share the same checkpoint coordinator and therefore the same global checkpoint snapshot.
 ** This means that if the consumer fails, the producer cannot commit the data it has written: under the two-phase commit set-up, it needs a checkpoint-complete signal to finish the second phase.
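
To make the last point concrete, here is a minimal toy model of the dependency in plain Java. It is only a sketch: ToyTransactionalSink and its methods are hypothetical names, not Flink or Kafka classes, and it assumes that a completed checkpoint also commits any older pre-committed transactions. Records stay invisible until the checkpoint-complete signal arrives, so a checkpoint that never completes (for example because the consumer side of the same job failed) holds back the producer's commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TwoPhaseCommitSketch {

    /** Hypothetical stand-in for a transactional sink; not a Flink or Kafka class. */
    static class ToyTransactionalSink {
        // Records written since the last checkpoint barrier (the open transaction).
        private List<String> open = new ArrayList<>();
        // Pre-committed transactions, keyed by the checkpoint that closed them.
        private final TreeMap<Long, List<String>> preCommitted = new TreeMap<>();
        // Records that downstream readers are allowed to see.
        final List<String> visible = new ArrayList<>();

        void write(String record) {
            open.add(record);
        }

        // Phase 1: on a checkpoint, pre-commit the open transaction and start a new one.
        void snapshot(long checkpointId) {
            preCommitted.put(checkpointId, open);
            open = new ArrayList<>();
        }

        // Phase 2: only the checkpoint-complete signal makes the data visible.
        // Assumption: all transactions up to and including checkpointId are committed.
        void notifyCheckpointComplete(long checkpointId) {
            NavigableMap<Long, List<String>> done = preCommitted.headMap(checkpointId, true);
            for (List<String> txn : done.values()) {
                visible.addAll(txn);
            }
            done.clear(); // clears the committed entries from preCommitted as well
        }
    }

    public static void main(String[] args) {
        ToyTransactionalSink sink = new ToyTransactionalSink();
        sink.write("a");
        sink.write("b");
        sink.snapshot(1L);
        // Checkpoint 1 never completes (e.g. the consumer failed): nothing is visible.
        System.out.println("after checkpoint 1 (not completed): " + sink.visible);

        sink.write("c");
        sink.snapshot(2L);
        sink.notifyCheckpointComplete(2L);
        // Checkpoint 2 completes: both pending transactions are committed.
        System.out.println("after checkpoint 2 completed: " + sink.visible);
    }
}
```

If the producer and the consumer ran as separate jobs, each would have its own checkpoint coordinator, and a failure on the consumer side would not delay the producer's second phase.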

was (Author: ym):
Things to follow up and discuss (listed here in case forgotten):

 
 # Address cases when # of partitions != # of consumer tasks
 # Batch emitting Kafka fetcher records (similar to [FLINK-17307] Add collector to deserialize in KafkaDeserializationSchema)
 # Whether to separate sink (producer) and source (consumer) to different jobs. 
 ** Although they recover independently under regional failover, they share the same checkpoint coordinator and therefore the same global checkpoint snapshot.
 ** This means that if the consumer fails, the producer cannot commit the data it has written: under the two-phase commit set-up, it needs a checkpoint-complete signal to finish the second phase.

> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's KeyGroups
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15670
>                 URL: https://issues.apache.org/jira/browse/FLINK-15670
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream, Connectors / Kafka
>            Reporter: Stephan Ewen
>            Assignee: Yuan Mei
>            Priority: Major
>              Labels: pull-request-available, usability
>             Fix For: 1.11.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job into smaller jobs and independent pipelined regions that fail over independently.
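
As a rough illustration of what "aligning" partitions and key groups could mean, the sketch below maps a key to a key group, a key group to a subtask index, and then reuses that index as the Kafka partition. It is a simplified assumption, not the design from this issue: the class and method names are hypothetical, the plain floorMod hash stands in for the murmur hash that Flink (to my understanding) applies to key.hashCode(), and the partition mapping assumes that the number of partitions equals the consumer parallelism.

```java
public class KeyGroupAlignmentSketch {

    // Simplified key-group assignment. Assumption: a plain modulo is used here
    // for readability; Flink additionally scrambles the hash before the modulo.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Contiguous ranges of key groups are mapped to subtasks:
    // subtask = keyGroup * parallelism / maxParallelism (integer division).
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Hypothetical aligned partitioner: with numPartitions == consumer parallelism,
    // writing each record to the partition with its subtask's index lets subtask i
    // read partition i without a further shuffle.
    static int partitionFor(Object key, int maxParallelism, int numPartitions) {
        return subtaskFor(keyGroupFor(key, maxParallelism), maxParallelism, numPartitions);
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int numPartitions = 4;
        for (int key : new int[] {2, 40, 70, 127}) {
            System.out.println("key " + key
                    + " -> key group " + keyGroupFor(key, maxParallelism)
                    + " -> partition " + partitionFor(key, maxParallelism, numPartitions));
        }
    }
}
```

When the number of partitions differs from the number of consumer tasks, this one-to-one mapping no longer holds, and a different assignment of partitions to subtasks would be needed.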



--
This message was sent by Atlassian Jira
(v8.3.4#803005)