Posted to issues@flink.apache.org by "Yuan Mei (Jira)" <ji...@apache.org> on 2020/05/16 00:28:00 UTC
[jira] [Commented] (FLINK-15670) Provide a Kafka Source/Sink pair
that aligns Kafka's Partitions and Flink's KeyGroups
[ https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108783#comment-17108783 ]
Yuan Mei commented on FLINK-15670:
----------------------------------
Things to follow up on and discuss (listed here so they are not forgotten):
# Address cases where the number of partitions != the number of consumer tasks
# Emit Kafka fetcher records in batches (similar to [FLINK-17307] Add collector to deserialize in KafkaDeserializationSchema)
# Whether to separate the sink (producer) and the source (consumer) into different jobs.
** Although they recover independently under regional failover, they share the same checkpoint coordinator and therefore the same global checkpoint snapshot.
** This means that if the consumer fails, the producer cannot commit the data it has written, because of the two-phase commit setup (it needs a checkpoint-complete signal to finish the second phase).
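To make the coupling in the last item concrete, here is a minimal, self-contained sketch of how a shared checkpoint coordinator can hold back the producer's commit. It is not Flink's API: SketchProducer, SketchCoordinator and all of their methods are hypothetical names made up for illustration, and the model assumes that one checkpoint-complete notification commits every transaction pre-committed up to that checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class TwoPhaseCommitSketch {

    /** Hypothetical stand-in for a transactional producer task (not a Flink class). */
    static class SketchProducer {
        private List<String> open = new ArrayList<>();
        private final TreeMap<Long, List<String>> preCommitted = new TreeMap<>();
        private final List<String> committed = new ArrayList<>();

        void write(String record) {
            open.add(record);
        }

        /** Phase 1: close the current transaction and start a new one. */
        void snapshotState(long checkpointId) {
            preCommitted.put(checkpointId, open);
            open = new ArrayList<>();
        }

        /** Phase 2: runs only once the global checkpoint has completed. */
        void notifyCheckpointComplete(long checkpointId) {
            NavigableMap<Long, List<String>> done = preCommitted.headMap(checkpointId, true);
            for (List<String> txn : done.values()) {
                committed.addAll(txn);
            }
            done.clear(); // the view is backed by preCommitted, so this drops the entries
        }

        List<String> committed() {
            return committed;
        }
    }

    /** Hypothetical coordinator shared by producer and consumer in the same job. */
    static class SketchCoordinator {
        /** Returns true only if every task, including the consumer, acknowledged. */
        boolean triggerCheckpoint(long id, SketchProducer producer, boolean consumerAcks) {
            producer.snapshotState(id); // the producer's own part always succeeds here
            if (!consumerAcks) {
                return false; // global checkpoint incomplete: no phase 2 for anyone
            }
            producer.notifyCheckpointComplete(id);
            return true;
        }
    }

    public static void main(String[] args) {
        SketchProducer producer = new SketchProducer();
        SketchCoordinator coordinator = new SketchCoordinator();

        producer.write("a");
        producer.write("b");
        coordinator.triggerCheckpoint(1L, producer, false); // consumer region failed
        System.out.println(producer.committed()); // prints []

        producer.write("c");
        coordinator.triggerCheckpoint(2L, producer, true); // consumer has recovered
        System.out.println(producer.committed()); // prints [a, b, c]
    }
}
```

In this toy model the producer never fails, yet its records stay uncommitted until a checkpoint that includes the consumer completes. That is the argument for running the two sides as separate jobs, each with its own coordinator.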
> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's KeyGroups
> -------------------------------------------------------------------------------------
>
> Key: FLINK-15670
> URL: https://issues.apache.org/jira/browse/FLINK-15670
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream, Connectors / Kafka
> Reporter: Stephan Ewen
> Assignee: Yuan Mei
> Priority: Major
> Labels: pull-request-available, usability
> Fix For: 1.11.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job into smaller jobs and independent pipelined regions that fail over independently.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)