Posted to issues@flink.apache.org by "Yuan Mei (Jira)" <ji...@apache.org> on 2020/05/16 00:31:00 UTC
[jira] [Comment Edited] (FLINK-15670) Provide a Kafka Source/Sink
pair that aligns Kafka's Partitions and Flink's KeyGroups
[ https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108783#comment-17108783 ]
Yuan Mei edited comment on FLINK-15670 at 5/16/20, 12:30 AM:
-------------------------------------------------------------
Things to follow up and discuss (listed here in case I forget them):
# Address cases where the number of partitions != the number of consumer tasks
# Batch emitting Kafka fetcher records (similar to FLINK-17307 Add collector to deserialize in KafkaDeserializationSchema)
# Whether to separate sink (producer) and source (consumer) to different jobs.
** Although they recover independently under regional failover, they share the same checkpoint coordinator and therefore the same global checkpoint snapshot
** This means that if the consumer fails, the producer cannot commit the data it has written, because the two-phase commit setup needs a checkpoint-complete signal to finish the second phase
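
To make the last point concrete, here is a minimal toy simulation of the coupling. It is only a sketch under stated assumptions: the class TwoPhaseCommitSink, the function run_checkpoint and the source_ok flag are hypothetical names made up for illustration, not Flink's API, and the model ignores recovery and timeouts. It shows one shared coordinator that completes a checkpoint only when every task acknowledges it, so a failing consumer keeps the producer's pre-committed records from being committed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TwoPhaseCommitSink:
    """Toy model of a transactional sink (hypothetical, not Flink's API)."""

    buffer: List[str] = field(default_factory=list)               # open transaction
    pending: Dict[int, List[str]] = field(default_factory=dict)   # pre-committed, keyed by checkpoint id
    committed: List[str] = field(default_factory=list)            # visible to downstream readers

    def write(self, record: str) -> None:
        self.buffer.append(record)

    def pre_commit(self, checkpoint_id: int) -> None:
        # Phase 1: close the open transaction when the checkpoint is taken.
        self.pending[checkpoint_id] = self.buffer
        self.buffer = []

    def notify_checkpoint_complete(self, checkpoint_id: int) -> None:
        # Phase 2: commit only after the global checkpoint is confirmed.
        self.committed.extend(self.pending.pop(checkpoint_id))

    def abort(self, checkpoint_id: int) -> None:
        # Discard the pre-committed transaction of a failed checkpoint.
        self.pending.pop(checkpoint_id, None)


def run_checkpoint(checkpoint_id: int, sink: TwoPhaseCommitSink, source_ok: bool) -> bool:
    """Shared coordinator: the checkpoint completes only if all tasks acknowledge.

    source_ok is an assumed stand-in for "the consumer task acknowledged".
    """
    sink.pre_commit(checkpoint_id)
    if source_ok:
        sink.notify_checkpoint_complete(checkpoint_id)
        return True
    sink.abort(checkpoint_id)
    return False
```

In this model, calling run_checkpoint(2, sink, source_ok=False) after a write leaves sink.committed unchanged, which is the behaviour that would motivate running the producer and the consumer as separate jobs with separate checkpoint coordinators.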
was (Author: ym):
Things to follow up and discuss (listed here in case forgotten):
# Address cases when # of partitions != # of consumer tasks
# Batch emitting Kafka fetcher records (similar to [FLINK-17307] Add collector to deserialize in KafkaDeserializationSchema)
# Whether to separate sink (producer) and source (consumer) to different jobs.
** Although they recover independently under regional failover, they share the same checkpoint coordinator and therefore the same global checkpoint snapshot
** This means that if the consumer fails, the producer cannot commit the data it has written, because the two-phase commit setup needs a checkpoint-complete signal to finish the second phase
> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's KeyGroups
> -------------------------------------------------------------------------------------
>
> Key: FLINK-15670
> URL: https://issues.apache.org/jira/browse/FLINK-15670
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream, Connectors / Kafka
> Reporter: Stephan Ewen
> Assignee: Yuan Mei
> Priority: Major
> Labels: pull-request-available, usability
> Fix For: 1.11.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job into smaller jobs and independent pipelined regions that fail over independently.
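
The first purpose in the description above (reading key-partitioned topics without a second shuffle) depends on the writer and the reader agreeing on where a key lives. The following is a rough sketch of that idea, not the connector's actual design. It assumes keys map to key groups and key groups map to subtasks in contiguous ranges (key_group * parallelism / max_parallelism, which mirrors how Flink assigns key groups to operator instances). The function names are hypothetical, and zlib.crc32 is only a stand-in for Flink's murmur hash, so the concrete values differ from what Flink computes.

```python
import zlib


def key_group_for(key: str, max_parallelism: int) -> int:
    # Stand-in hash (assumption): Flink hashes key.hashCode() with murmur hash instead.
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def subtask_for_key_group(key_group: int, max_parallelism: int, parallelism: int) -> int:
    # Each subtask owns a contiguous range of key groups.
    return key_group * parallelism // max_parallelism


def kafka_partition_for(key: str, max_parallelism: int, num_partitions: int) -> int:
    # Assumption: the sink picks the Kafka partition with the same rule the
    # reader uses to pick the subtask, treating partitions like subtasks.
    key_group = key_group_for(key, max_parallelism)
    return subtask_for_key_group(key_group, max_parallelism, num_partitions)
```

If the number of partitions equals the consumer parallelism, every key written to partition p falls into a key group owned by subtask p, so the consuming job can treat the stream as already keyed. The case where the two numbers differ is the open follow-up item from the comment.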
--
This message was sent by Atlassian Jira
(v8.3.4#803005)