You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Ryan van Huuksloot <ry...@shopify.com.INVALID> on 2022/09/02 15:57:27 UTC

RE: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

Hi Mason,

First off, thanks for putting this FLIP together! Sorry for the delay. Full
disclosure Mason and I chatted a little bit at Flink Forward 2022 but I
have tried to capture the questions I had for him then.

I'll start the conversation with a few questions:

1. The concept of streamIds is not clear to me in the proposal and could
use some more information. If I understand correctly, they will be used in
the MetadataService to link KafkaClusters to ones you want to use? If you
assign stream ids using `setStreamIds`, how can you dynamically increase
the number of clusters you consume if the list of StreamIds is static? I am
basing this off of your example .setStreamIds(List.of("my-stream-1",
"my-stream-2")) so I could be off base with my assumption. If you don't
mind clearing up the intention, that would be great!

2. How would offsets work if you wanted to use this MultiClusterKafkaSource
with a file based backfill? In the case I am thinking of, you have a bucket
backed archive of Kafka data per cluster. and you want to pick up from the
last offset in the archived system, how would you set OffsetInitializers
"per cluster" potentially as a function or are you limited to setting an
OffsetInitializer for the entire Source?

3. Just to make sure - because this system will layer on top of Flink-27
and use KafkaSource for some aspects under the hood, the watermark
alignment that was introduced in FLIP-182 / Flink 1.15 would be possible
across multiple clusters if you assign them to the same alignment group?

Thanks!
Ryan


On 2022/07/20 17:35:29 Mason Chen wrote:
> Hi all,
>
> We would like to start a discussion thread on FLIP-246: Multi Cluster
Kafka
> Source [1] where we propose to provide a source connector for dynamically
> reading from Kafka multiple clusters, which will not require Flink job
> restart. This can greatly improve the Kafka migration experience for
> clusters and topics, and it solves some existing problems with the current
> KafkaSource. There was some interest from users [2] from a meetup and the
> mailing list. Looking forward to comments and feedback, thanks!
>
> [1]
>
https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source
> [2] https://lists.apache.org/thread/zmpnzx6jjsqc0oldvdm5y2n674xzc3jc
>
> Best,
> Mason
>