You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Alexey Trenikhun <ye...@msn.com> on 2022/05/12 02:28:41 UTC

Source without persistent state

Hello,

I'm working on custom Source, something like heartbeat generator using new Source API, HeartSource is constructed with list of Kafka topics, SplitEnumerator for each topic queries number of partitions, and either creates a split per topic-partition or single split for all topic-partitions (undecided which is simpler). SourceReader compares current time with threshold (default 0) and if current time greater than threshold generates message for each topic-partition, updates threshold = current time + 2 minutes. On recovery from checkpoint/savepoint, threshold can be set to 0 again, and I want SplitEnumerator to re-discover partitions on restore, so looks like I don't need state persistence, I assume that SplitEnumerator#snapshotState should return "empty" object, but what about SourceReader#snapshotState, could it return empty List? Is SourceReader#snapshotState is used for checkpoint/savepoint or it is for some other purposes as well? The documentation says:
"
All the state of a SourceReader should be maintained inside the SourceSplits which are returned at the snapshotState() invocation. Doing this allows the SourceSplits to be reassigned to other SourceReaders when needed.
"
When SourceSplits may need to be reassigned to other SourceReaders ?

Thanks,
Alexey