Posted to issues@flink.apache.org by "Benchao Li (Jira)" <ji...@apache.org> on 2022/07/02 02:47:00 UTC

[jira] [Commented] (FLINK-28303) Kafka SQL Connector loses data when restoring from a savepoint with a topic with empty partitions

    [ https://issues.apache.org/jira/browse/FLINK-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561652#comment-17561652 ] 

Benchao Li commented on FLINK-28303:
------------------------------------

Besides this case, I would like to mention that the problem also arises when the Kafka cluster is unhealthy, e.g. when some partitions are in an under-replicated state.
In our internal use cases we suffer from this a lot. There are two cases:
1. If we do not enable checkpoints/savepoints, when the Flink job starts with 'group-offsets' we omit the under-replicated partition. When the partition recovers, we treat it as a newly added partition and consume it from earliest. This leads to repeated consumption.
2. If we do enable checkpoints/savepoints, we use the offsets stored in state; however, the partition is not consumable at that point. When the partition recovers, we again add it as a new partition and consume it a second time. This also leads to repeated consumption.

PS: We are using 1.11; I haven't checked whether this still exists in master's code.
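The duplicate-consumption case described above can be illustrated with a small simulation. This is not the Flink connector's actual code; all function and variable names here are hypothetical, and the sketch only models the reported behavior: a partition that is unavailable at startup is skipped, and when it later recovers it is treated as a newly discovered partition and read from 'earliest', re-delivering records the consumer group had already committed.

```python
# Illustrative simulation of the duplicate-consumption scenario (not Flink code).

def discover(partitions, available):
    """Partition discovery: unavailable partitions are silently skipped."""
    return [p for p in partitions if available[p]]

def consume(partition, start_offset, end_offset):
    """Deliver (partition, offset) pairs in the given offset range."""
    return [(partition, off) for off in range(start_offset, end_offset)]

partitions = [0, 1]
available = {0: True, 1: False}   # partition 1 is under-replicated at startup
group_offsets = {0: 5, 1: 5}      # both partitions were consumed up to offset 5 earlier

# First discovery: partition 1 is not assigned at all.
assigned = discover(partitions, available)

delivered = []
for p in assigned:
    delivered += consume(p, group_offsets[p], 10)   # partition 0 resumes at offset 5

# Partition 1 recovers and looks like a *new* partition: start from 'earliest'.
available[1] = True
for p in discover(partitions, available):
    if p not in assigned:
        delivered += consume(p, 0, 10)              # offsets 0-9 delivered again

# Offsets below the committed group offset are consumed a second time.
duplicates = [(p, off) for (p, off) in delivered if off < group_offsets[p]]
print("re-delivered records:", duplicates)          # partition 1, offsets 0-4
```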

> Kafka SQL Connector loses data when restoring from a savepoint with a topic with empty partitions
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28303
>                 URL: https://issues.apache.org/jira/browse/FLINK-28303
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>    Affects Versions: 1.14.4
>            Reporter: Robert Metzger
>            Priority: Major
>
> Steps to reproduce:
> - Set up a Kafka topic with 10 partitions
> - produce records 0-9 into the topic
> - take a savepoint and stop the job
> - produce records 10-19 into the topic
> - restore the job from the savepoint.
> The job will usually be missing 2-4 of the records 10-19.
> My assumption is that if a partition never had data (which is likely with 10 partitions and 10 records), the savepoint will only contain offsets for the partitions that did have data.
> While the job was offline (and we've written records 10-19 into the topic), all partitions got filled. Now, when the connector comes online again, it will use the "latest" offset for those partitions, skipping some data.
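The reproduction steps above can be sketched as a small simulation. This is a hypothetical model, not the connector's actual savepoint logic: it only encodes the reporter's assumption that the savepoint stores offsets solely for partitions that had data, and that partitions missing from the savepoint fall back to "latest" on restore.

```python
# Illustrative simulation of the reported data-loss scenario (not Flink code).

def take_savepoint(offsets):
    """Assumed behavior: only partitions that delivered records are checkpointed."""
    return {p: off for p, off in offsets.items() if off > 0}

def restore(savepoint, latest_offsets):
    """On restore, partitions missing from the savepoint start at 'latest'."""
    return {p: savepoint.get(p, latest_offsets[p]) for p in latest_offsets}

NUM_PARTITIONS = 10
offsets = {p: 0 for p in range(NUM_PARTITIONS)}

# Records 0-9: with 10 records over 10 partitions, some partitions stay empty.
for rec in range(10):
    offsets[rec % 7] += 1          # partitions 7-9 receive nothing

savepoint = take_savepoint(offsets)    # contains offsets for partitions 0-6 only

# Job is stopped; records 10-19 land across all partitions while it is down.
for rec in range(10, 20):
    offsets[rec % NUM_PARTITIONS] += 1
latest = dict(offsets)

restored = restore(savepoint, latest)
lost = [p for p in latest if p not in savepoint and restored[p] == latest[p]]
print("offline records skipped on partitions:", lost)   # partitions 7, 8, 9
```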



--
This message was sent by Atlassian Jira
(v8.20.10#820010)