
Posted to user-zh@flink.apache.org by lingchanhu <li...@163.com> on 2021/01/04 02:38:30 UTC

flink1.11 mysql cdc: duplicate data synced after a checkpoint failure and automatic recovery

source: mysql-cdc
sink: elasticsearch
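
A minimal sketch of how such a source and sink pair might be declared in Flink SQL, built as Java strings in the same style as the DDL fragment quoted in this post. The column names (id, name), host names, credentials, and index name are placeholders and are not taken from the original job. The sink table declares PRIMARY KEY ... NOT ENFORCED: with a key defined, the Elasticsearch SQL connector works in upsert mode and derives the document id from the key, so a row that is replayed after a restart overwrites the existing document rather than being indexed a second time.

```java
// Sketch only: builds the two DDL strings; nothing here talks to Flink, MySQL, or Elasticsearch.
class CdcToEsDdlSketch {

    // Source table backed by the mysql-cdc connector. Columns and connection values are placeholders.
    static String sourceDdl(String database, String table, int serverId) {
        return String.join("\n",
            "CREATE TABLE mysql_source (",
            "  id BIGINT,",
            "  name STRING",
            ") WITH (",
            "  'connector' = 'mysql-cdc',",
            "  'hostname' = 'localhost',",
            "  'port' = '3306',",
            "  'username' = 'flink',",
            "  'password' = '******',",
            "  'database-name' = '" + database + "',",
            "  'table-name' = '" + table + "',",
            "  'server-id' = '" + serverId + "'",
            ")");
    }

    // Sink table with a primary key, so the connector upserts by document id
    // and can also apply deletes coming from the changelog.
    static String sinkDdl(String index) {
        return String.join("\n",
            "CREATE TABLE es_sink (",
            "  id BIGINT,",
            "  name STRING,",
            "  PRIMARY KEY (id) NOT ENFORCED",
            ") WITH (",
            "  'connector' = 'elasticsearch-7',",
            "  'hosts' = 'http://localhost:9200',",
            "  'index' = '" + index + "'",
            ")");
    }

    public static void main(String[] args) {
        System.out.println(sourceDdl("mydb", "orders", 5401));
        System.out.println(sinkDdl("orders_index"));
    }
}
```

In a real job each string would be passed to TableEnvironment.executeSql(...), followed by an INSERT INTO es_sink SELECT ... FROM mysql_source statement.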

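Problem description:
After syncing table data from MySQL to Elasticsearch, a row that was inserted and then deleted caused the sink to fail (no primary key had been defined). The checkpoint failed. After the job recovered and restarted automatically, checkpoints succeeded, but Elasticsearch now holds twice as many rows as the MySQL table, i.e. the data was synced a second time.
Shouldn't automatic recovery resume syncing from the binlog position recorded in the current checkpoint? Why does it sync from the beginning again?
(The server-id is hard-coded in the DDL:
                "  'table-name' = '"+ table +"'," +
                "  'server-id' = '"+ serverId +"'" + )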


Logs:






--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink1.11 mysql cdc: duplicate data synced after a checkpoint failure and automatic recovery

Posted by smailxie <sm...@163.com>.






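By the time the job restarted and recovered automatically, the binlog may already have been purged by the MySQL server, which would cause the Debezium connector to take a new snapshot.
Reference: https://debezium.io/documentation/reference/1.3/connectors/mysql.html#mysql-purges-binlog-files_debezium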
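
The behaviour described in the linked Debezium section can be modelled roughly as follows. This is a simplified illustration, not Debezium's actual code: the class, enum, and method names are hypothetical, and the list of file names stands in for the output of SHOW BINARY LOGS on the MySQL server.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: a simplified model of the restart decision, with hypothetical names.
class BinlogResumeCheckSketch {

    enum RestartAction { RESUME_FROM_OFFSET, NEW_SNAPSHOT }

    // checkpointedBinlogFile: the binlog file name stored in the restored state,
    //                         or null if no offset was restored at all.
    // filesOnServer:          binlog files the MySQL server still retains.
    static RestartAction decide(String checkpointedBinlogFile, List<String> filesOnServer) {
        if (checkpointedBinlogFile != null && filesOnServer.contains(checkpointedBinlogFile)) {
            // The stored position still exists on the server, so reading can continue from it.
            return RestartAction.RESUME_FROM_OFFSET;
        }
        // The file was purged (or there was no stored offset): the starting point is gone,
        // so the table is read again from the beginning.
        return RestartAction.NEW_SNAPSHOT;
    }

    public static void main(String[] args) {
        List<String> retained = Arrays.asList("mysql-bin.000007", "mysql-bin.000008");
        System.out.println(decide("mysql-bin.000008", retained));
        System.out.println(decide("mysql-bin.000003", retained));
    }
}
```

If the checkpointed file turns out to be missing, lengthening binlog retention on the server (expire_logs_days, or binlog_expire_logs_seconds on MySQL 8.0) may help avoid the repeated snapshot.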









--

Name：谢波
Mobile:13764228893





