You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by "Yordan Pavlov (Jira)" <ji...@apache.org> on 2022/08/04 08:25:00 UTC
[jira] [Created] (FLINK-28802) EOFException when recovering from a checkpoint
Yordan Pavlov created FLINK-28802:
-------------------------------------
Summary: EOFException when recovering from a checkpoint
Key: FLINK-28802
URL: https://issues.apache.org/jira/browse/FLINK-28802
Project: Flink
Issue Type: Bug
Reporter: Yordan Pavlov
Flink version: 1.14.2
When recovering from a Savepoint, the TaskManager would throw an EOFException at this point:
[https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1681]
It looks like the transactional id is missing on deserialization. As an interesting observation, before trying to recover from the mentioned checkpoint the job failed with:
{code:java}
Aug 2, 2022 @ 16:50:40.107 Caused by: org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
Aug 2, 2022 @ 16:50:40.106 2022-08-02 13:50:40.106 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Time correct sink UTXO Balances -> Sink Save to Kafka UTXO Balances bch-balances-v4 (1/1) (d2847bd31137561e3ba075a7cf70ee7c) switched from RUNNING to FAILED on 10.42.238.71:37353-fc3e48 @ bch-balances-v4-flink-taskmanager-0.bch-balances-v4-flink-taskmanager.flink.svc.cluster.local (dataPort=42553).
Aug 2, 2022 @ 16:50:40.106 org.apache.flink.util.FlinkRuntimeException: Failed to send data to Kafka bch-balances-v4-0@-1 with FlinkKafkaInternalProducer{transactionalId='bch-balances-v4-0-690507', inTransaction=true, closed=false} {code}
Could it be that Flink actually failed to construct the checkpoint but still marked it as completed. What would be the way to check this? Is there something like a checksum byte that is checked on recovery?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)