Posted to user-zh@flink.apache.org by "zhangzq@eastcom-sw.com" <zh...@eastcom-sw.com> on 2023/05/04 06:54:40 UTC
checkpoint Kafka Offset commit failed
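Hi, with Flink (1.14 and 1.16), committing Kafka offsets on checkpoint (10 s interval) fails with "The coordinator is not available".
The Kafka cluster logs all look normal, and offsets can be committed correctly by hand. After the Flink job is restarted, commits succeed again, but they start failing once more after the job has run for a while. Are there any parameters that could be tuned?
The Flink log is as follows: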
2023-05-04 11:31:02,636 WARN org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to commit consumer offsets for checkpoint 69153
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.
Re: Re: checkpoint Kafka Offset commit failed
Posted by "zhangzq@eastcom-sw.com" <zh...@eastcom-sw.com>.
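Hi, I checked and the Kafka brokers have not been restarted; they have been running the whole time. I will try upgrading Kafka and see whether that helps.
Current versions: kafka-clients 2.6.2, Kafka server 2.11-2.1.1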
Re: checkpoint Kafka Offset commit failed
Posted by Matt Wang <wa...@163.com>.
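Hi, this error looks like a retriable exception, but Flink has no retry logic for it [1]/[2]; it only logs the exception and records the corresponding metrics. Since your job already has checkpointing enabled, this WARN log has no practical impact. The community has discussed this issue before [3]/[4]. If the error is caused by a Kafka broker restart, you could try upgrading Kafka as suggested in [4].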
1. https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L249
2. https://github.com/apache/flink-connector-kafka/blob/main/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaSourceReader.java#L149
3. https://issues.apache.org/jira/browse/FLINK-25293
4. https://issues.apache.org/jira/browse/FLINK-28060
--
Best,
Matt Wang
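The remark above about a retriable exception with no retry logic can be illustrated with a small, library-free sketch. It does not use the Kafka or Flink APIs: `RetriableException`, `CommitAction`, and `commitWithRetry` are hypothetical names introduced only for this example, and the attempt count and backoff are arbitrary assumptions. It shows what a bounded retry around a commit could look like. Note that the Flink Kafka source restores its position from checkpoint state, not from committed offsets, so a failed commit only affects offset-based monitoring.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CommitRetrySketch {

    /** Hypothetical stand-in for a retriable failure such as RetriableCommitFailedException. */
    static class RetriableException extends RuntimeException {
        RetriableException(String message) {
            super(message);
        }
    }

    /** Hypothetical commit action; a real job would call its Kafka client here. */
    interface CommitAction {
        void commit();
    }

    /**
     * Runs the commit up to maxAttempts times, sleeping backoffMillis between attempts.
     * Returns the number of attempts used, or -1 if every attempt failed with a
     * retriable error or the thread was interrupted. Other exceptions propagate.
     */
    static int commitWithRetry(CommitAction action, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.commit();
                return attempt;
            } catch (RetriableException e) {
                // Log and try again rather than failing the caller.
                System.out.println("attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return -1;
                    }
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated coordinator outage: the first two commits fail, the third succeeds.
        CommitAction flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new RetriableException("The coordinator is not available.");
            }
        };
        int attempts = commitWithRetry(flaky, 5, 10L);
        System.out.println("committed after " + attempts + " attempts");
    }
}
```

With the simulated outage in `main`, the third attempt succeeds. A bounded attempt count keeps a long coordinator outage from blocking the caller indefinitely.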
Re: Re: checkpoint Kafka Offset commit failed
Posted by "zhangzq@eastcom-sw.com" <zh...@eastcom-sw.com>.
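Hi, thanks for the reply~
The Flink cluster and the Kafka cluster are on the same network segment, and I have checked that the network is healthy.
On Flink 1.14, a "Time should be non negative" exception shows up once every few days; after the job restarts automatically, offsets are committed normally again.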
java.lang.IllegalArgumentException: Time should be non negative
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
at org.apache.flink.runtime.throughput.ThroughputEMA.calculateThroughput(ThroughputEMA.java:44)
at org.apache.flink.runtime.throughput.ThroughputCalculator.calculateThroughput(ThroughputCalculator.java:80)
at org.apache.flink.streaming.runtime.tasks.StreamTask.debloat(StreamTask.java:792)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$4(StreamTask.java:784)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:338)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:324)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:748)
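The trace shows the precondition failing inside the throughput calculation reached from `StreamTask.debloat`. The thread does not establish the cause. One plausible explanation, stated here purely as an assumption, is that the elapsed time was derived from wall-clock timestamps and the system clock was stepped backwards (for example by time synchronization). The sketch below is not Flink code; `elapsedMillis` and `checkNonNegative` are hypothetical helpers that only reproduce the arithmetic.

```java
public class NegativeElapsedTimeSketch {

    /** Elapsed time as a plain difference of two timestamps in milliseconds. */
    static long elapsedMillis(long startMillis, long endMillis) {
        return endMillis - startMillis;
    }

    /** Precondition in the style of the failing check: rejects negative durations. */
    static void checkNonNegative(long time) {
        if (time < 0) {
            throw new IllegalArgumentException("Time should be non negative");
        }
    }

    public static void main(String[] args) {
        // Assumed scenario: the wall clock is read, then stepped back before the second read.
        long start = 1_000_500L;
        long endAfterClockStepBack = 998_700L;
        long elapsed = elapsedMillis(start, endAfterClockStepBack);
        System.out.println("wall-clock elapsed = " + elapsed + " ms");
        try {
            checkNonNegative(elapsed);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // System.nanoTime() is intended for measuring elapsed time and is not
        // tied to wall-clock adjustments, so this difference passes the check.
        long t0 = System.nanoTime();
        long monotonicElapsedNanos = System.nanoTime() - t0;
        checkNonNegative(monotonicElapsedNanos);
        System.out.println("monotonic elapsed accepted");
    }
}
```

If this assumption holds for a given deployment, checking the hosts for clock adjustments around the time of the exception would be a reasonable first step.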
Re: checkpoint Kafka Offset commit failed
Posted by Shammon FY <zj...@gmail.com>.
Hi
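This looks like a network problem that keeps the Flink job's source operator from connecting to Kafka. You could check whether there is a problem with the Kafka cluster's network or with the network of the nodes running the Flink source tasks.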
Best,
Shammon FY
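A basic way to act on the suggestion above is to test TCP reachability of each broker from the hosts that run the source tasks. The sketch below uses only the JDK; the class name, the `canConnect` helper, and the timeout value are assumptions made for illustration. A successful TCP connect shows only that the port is reachable, not that the group coordinator is healthy. So that it runs anywhere, `main` probes a local listener; substituting a real broker host and port is left to the reader.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BrokerReachabilitySketch {

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, unreachable hosts, and connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a broker: a local listener on an ephemeral port.
        try (ServerSocket listener = new ServerSocket(0)) {
            int port = listener.getLocalPort();
            System.out.println("listener reachable: " + canConnect("127.0.0.1", port, 1000));
        }
    }
}
```

Running such a probe periodically from the TaskManager hosts, and correlating failures with the WARN timestamps, would help confirm or rule out the network as the cause.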
Re: checkpoint Kafka Offset commit failed
Posted by Leonard Xu <xb...@gmail.com>.
To unsubscribe from the user-zh@flink.apache.org mailing list, send an email with any content to user-zh-unsubscribe@flink.apache.org. For managing mailing list subscriptions, see [1].
Best regards,
Leonard
[1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
Re: checkpoint Kafka Offset commit failed
Posted by wuzhongxiu <go...@163.com>.
Unsubscribe