Posted to issues@flink.apache.org by "Preston Price (Jira)" <ji...@apache.org> on 2021/10/14 22:52:00 UTC

[jira] [Commented] (FLINK-24497) Kafka metrics fetching throws IllegalStateException

    [ https://issues.apache.org/jira/browse/FLINK-24497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429034#comment-17429034 ] 

Preston Price commented on FLINK-24497:
---------------------------------------

It looks like this new metric "records-lag" was introduced recently here: [https://github.com/apache/flink/pull/16838]

But I have not fully grokked the change there, so I do not yet understand its purpose or exactly why this error is surfacing.

Some quick debugging on my side shows that the `Map<MetricName, ? extends Metric> metrics` contains metrics with the expected group `CONSUMER_FETCH_MANAGER_GROUP`, but none with the expected name `records-lag`. As a result, the call to `MetricUtil.getKafkaMetric(metrics, filter)` throws the exception, because its `orElseThrow` call requires at least one match.
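
To make the failure mode concrete, here is a minimal, dependency-free sketch of the lookup pattern as I understand it from the stack trace. It is not the actual Flink code: the `Name` class is a simplified stand-in for Kafka's `MetricName`, the group string and the `records-consumed-total` entry are placeholder values I chose for illustration, and `getMetric` only imitates the shape of `MetricUtil.getKafkaMetric`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class MetricLookupSketch {

    // Hypothetical stand-in for Kafka's MetricName (group and name only).
    static final class Name {
        final String group;
        final String name;

        Name(String group, String name) {
            this.group = group;
            this.name = name;
        }
    }

    // Returns the first metric whose name matches the filter,
    // or throws IllegalStateException when nothing matches.
    static double getMetric(Map<Name, Double> metrics, Predicate<Name> filter) {
        return metrics.entrySet().stream()
                .filter(e -> filter.test(e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException(
                        "Cannot find Kafka metric matching current filter."));
    }

    public static void main(String[] args) {
        // Assumed group string, standing in for CONSUMER_FETCH_MANAGER_GROUP.
        String group = "consumer-fetch-manager-metrics";

        // The group is populated, but no "records-lag" entry exists yet.
        Map<Name, Double> metrics = new LinkedHashMap<>();
        metrics.put(new Name(group, "records-consumed-total"), 10.0);

        try {
            getMetric(metrics,
                    n -> n.group.equals(group) && n.name.equals("records-lag"));
        } catch (IllegalStateException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that a populated group with no matching name is enough to hit the `orElseThrow` branch.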

This leads me to believe this is a cold-start problem where the absence/initial calculation of this metric is not handled gracefully.
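
If the cold-start theory holds, one way to handle it would be to treat a missing metric as "not available yet" and retry on a later fetch instead of throwing. The sketch below is purely hypothetical and is not the fix Flink uses: `findMetric` and the `Map<String, Double>` are simplified stand-ins for the real metric map, and the value `42.0` is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ColdStartLookupSketch {

    // Hypothetical tolerant lookup: an absent metric yields Optional.empty()
    // rather than an exception, so the caller can simply try again later.
    static Optional<Double> findMetric(Map<String, Double> metrics, String name) {
        return Optional.ofNullable(metrics.get(name));
    }

    public static void main(String[] args) {
        Map<String, Double> metrics = new HashMap<>();

        // First fetch: the consumer has not published "records-lag" yet.
        Optional<Double> first = findMetric(metrics, "records-lag");
        System.out.println("first fetch present: " + first.isPresent());

        // Later fetch: the metric has appeared (value made up for the example).
        metrics.put("records-lag", 42.0);
        Optional<Double> second = findMetric(metrics, "records-lag");
        System.out.println("later fetch present: " + second.isPresent());
    }
}
```

With this shape the caller would register the lag metric only once the lookup returns a value, and stay quiet until then.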

I am hoping there is a graceful way to mute these exceptions, as they are prolific and are clogging up my output.

> Kafka metrics fetching throws IllegalStateException 
> ----------------------------------------------------
>
>                 Key: FLINK-24497
>                 URL: https://issues.apache.org/jira/browse/FLINK-24497
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka
>            Reporter: Juha
>            Priority: Minor
>             Fix For: 1.14.0
>
>
> I have a simple job that just consumes from a single Kafka topic, performs some filtering and produces to another topic.
> The TaskManager log has these periodically. This is a new problem in 1.14.0, the same setup didn't have the issue when using 1.13.0 or 1.13.2.
>  {code}
> 2021-10-05T15:22:31.928316  [2021-10-05 15:22:31,927] WARN Error when getting Kafka consumer metric "records-lag" for partition "cpu.kafka-1". Metric "pendingBytes" may not be reported correctly.  (org.apache.flink.connector.kafka.source.metrics.KafkaSourceReaderMetrics:306)
> 2021-10-05T15:22:31.928316  java.lang.IllegalStateException: Cannot find Kafka metric matching current filter.
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.kafka.MetricUtil.lambda$getKafkaMetric$1(MetricUtil.java:63) ~[flink-connector-kafka_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at java.util.Optional.orElseThrow(Optional.java:408) ~[?:?]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.kafka.MetricUtil.getKafkaMetric(MetricUtil.java:61) ~[flink-connector-kafka_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.kafka.source.metrics.KafkaSourceReaderMetrics.getRecordsLagMetric(KafkaSourceReaderMetrics.java:304) ~[flink-connector-kafka_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.kafka.source.metrics.KafkaSourceReaderMetrics.lambda$maybeAddRecordsLagMetric$4(KafkaSourceReaderMetrics.java:229) ~[flink-connector-kafka_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705) [?:?]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.kafka.source.metrics.KafkaSourceReaderMetrics.maybeAddRecordsLagMetric(KafkaSourceReaderMetrics.java:228) [flink-connector-kafka_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader.fetch(KafkaPartitionSplitReader.java:187) [flink-connector-kafka_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58) [flink-table_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142) [flink-table_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:105) [flink-table_2.12-1.14.0.jar:1.14.0]
> 2021-10-05T15:22:31.928316  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
> 2021-10-05T15:22:31.928316  	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
> 2021-10-05T15:22:31.928316  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
> 2021-10-05T15:22:31.928316  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
> 2021-10-05T15:22:31.928316  	at java.lang.Thread.run(Thread.java:829) [?:?]
> {code}
> Regards,
> Juha



--
This message was sent by Atlassian Jira
(v8.3.4#803005)