Posted to issues@spark.apache.org by "Gabor Somogyi (JIRA)" <ji...@apache.org> on 2018/12/19 14:04:00 UTC

[jira] [Commented] (SPARK-26396) Kafka consumer cache overflow since 2.4.x

    [ https://issues.apache.org/jira/browse/SPARK-26396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725021#comment-16725021 ] 

Gabor Somogyi commented on SPARK-26396:
---------------------------------------

[~Tint] it seems like you're trying to scale your application vertically, which requires a really strong machine.

Try to scale horizontally instead and add more executors. That way the load is split across JVMs, which reduces the number of cached consumers per executor.
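
For illustration, here are both knobs side by side. This is a minimal sketch assuming a plain Structured Streaming read, not your actual job: the broker address, topic name, and the numeric values are placeholders.

{code:scala}
// Minimal sketch with placeholder values; adjust for your cluster.
import org.apache.spark.sql.SparkSession

object KafkaCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-consumer-cache-sketch")
      // Horizontal scaling: on a standalone cluster the executor count is
      // driven by spark.cores.max / spark.executor.cores. More executors
      // means fewer (group, partition) pairs cached per JVM.
      .config("spark.cores.max", "16")       // placeholder value
      .config("spark.executor.cores", "2")   // placeholder value
      // The per-executor consumer cache behind the WARN messages
      // (defaults to 64 in Spark 2.4).
      .config("spark.sql.kafkaConsumerCache.capacity", "180")
      .getOrCreate()

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "kafka-topic")               // placeholder
      .load()

    stream.writeStream.format("console").start().awaitTermination()
  }
}
{code}

Note that hitting the cache capacity logs the WARN above and evicts a consumer rather than failing the query, so raising it mainly trades executor memory for fewer consumer re-creations.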

> Kafka consumer cache overflow since 2.4.x
> -----------------------------------------
>
>                 Key: SPARK-26396
>                 URL: https://issues.apache.org/jira/browse/SPARK-26396
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>         Environment: Spark 2.4 standalone client mode
>            Reporter: Kaspar Tint
>            Priority: Major
>
> We are experiencing an issue where the Kafka consumer cache seems to overflow constantly upon starting the application. This issue appeared after upgrading to Spark 2.4.
> We would get constant warnings like this:
> {code:java}
> 18/12/18 07:03:29 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-76)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-6f66e0d2-beaf-4ff2-ade8-8996611de6ae--1081651087-executor,kafka-topic-30)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-57)
> 18/12/18 07:03:32 WARN KafkaDataConsumer: KafkaConsumer cache hitting max capacity of 180, removing consumer for CacheKey(spark-kafka-source-f41d1f9e-1700-4994-9d26-2b9c0ee57881--215746753-executor,kafka-topic-43)
> {code}
> This application runs 4 different Spark Structured Streaming queries against the same Kafka topic, which has 90 partitions. We used to run it with just the default settings, so the cache size defaulted to 64 on Spark 2.3; now we have tried setting it to 180 or 360. With 360 there is a lot less noise about the overflow, but the resource requirements increase substantially.
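
A worst-case back-of-the-envelope for those numbers: each query appears to get its own consumer group (note the two distinct group ids in the WARN lines above), so an executor that is scheduled tasks for every partition of every query may need up to 4 x 90 = 360 cached consumers, which is presumably why a capacity of 360 quiets the warnings while 180 still evicts.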


