Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2018/08/24 22:39:00 UTC
[jira] [Resolved] (SPARK-25106) A new Kafka consumer gets created for every batch
[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu resolved SPARK-25106.
----------------------------------
Resolution: Duplicate
Thanks for reporting this. I'm closing this as a duplicate of SPARK-24987.
> A new Kafka consumer gets created for every batch
> -------------------------------------------------
>
> Key: SPARK-25106
> URL: https://issues.apache.org/jira/browse/SPARK-25106
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.3.1
> Reporter: Alexis Seigneurin
> Priority: Major
> Attachments: console.txt
>
>
> I have a fairly simple piece of code that reads from Kafka, applies some transformations - including applying a UDF - and writes the result to the console. Every time a batch is created, a new consumer is created (and not closed), eventually leading to a "too many open files" error.
> I created a test case, with the code available here: [https://github.com/aseigneurin/spark-kafka-issue]
> To reproduce:
> # Start Kafka and create a topic called "persons"
> # Run "Producer" to generate data
> # Run "Consumer"
> I am attaching the log, where you can see a new consumer being initialized for every batch.
> Please note this issue does *not* appear with Spark 2.2.2, nor does it appear when I don't apply the UDF.
> I suspect - although I did not go far enough to confirm - that this issue is related to the improvement made in SPARK-23623.
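For reference, a minimal sketch of the kind of pipeline the report describes (read from the "persons" Kafka topic, apply a UDF, write each micro-batch to the console). The actual reproduction code is in the linked repository; the UDF body, column names, and broker address below are illustrative assumptions, not the reporter's exact code:

```scala
// Illustrative sketch only, assuming Spark 2.3.x with the
// spark-sql-kafka-0-10 connector and a local Kafka broker.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object Consumer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-kafka-issue")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical UDF: the report only says "a UDF" is applied,
    // and notes the leak does not occur without it.
    val upper = udf((s: String) => if (s == null) null else s.toUpperCase)

    val persons = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed address
      .option("subscribe", "persons") // topic name from the report
      .load()
      .selectExpr("CAST(value AS STRING) AS value")
      .withColumn("value", upper($"value"))

    // Per the report, each micro-batch of this query creates a new
    // KafkaConsumer that is never closed, eventually exhausting
    // file descriptors ("too many open files").
    persons.writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

Running this requires a Kafka broker with the "persons" topic and a producer feeding it data, per the reproduction steps above.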
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org