Posted to issues@spark.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/08/24 07:47:00 UTC

[jira] [Comment Edited] (SPARK-25106) A new Kafka consumer gets created for every batch

    [ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591278#comment-16591278 ] 

Jungtaek Lim edited comment on SPARK-25106 at 8/24/18 7:46 AM:
---------------------------------------------------------------

I played with the project and it looks like it is affected by [SPARK-24987|https://github.com/apache/spark/commit/b7fdf8eb2011ae76f0161caa9da91e29f52f05e4] (the fix will be available in 2.3.2).

I ran Consumer with the file leak detector ([http://file-leak-detector.kohsuke.org/]) attached, against both 2.3.1 and 2.4.0-SNAPSHOT (which includes SPARK-24987): 2.4.0-SNAPSHOT does not show this issue, while I can reproduce it on 2.3.1.
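
As a rough cross-check (a sketch only; the run above used the file leak detector agent, not this listener), a StreamingQueryListener like the one below can print the driver's open file-descriptor count after each micro-batch. It assumes a Linux driver, since it reads /proc/self/fd.

{code:scala}
// Sketch: print the driver's open file-descriptor count after each micro-batch.
// Assumes a Linux JVM (it reads /proc/self/fd); complements, but does not replace,
// running with the file leak detector agent attached.
import java.io.File

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

class FdCountListener extends StreamingQueryListener {
  private def openFds: Int =
    Option(new File("/proc/self/fd").list()).map(_.length).getOrElse(-1)

  override def onQueryStarted(event: QueryStartedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit =
    println(s"batch ${event.progress.batchId}: open fds = $openFds")

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
}

// Register it before starting the query:
// spark.streams.addListener(new FdCountListener())
{code}

If the leak is present, the printed count should climb steadily across batches.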

 

[~aseigneurin]

Would you mind testing again with either Spark 2.3.2 RC5 (announced on the Spark dev mailing list) or a build of the latest branch-2.3 source, to see whether the issue is resolved?
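
When re-testing, a quick sanity check (a minimal sketch, not part of the reproducer) confirms which Spark build the application actually picked up:

{code:scala}
// Sketch: confirm which Spark build is actually on the classpath before re-running
// the reproducer. The SPARK-24987 change is expected in 2.3.2 and in current
// branch-2.3 builds.
import org.apache.spark.sql.SparkSession

object VersionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    println(s"Running Spark ${spark.version}")
    spark.stop()
  }
}
{code}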



> A new Kafka consumer gets created for every batch
> -------------------------------------------------
>
>                 Key: SPARK-25106
>                 URL: https://issues.apache.org/jira/browse/SPARK-25106
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Alexis Seigneurin
>            Priority: Major
>         Attachments: console.txt
>
>
> I have a fairly simple piece of code that reads from Kafka, applies some transformations - including applying a UDF - and writes the result to the console. Every time a batch is created, a new consumer is created (and not closed), eventually leading to a "too many open files" error.
> I created a test case, with the code available here: [https://github.com/aseigneurin/spark-kafka-issue]
> To reproduce:
>  # Start Kafka and create a topic called "persons"
>  # Run "Producer" to generate data
>  # Run "Consumer"
> I am attaching the log where you can see a new consumer being initialized between every batch.
> Please note this issue does *not* appear with Spark 2.2.2, nor does it appear when I don't apply the UDF.
> I suspect - although I did not go far enough to confirm - that this issue is related to the improvement made in SPARK-23623.
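
For reference, here is a minimal sketch of the pattern described in the quoted report (not the reporter's actual code, which is in the linked repository); the broker address, topic handling, and the uppercase UDF are placeholders:

{code:scala}
// Sketch of the reported pattern: read from Kafka, apply a UDF, write to the console.
// Requires the spark-sql-kafka-0-10 artifact on the classpath. The broker address,
// topic handling, and the uppercase UDF are placeholders, not the reporter's code.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object Consumer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-kafka-issue")
      .master("local[*]")
      .getOrCreate()

    // Placeholder UDF standing in for the transformation mentioned in the report.
    val upper = udf((s: String) => if (s == null) null else s.toUpperCase)

    val persons = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed local broker
      .option("subscribe", "persons")
      .load()
      .select(upper(col("value").cast("string")).as("value"))

    persons.writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
{code}

Per the report, the UDF matters here: without it the per-batch consumer creation does not occur.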



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org