Posted to issues@spark.apache.org by "Mihir Kelkar (Jira)" <ji...@apache.org> on 2022/10/26 21:59:00 UTC

[jira] [Updated] (SPARK-40927) Memory issue with Structured streaming

     [ https://issues.apache.org/jira/browse/SPARK-40927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihir Kelkar updated SPARK-40927:
---------------------------------
    Description: 
In PySpark Structured Streaming with Kafka as both source and sink, the driver as well as the executors appear to get OOM-killed after a long period of time (a few days). I have not been able to pinpoint a specific cause.

But even 8-12 hr runs show a slow memory creep in the Prometheus metrics (a sketch of a typical metrics configuration follows the list below):
 # JVM off-heap memory of both driver and executors keeps increasing over time (12-24 hr observation window) [I have NOT enabled off-heap usage]
 # JVM heap memory of the executors also keeps stepping up slowly.
 # JVM RSS of the executors and driver keeps increasing, while Python RSS does not increase.
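
For reference, the values above likely correspond to Spark's standard executor memory metrics (JVMHeapMemory, JVMOffHeapMemory, ProcessTreeJVMRSSMemory, ProcessTreePythonRSSMemory). Below is a minimal sketch of how a job like this might expose them to Prometheus; it is an assumed, typical configuration, not the actual settings of this job:

{code:python}
# Sketch only: a typical way to expose the memory metrics mentioned above to Prometheus.
# All values are placeholders; the real job's configuration is not part of this ticket.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("streaming-memory-debug")  # placeholder app name
    # Expose executor metrics on the Spark UI in Prometheus format (Spark 3.0+).
    .config("spark.ui.prometheus.enabled", "true")
    # Enables the ProcessTree* metrics (JVM RSS / Python RSS per process).
    .config("spark.executor.processTreeMetrics.enabled", "true")
    # Register the built-in PrometheusServlet sink for all metric instances.
    .config("spark.metrics.conf.*.sink.prometheusServlet.class",
            "org.apache.spark.metrics.sink.PrometheusServlet")
    .config("spark.metrics.conf.*.sink.prometheusServlet.path",
            "/metrics/prometheus")
    .getOrCreate()
)
{code}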

To debug, the job has been reduced to a basic row count inside foreachBatch() (the original business logic performs dropDuplicates, aggregations and windowing inside foreachBatch).

Watermarking is applied on a custom timestamp column.
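
Here is a minimal PySpark sketch of the pipeline as described above (Kafka source, watermark on a custom timestamp column, debug row count inside foreachBatch). The broker address, topic, payload schema, column names and watermark delay are illustrative placeholders, not values from the actual job:

{code:python}
# Minimal sketch of the reported setup; broker, topic, schema, column names and
# watermark delay are placeholders -- the real job's values are not in this ticket.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-memory-debug").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "input-topic")                # placeholder
    .load()
    .select(F.col("value").cast("string").alias("json"))
    # Parse a custom event-time column out of the payload (placeholder schema).
    .select(F.from_json("json", "event_time TIMESTAMP, key STRING, amount DOUBLE").alias("e"))
    .select("e.*")
    # Watermark on the custom timestamp column (placeholder delay).
    .withWatermark("event_time", "10 minutes")
)

def process_batch(batch_df, batch_id):
    # Debug version: just count rows. The original business logic runs
    # dropDuplicates, aggregations and windowing here and writes back to Kafka.
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (
    events.writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/tmp/checkpoints/memory-debug")  # placeholder
    .start()
)
query.awaitTermination()
{code}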

 

Heap dump analysis shows a large number of duplicate strings (which look like generated code), as well as a large number of byte[], char[] and UTF8String objects. Does this point to a potential memory leak in Tungsten/codegen-related code?

  was:
In PySpark Structured Streaming with Kafka as both source and sink, the driver as well as the executors appear to get OOM-killed after a long period of time (8-12 hrs). I have not been able to pinpoint a specific cause. Prometheus metrics show that:
 # JVM off-heap memory of both driver and executors keeps increasing over time (12-24 hr observation window) [I have NOT enabled off-heap usage]
 # JVM heap memory of the executors also keeps stepping up slowly.
 # JVM RSS of the executors and driver keeps increasing, while Python RSS does not increase.

To debug, the job has been reduced to a basic row count inside foreachBatch() (the original business logic performs dropDuplicates, aggregations and windowing inside foreachBatch).

Watermarking is applied on a custom timestamp column.

 

Heap dump analysis shows a large number of duplicate strings (which look like generated code), as well as a large number of byte[], char[] and UTF8String objects. Does this point to a potential memory leak in Tungsten/codegen-related code?


> Memory issue with Structured streaming
> --------------------------------------
>
>                 Key: SPARK-40927
>                 URL: https://issues.apache.org/jira/browse/SPARK-40927
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.3.0, 3.2.2
>            Reporter: Mihir Kelkar
>            Priority: Major
>
> In PySpark Structured Streaming with Kafka as both source and sink, the driver as well as the executors appear to get OOM-killed after a long period of time (a few days). I have not been able to pinpoint a specific cause.
> But even 8-12 hr runs show a slow memory creep in the Prometheus metrics:
>  # JVM off-heap memory of both driver and executors keeps increasing over time (12-24 hr observation window) [I have NOT enabled off-heap usage]
>  # JVM heap memory of the executors also keeps stepping up slowly.
>  # JVM RSS of the executors and driver keeps increasing, while Python RSS does not increase.
> To debug, the job has been reduced to a basic row count inside foreachBatch() (the original business logic performs dropDuplicates, aggregations and windowing inside foreachBatch).
> Watermarking is applied on a custom timestamp column.
>  
> Heap dump analysis shows a large number of duplicate strings (which look like generated code), as well as a large number of byte[], char[] and UTF8String objects. Does this point to a potential memory leak in Tungsten/codegen-related code?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org