Posted to issues@spark.apache.org by "Neven Jovic (Jira)" <ji...@apache.org> on 2022/03/15 21:41:00 UTC

[jira] [Comment Edited] (SPARK-38329) High I/O wait when Spark Structured Streaming checkpoint changed to EFS

    [ https://issues.apache.org/jira/browse/SPARK-38329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507241#comment-17507241 ] 

Neven Jovic edited comment on SPARK-38329 at 3/15/22, 9:40 PM:
---------------------------------------------------------------

[~hyukjin.kwon] I updated Spark to 3.2.1, and the I/O wait is still there. I used a Structured Streaming monitoring tool and found that my aggregated state in memory was growing continuously. I added a watermark, and that probably solved the issue with the State Store Provider (I haven't seen that WARN message again so far).
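
For anyone wanting to check for the same kind of state growth, here is a minimal sketch that works on the progress reports PySpark exposes as plain dicts (for example `query.recentProgress`). The helper names `state_rows_total` and `is_state_growing` are hypothetical, and the sketch assumes each report carries a `stateOperators` list with a `numRowsTotal` field, as the JSON progress output does. It is not the monitoring tool mentioned above.

```python
def state_rows_total(progress):
    """Sum numRowsTotal across all stateful operators in one progress report.

    `progress` is assumed to be a dict shaped like a Structured Streaming
    progress report; a missing report or missing fields count as zero rows.
    """
    if not progress:
        return 0
    return sum(op.get("numRowsTotal", 0) for op in progress.get("stateOperators", []))


def is_state_growing(progress_reports):
    """Return True if the total state row count rises in every consecutive report."""
    totals = [state_rows_total(p) for p in progress_reports]
    return len(totals) >= 2 and all(b > a for a, b in zip(totals, totals[1:]))
```

With a running query this could be called as `is_state_growing(query.recentProgress)`. A result that stays True over a long period suggests that old state is never evicted, which is what a watermark is meant to bound.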

As for the high I/O wait, I assume it comes from writing to EFS. Here is a screenshot of CPU utilization with the updated Spark and the same load.
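
To put a number on the I/O wait without a screenshot, the percentage can be computed from two samples of the aggregate `cpu` line in `/proc/stat` on a Linux worker. This is a rough sketch, not how the graph above was produced. The function names are hypothetical, and it assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice).

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    deltas = [a - b for a, b in zip(parse_cpu_line(after), parse_cpu_line(before))]
    # Only the first eight counters are summed: guest and guest_nice are
    # already included in user and nice, so adding them would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Taking one sample with `read_cpu_line()`, waiting a few seconds, taking a second one, and passing both to `iowait_percent` gives an average over that interval. Running this on a worker with the checkpoint on local disk and then on EFS would show whether the difference really comes from the checkpoint writes.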

 



> High I/O wait when Spark Structured Streaming checkpoint changed to EFS
> -----------------------------------------------------------------------
>
>                 Key: SPARK-38329
>                 URL: https://issues.apache.org/jira/browse/SPARK-38329
>             Project: Spark
>          Issue Type: Question
>          Components: EC2, Input/Output, PySpark, Structured Streaming
>    Affects Versions: 2.4.6
>            Reporter: Neven Jovic
>            Priority: Major
>         Attachments: 100k_zbx_21.png
>
>
> I'm currently running a Spark Structured Streaming application written in Python (PySpark), where my source is a Kafka topic and my sink is MongoDB. I changed my checkpoint location to Amazon EFS, which is shared across all Spark workers, and after that I/O wait increased, averaging 8%.
>  
> !Screenshot from 2022-02-25 14-16-11.png!
> Currently I have 6000 messages arriving in Kafka every second, and every once in a while I get a WARN message:
> {quote}22/02/25 13:12:31 WARN HDFSBackedStateStoreProvider: Error cleaning up files for HDFSStateStoreProvider[id = (op=0,part=90),dir = file:/mnt/efs_max_io/spark/state/0/90] java.lang.NumberFormatException: For input string: ""
> {quote}
> I'm not sure whether that message has anything to do with the high I/O wait. Is this behavior expected, or is it something to be concerned about?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
