
Posted to issues@spark.apache.org by "Sean R. Owen (Jira)" <ji...@apache.org> on 2022/04/16 20:20:00 UTC

[jira] [Resolved] (SPARK-38703) High GC and memory footprint after switch to ZSTD

     [ https://issues.apache.org/jira/browse/SPARK-38703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-38703.
----------------------------------
    Resolution: Invalid

I'd put questions on the mailing list. Despite the "Question" type here, this is more for tracking concrete bugs and improvements.

> High GC and memory footprint after switch to ZSTD
> -------------------------------------------------
>
>                 Key: SPARK-38703
>                 URL: https://issues.apache.org/jira/browse/SPARK-38703
>             Project: Spark
>          Issue Type: Question
>          Components: Input/Output
>    Affects Versions: 3.1.2
>            Reporter: Michael Taranov
>            Priority: Major
>
> Hi All,
> We have started switching our Spark pipelines to read Parquet files compressed with ZSTD.
> After the switch, the memory footprint is much larger than it was with SNAPPY.
> Additionally, the jobs' GC stats are much higher than with SNAPPY for the same workload.
> Are there any read-path configurations that might help in such cases?



