Posted to issues@spark.apache.org by "Cheng Pan (Jira)" <ji...@apache.org> on 2022/04/01 12:55:00 UTC
[jira] [Commented] (SPARK-38703) High GC and memory footprint after switch to ZSTD
[ https://issues.apache.org/jira/browse/SPARK-38703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515907#comment-17515907 ]
Cheng Pan commented on SPARK-38703:
-----------------------------------
SPARK-34390 may help; our benchmark on a 1 TB TPC-DS workload shows the benefit. (This is zstd compression for shuffle, not for Parquet.)
{code:bash}
+------------------+----------------------+----------------------+----------------+
| codec            | sum(task_cpu_time_s) | sum(task_run_time_s) | sum(gc_time_s) |
+------------------+----------------------+----------------------+----------------+
| lz4              | 1871242.5            | 3861923.8            | 197151.5       |
| zstd             | 1989641.6            | 3326399.8            | 244333.2       |
| zstd_buffer_pool | 1912032.0            | 3342339.4            | 187262.3       |
+------------------+----------------------+----------------------+----------------+
{code}
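For reference, a minimal sketch of how the "zstd_buffer_pool" row above could be reproduced on the shuffle side. The config names assume Spark 3.2+ (where SPARK-34390 landed) and should be verified against the docs for your Spark version; the class name and jar are placeholders. Note these settings affect shuffle/IO compression only, not Parquet file compression on the read path.
{code:bash}
# Sketch only: switch shuffle/IO compression to zstd and enable the
# ZSTD buffer pool discussed in SPARK-34390.
# Verify both config names against your Spark version.
# com.example.MyJob / my-job.jar are hypothetical placeholders.
spark-submit \
  --conf spark.io.compression.codec=zstd \
  --conf spark.io.compression.zstd.bufferPool.enabled=true \
  --class com.example.MyJob \
  my-job.jar
{code}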
> High GC and memory footprint after switch to ZSTD
> -------------------------------------------------
>
> Key: SPARK-38703
> URL: https://issues.apache.org/jira/browse/SPARK-38703
> Project: Spark
> Issue Type: Question
> Components: Input/Output
> Affects Versions: 3.1.2
> Reporter: Michael Taranov
> Priority: Major
>
> Hi All,
> We started to switch our Spark pipelines to read Parquet with ZSTD compression.
> After the switch we see that the memory footprint is much larger than it was with SNAPPY.
> Additionally, the GC stats of the jobs are much higher compared to SNAPPY on the same workload.
> Are there any configurations relevant to the read path that may help in such cases?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)