Posted to issues@spark.apache.org by "Yang Jie (Jira)" <ji...@apache.org> on 2023/02/19 08:55:00 UTC

[jira] [Comment Edited] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

    [ https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690853#comment-17690853 ] 

Yang Jie edited comment on SPARK-41952 at 2/19/23 8:54 AM:
-----------------------------------------------------------

For older Spark versions, could upgrading Parquet introduce other costs?  Should we instead introduce parquet.hadoop.CodecFactory directly into the old Spark versions and apply the fix there?

After that, we could also revert those changes in the Spark versions (for example, master and Spark 3.4) where the problem can be solved by upgrading Parquet.


was (Author: luciferyang):
For the old Spark versions, is it possible to introduce other costs by upgrading parquet?  Should we directly introduce parquet.hadoop.CodecFactory to old Spark version and fix them accordingly? 

After that, we can also revert the changes to the Spark version(for example, master and Spark 3.4) that can be solved by upgrading parquet

> Upgrade Parquet to fix off-heap memory leaks in Zstd codec
> ----------------------------------------------------------
>
>                 Key: SPARK-41952
>                 URL: https://issues.apache.org/jira/browse/SPARK-41952
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.1.3, 3.3.1, 3.2.3
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>
> Recently, a native memory leak was discovered in Parquet in connection with its use of the Zstd decompressor from the luben/zstd-jni library (PARQUET-2160).
> This is very problematic, to the point where we can't use Parquet with Zstd because of pervasive OOMs taking down our executors and disrupting our jobs.
> Luckily, a fix addressing this has already landed in Parquet:
> [https://github.com/apache/parquet-mr/pull/982]
>  
> Now we just need to make sure that:
>  # An updated version of Parquet is released in a timely manner
>  # Spark is upgraded to this new version in the upcoming release
>  
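
The description above identifies the problem only as a native memory leak involving the Zstd decompressor. As a rough illustration of how a leak of that kind is commonly avoided, the sketch below drains a decompression stream onto the heap and closes it immediately, so native buffers are released without waiting for garbage collection. It rests on assumptions: the JDK's GZIP streams stand in for zstd-jni, the class and method names are hypothetical, and this is not the code from the parquet-mr pull request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; GZIP is a stand-in for the Zstd codec.
public class EagerDecompress {

    // Compress bytes with gzip; closing the stream writes the gzip trailer.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(raw);
        }
        return sink.toByteArray();
    }

    // Read all decompressed bytes onto the heap, then close the stream
    // right away (try-with-resources) so the native memory held by the
    // decompressor is freed deterministically instead of lingering
    // until the stream object is garbage collected.
    static byte[] decompressEagerly(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "parquet page bytes".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decompressEagerly(compress(raw));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

The trade-off in this pattern is that the whole decompressed payload is held on the heap at once, in exchange for a short and predictable lifetime for the native resources.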



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
