Posted to issues@spark.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/10/06 00:18:00 UTC

[jira] [Updated] (SPARK-43991) Use the value of spark.eventLog.compression.codec set by user when write compact file

     [ https://issues.apache.org/jira/browse/SPARK-43991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-43991:
-----------------------------------
    Labels: pull-request-available  (was: )

> Use the value of spark.eventLog.compression.codec set by user when write compact file
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-43991
>                 URL: https://issues.apache.org/jira/browse/SPARK-43991
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Web UI
>    Affects Versions: 3.4.0
>            Reporter: shuyouZZ
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when rolling event logs are enabled in the SHS, only {{originalFilePath}} is used to determine the path of the compact file.
> {code:java}
> override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED
> {code}
> If the user sets {{spark.eventLog.compression.codec}} in the SparkConf to a value different from the default, then when log compaction is triggered, the old event log file is compacted using the default codec from the SparkConf rather than the user-specified one.
> {code:java}
> protected val compressionCodec =
>     if (shouldCompress) {
>       Some(CompressionCodec.createCodec(sparkConf, sparkConf.get(EVENT_LOG_COMPRESSION_CODEC)))
>     } else {
>       None
>     }
> private[history] val compressionCodecName = compressionCodec.map { c =>
>     CompressionCodec.getShortName(c.getClass.getName)
>   }
> {code}
> However, the compression codec that {{EventLogFileReader}} uses to read a log is parsed from the log path's extension, so {{EventLogFileReader}} cannot read the compacted log file correctly.
> {code:java}
> def codecName(log: Path): Option[String] = {
>     // Compression codec is encoded as an extension, e.g. app_123.lzf
>     // Since we sanitize the app ID to not include periods, it is safe to split on it
>     val logName = log.getName.stripSuffix(COMPACTED).stripSuffix(IN_PROGRESS)
>     logName.split("\\.").tail.lastOption
>   }
> {code}
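> The mismatch can be reproduced in isolation. The sketch below is illustrative, not Spark source; the names mirror the snippets above. It shows that the reader derives the codec purely from the file name, so a compact file that keeps the original {{.lzf}} extension but was rewritten with the SHS default codec will be decompressed with the wrong codec:
> {code:java}
> object CodecNameDemo {
>   val COMPACTED = ".compact"
>   val IN_PROGRESS = ".inprogress"
>
>   // Mirrors EventLogFileReader.codecName: the codec is the last
>   // extension left after stripping the .compact/.inprogress suffixes.
>   def codecName(logName: String): Option[String] = {
>     val stripped = logName.stripSuffix(COMPACTED).stripSuffix(IN_PROGRESS)
>     stripped.split("\\.").tail.lastOption
>   }
>
>   def main(args: Array[String]): Unit = {
>     // The reader picks lzf from the name, even though the compact
>     // file's bytes may have been written with the conf-default codec.
>     println(codecName("app_123.lzf.compact")) // Some(lzf)
>     println(codecName("app_123"))             // None (uncompressed)
>   }
> }
> {code}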
> Therefore we should override the {{shouldCompress}} and {{compressionCodec}} variables in class {{CompactedEventLogFileWriter}} and use the compression codec set by the user.
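> As a hedged sketch of that direction (illustrative only, not the actual patch; object and method names are invented), the compact writer could reuse the codec encoded in the original file's name instead of the SHS conf default, so the extension the reader parses always matches the bytes:
> {code:java}
> object CompactCodecDemo {
>   val COMPACTED = ".compact"
>
>   // Codec the original file was written with, parsed from its name.
>   def codecFromName(logName: String): Option[String] =
>     logName.stripSuffix(COMPACTED).split("\\.").tail.lastOption
>
>   // Current behavior: compaction compresses with the conf default,
>   // regardless of how the original file was compressed.
>   def compactCodecCurrent(confDefault: Option[String], original: String): Option[String] =
>     confDefault
>
>   // Proposed behavior: honor the codec the user chose for the
>   // original file, so reader and writer agree.
>   def compactCodecProposed(confDefault: Option[String], original: String): Option[String] =
>     codecFromName(original)
> }
> {code}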



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org