Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2020/09/19 08:40:00 UTC

[jira] [Updated] (SPARK-30294) Read-only state store unnecessarily creates and deletes the temp file for delta file every batch

     [ https://issues.apache.org/jira/browse/SPARK-30294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-30294:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> Read-only state store unnecessarily creates and deletes the temp file for delta file every batch
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30294
>                 URL: https://issues.apache.org/jira/browse/SPARK-30294
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Priority: Minor
>
> [https://github.com/apache/spark/blob/d38f8167483d4d79e8360f24a8c0bffd51460659/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L143-L155]
> {code:java}
>     /** Abort all the updates made on this store. This store will not be usable any more. */
>     override def abort(): Unit = {
>       // This if statement is to ensure that files are deleted only if there are changes to the
>       // StateStore. We have two StateStores for each task, one which is used only for reading, and
>       // the other used for read+write. We don't want the read-only to delete state files.
>       if (state == UPDATING) {
>         state = ABORTED
>         cancelDeltaFile(compressedStream, deltaFileStream)
>       } else {
>         state = ABORTED
>       }
>       logInfo(s"Aborted version $newVersion for $this")
>     }
> {code}
> Despite the comment, the read-only state store also goes through the same write preparation: it creates the temporary file, initializes output streams for the file, closes these output streams, and deletes the temporary file. This is unnecessary and confusing, because according to the log messages two different instances appear to write to the same delta file.
>  
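One possible way to avoid the unnecessary work is to create the temporary delta file lazily, on the first write, so that a store which is only read never touches the file system. The sketch below is a simplified, self-contained illustration of that idea, not Spark's actual code or the actual fix: `LazyDeltaStore`, `put`, `tempFileCreated`, and the temp file naming are hypothetical, and plain `java.nio` files stand in for the checkpoint file manager.

```scala
import java.io.DataOutputStream
import java.nio.file.{Files, Path}

// Hypothetical, simplified stand-in for a state store instance.
// It is NOT HDFSBackedStateStoreProvider; it only illustrates lazy creation.
final class LazyDeltaStore(dir: Path, version: Long) {
  private var tempFile: Option[Path] = None
  private var out: Option[DataOutputStream] = None
  private var created = false
  private var aborted = false

  // Creates the temp delta file and its output stream on first use only.
  private def deltaStream(): DataOutputStream = out.getOrElse {
    val file = Files.createTempFile(dir, s"$version.delta.", ".tmp")
    val stream = new DataOutputStream(Files.newOutputStream(file))
    tempFile = Some(file)
    out = Some(stream)
    created = true
    stream
  }

  // The first write triggers creation of the temp file.
  def put(key: String, value: String): Unit = {
    require(!aborted, "store has already been aborted")
    val stream = deltaStream()
    stream.writeUTF(key)
    stream.writeUTF(value)
  }

  // True if this instance ever created a temp delta file.
  def tempFileCreated: Boolean = created

  // Closes and deletes the temp file only if a write actually created it.
  def abort(): Unit = {
    out.foreach(_.close())
    tempFile.foreach(f => Files.deleteIfExists(f))
    out = None
    tempFile = None
    aborted = true
  }
}
```

With this shape, aborting an instance that was only read from is a pure state change: no temp file is created, no stream is opened or closed, and nothing is deleted, so the logs would no longer suggest that two instances write to the same delta file.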


