Posted to issues@spark.apache.org by "GuangkunJiang (Jira)" <ji...@apache.org> on 2022/06/22 10:21:00 UTC

[jira] [Comment Edited] (SPARK-39554) insertIntoHive ExternalTmpPath won't be cleared when the app is killed

    [ https://issues.apache.org/jira/browse/SPARK-39554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557354#comment-17557354 ] 

GuangkunJiang edited comment on SPARK-39554 at 6/22/22 10:20 AM:
-----------------------------------------------------------------

This is my temporary fix code:

    // From SaveAsHiveFile#getStagingDir: fs, dir, hadoopConf and
    // createdTempDir are provided by the enclosing method.
    try {
      if (!FileUtils.mkdir(fs, dir, true, hadoopConf)) {
        throw new IllegalStateException("Cannot create staging directory '" + dir.toString + "'")
      }
      createdTempDir = Some(dir)
      // Only takes effect when the FileSystem is closed during a clean JVM
      // shutdown; it does nothing if the process is killed.
      fs.deleteOnExit(dir)
    } catch {
      case e: IOException =>
        throw new RuntimeException(
          "Cannot create staging directory '" + dir.toString + "': " + e.getMessage, e)
    }
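
For reference, a minimal sketch of cleanup that also runs when the driver receives SIGTERM (YARN sends SIGTERM before SIGKILL); the object and helper names are made up and this is not an actual Spark patch. Note that nothing can run on SIGKILL (kill -9):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.util.ShutdownHookManager

object StagingCleanup {
  // Hypothetical helper: delete the staging directory from an explicit
  // shutdown hook so cleanup also happens when the driver gets SIGTERM.
  def registerStagingCleanup(dir: Path, hadoopConf: Configuration): Unit = {
    val cleanup = new Runnable {
      override def run(): Unit = {
        val fs = dir.getFileSystem(hadoopConf)
        if (fs.exists(dir)) {
          fs.delete(dir, true) // recursively remove the .hive-staging dir
        }
      }
    }
    // Priority above FileSystem.SHUTDOWN_HOOK_PRIORITY (10) so this hook
    // fires before Hadoop closes its cached FileSystem instances.
    ShutdownHookManager.get().addShutdownHook(
      cleanup, FileSystem.SHUTDOWN_HOOK_PRIORITY + 1)
  }
}
```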


was (Author: JIRAUSER291404):
This is my temporary fix code:

> insertIntoHive ExternalTmpPath won't be cleared when the app is killed
> -----------------------------------------------------------------------
>
>                 Key: SPARK-39554
>                 URL: https://issues.apache.org/jira/browse/SPARK-39554
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.1, 3.2.1
>         Environment: ubuntu16.04
> hadoop3.1.1
> hive 3.1.2
>            Reporter: GuangkunJiang
>            Priority: Critical
>
> When some types of SparkSQL commands (e.g. InsertIntoHiveDirCommand and InsertIntoTableDirCommand) exit abnormally, such as when the application is killed by YARN, the .hive-staging directory that was being written remains and is never deleted.
> The relevant location in the source code is org.apache.spark.sql.hive.execution.InsertIntoHiveTable#run:
> ```scala
>     val tmpLocation = getExternalTmpPath(sparkSession, hadoopConf, tableLocation)
>     try {
>       processInsert(sparkSession, externalCatalog, hadoopConf, tableDesc, tmpLocation, child)
>     } finally {
>       // Attempt to delete the staging directory and the inclusive files. If failed, the files are
>       // expected to be dropped at the normal termination of VM since deleteOnExit is used.
>       deleteExternalTmpPath(hadoopConf)
>     }
> ```
> From the Spark driver log, I can see that Spark only runs its shutdown hooks when the application is killed.
> I have two observations:
> 1. The deleteExternalTmpPath call in the finally block has no effect when the process is killed.
> 2. According to its Javadoc, fs.deleteOnExit(dir) cleans the data up only when the JVM shuts down cleanly.
> I temporarily fixed it like this (see the code in the comment above).
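
To illustrate point 2 in the quoted description, a minimal standalone sketch (the object name and path are made up) of the FileSystem.deleteOnExit semantics: the marked path is only removed when the FileSystem is closed during a clean JVM shutdown, so a killed process leaves it behind:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DeleteOnExitDemo {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration()) // local filesystem by default
    val dir = new Path("/tmp/hive-staging-demo") // made-up path
    fs.mkdirs(dir)
    // Marks the path for deletion in FileSystem#close(), which Hadoop's own
    // shutdown hook invokes on a clean JVM exit.
    fs.deleteOnExit(dir)
    // Exit normally and the directory is removed; kill -9 this process while
    // it sleeps and the directory is left behind, like the leftover
    // .hive-staging directories described in this ticket.
    Thread.sleep(60000)
  }
}
```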



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org