Posted to issues@spark.apache.org by "GuangkunJiang (Jira)" <ji...@apache.org> on 2022/06/22 10:17:00 UTC

[jira] [Created] (SPARK-39554) insertIntoHive ExternalTmpPath won't be cleared when the app is killed

GuangkunJiang created SPARK-39554:
-------------------------------------

             Summary: insertIntoHive ExternalTmpPath won't be cleared when the app is killed
                 Key: SPARK-39554
                 URL: https://issues.apache.org/jira/browse/SPARK-39554
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1, 2.4.1
          Environment: Ubuntu 16.04, Hadoop 3.1.1, Hive 3.1.2
            Reporter: GuangkunJiang


When certain types of Spark SQL commands (e.g. InsertIntoHiveDirCommand and InsertIntoTableDirCommand) exit abnormally, such as being killed by YARN, the .hive-staging directory being written remains behind and is never deleted.

Checking the source code, the specific location is:
org.apache.spark.sql.hive.execution.InsertIntoHiveTable#run

```scala
    val tmpLocation = getExternalTmpPath(sparkSession, hadoopConf, tableLocation)

    try {
      processInsert(sparkSession, externalCatalog, hadoopConf, tableDesc, tmpLocation, child)
    } finally {
      // Attempt to delete the staging directory and the inclusive files. If failed, the files are
      // expected to be dropped at the normal termination of VM since deleteOnExit is used.
      deleteExternalTmpPath(hadoopConf)
    }
```
From the Spark driver log, I can see that Spark only runs its shutdown hooks when the application is killed.
I have two questions:
1. The deleteExternalTmpPath call in the finally block has no effect when the process is killed, because the finally block never gets a chance to run.
2. According to its documentation, fs.deleteOnExit(dir) should clean the data up when the JVM shuts down, yet the staging directory still remains.
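
For context, a minimal sketch of the deleteOnExit behavior in question (the staging path below is a hypothetical stand-in): Hadoop only records the path, and the actual delete runs inside FileSystem.close(), which is normally invoked from Hadoop's own JVM shutdown hook, so a driver that dies before that hook fires leaves the directory behind.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DeleteOnExitDemo {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    // Hypothetical stand-in for the real .hive-staging directory.
    val staging = new Path("/tmp/.hive-staging_demo")
    fs.mkdirs(staging)
    // deleteOnExit only *registers* the path; the delete itself happens in
    // FileSystem.close(), which Hadoop calls from its JVM shutdown hook.
    // A process killed before that hook runs leaves the directory behind.
    fs.deleteOnExit(staging)
  }
}
```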

I temporarily fixed it like this:
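
A minimal sketch of one possible fix along these lines (the original patch is not shown here, and this is not necessarily the reporter's approach), assuming the change lives inside InsertIntoHiveTable#run where tmpLocation and hadoopConf from the snippet above are in scope:

```scala
import org.apache.spark.util.ShutdownHookManager

// Sketch only: register a Spark shutdown hook right after getExternalTmpPath,
// so the .hive-staging path is deleted even when the driver is killed and the
// finally block never runs. `tmpLocation` and `hadoopConf` come from the
// InsertIntoHiveTable#run snippet above.
val hookRef = ShutdownHookManager.addShutdownHook { () =>
  val fs = tmpLocation.getFileSystem(hadoopConf)
  if (fs.exists(tmpLocation)) {
    fs.delete(tmpLocation, true)
  }
}
// On normal completion, deleteExternalTmpPath still runs in the finally
// block, and the hook can then be deregistered:
// ShutdownHookManager.removeShutdownHook(hookRef)
```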



