Posted to issues@spark.apache.org by "GuangkunJiang (Jira)" <ji...@apache.org> on 2022/06/22 10:21:00 UTC
[jira] [Comment Edited] (SPARK-39554) insertIntoHive ExternalTmpPath won't be clear when the app being killed
[ https://issues.apache.org/jira/browse/SPARK-39554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557354#comment-17557354 ]
GuangkunJiang edited comment on SPARK-39554 at 6/22/22 10:20 AM:
-----------------------------------------------------------------
Here is my temporary fix code:
```scala
try {
  if (!FileUtils.mkdir(fs, dir, true, hadoopConf)) {
    throw new IllegalStateException("Cannot create staging directory '" + dir.toString + "'")
  }
  createdTempDir = Some(dir)
  // Ask the FileSystem to delete the staging directory when the JVM shuts
  // down, so it is also cleaned up when the normal code path is skipped.
  fs.deleteOnExit(dir)
} catch {
  case e: IOException =>
    throw new RuntimeException(
      "Cannot create staging directory '" + dir.toString + "': " + e.getMessage, e)
}
```
was (Author: JIRAUSER291404):
this is my
temporary fix code TT
> insertIntoHive ExternalTmpPath won't be clear when the app being killed
> -----------------------------------------------------------------------
>
> Key: SPARK-39554
> URL: https://issues.apache.org/jira/browse/SPARK-39554
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.1, 3.2.1
> Environment: ubuntu16.04
> hadoop3.1.1
> hive 3.1.2
> Reporter: GuangkunJiang
> Priority: Critical
>
> When certain types of Spark SQL commands (e.g. InsertIntoHiveDirCommand and InsertIntoTableDirCommand) exit abnormally, such as the application being killed by YARN, the .hive-staging directory that was being written remains and is never deleted.
> The relevant source location is:
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable#run
> ```scala
> val tmpLocation = getExternalTmpPath(sparkSession, hadoopConf, tableLocation)
> try {
>   processInsert(sparkSession, externalCatalog, hadoopConf, tableDesc, tmpLocation, child)
> } finally {
>   // Attempt to delete the staging directory and the inclusive files. If failed, the files are
>   // expected to be dropped at the normal termination of VM since deleteOnExit is used.
>   deleteExternalTmpPath(hadoopConf)
> }
> ```
> From the Spark driver log, I can see that Spark only runs its shutdown hooks when the application is killed.
> I have two questions:
> 1. The deleteExternalTmpPath call in the finally block has no effect when the process is killed.
> 2. According to its documentation, fs.deleteOnExit(dir) should clean up the data when the JVM is destroyed.
> My temporary fix looks like this:
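For reference, the `deleteOnExit` path the description mentions is itself implemented as a JVM shutdown hook inside Hadoop's `FileSystem`. A more explicit variant, shown only as a sketch (the `ExplicitStagingHook` object and `registerCleanup` helper are hypothetical names, and `hadoop-common` is assumed on the classpath; this is not what Spark actually does), is to register the staging path directly with Hadoop's `ShutdownHookManager`. Like any shutdown hook, it runs on orderly shutdown (SIGTERM) but still cannot run on SIGKILL:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.util.ShutdownHookManager

object ExplicitStagingHook {
  // Hypothetical helper: register a cleanup hook for a staging directory.
  def registerCleanup(dir: Path, conf: Configuration): Unit = {
    val hook = new Runnable {
      override def run(): Unit = {
        val fs = dir.getFileSystem(conf)
        if (fs.exists(dir)) fs.delete(dir, /* recursive = */ true)
      }
    }
    // Higher-priority hooks run earlier; pick a priority above
    // FileSystem.SHUTDOWN_HOOK_PRIORITY (10) so this runs before Hadoop
    // closes its cached FileSystem instances.
    ShutdownHookManager.get().addShutdownHook(hook, 50)
  }
}
```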
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org