Posted to issues@spark.apache.org by "Sean Roberts (JIRA)" <ji...@apache.org> on 2018/01/17 09:50:00 UTC
[jira] [Created] (SPARK-23130) Spark Thrift does not clean-up
temporary files (/tmp/*_resources and /tmp/hive/*.pipeout)
Sean Roberts created SPARK-23130:
------------------------------------
Summary: Spark Thrift does not clean-up temporary files (/tmp/*_resources and /tmp/hive/*.pipeout)
Key: SPARK-23130
URL: https://issues.apache.org/jira/browse/SPARK-23130
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.0, 2.1.0, 1.6.3
Environment: * OS: Seen on SLES12, RHEL 7.3 & RHEL 7.4
* Spark versions: 1.6.3, 2.1.0, 2.2.0
* Hadoop distributions: HDP 2.5 - 2.6.3.0
Reporter: Sean Roberts
Spark Thrift does not clean up files and directories under /tmp with names like:
/tmp/hive/*.pipeout
/tmp/*_resources
These accumulate in such numbers that /tmp quickly runs out of inodes, *causing the partition to become unusable and many services to crash*. This happens even when the only jobs submitted are routine service checks.
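To see how close a partition is to inode exhaustion, the following minimal sketch reads the filesystem's inode counters. It is not part of the original report; the function name `inode_usage` is hypothetical, and it assumes a POSIX system where `os.statvfs` is available.

```python
import os


def inode_usage(path="/tmp"):
    """Return (total, free, used_fraction) inode figures for the
    filesystem that holds `path`. Hypothetical helper for illustration."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # inodes still unallocated
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    used_fraction = (total - free) / total if total else 0.0
    return total, free, used_fraction


if __name__ == "__main__":
    total, free, used = inode_usage("/tmp")
    print(f"/tmp inodes: total={total} free={free} used={used:.1%}")
```

A used fraction near 100% while disk space is still free would match the symptom described above.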
I used `strace` to confirm that Spark Thrift creates them:
{code:java}
strace.out.118864:04:53:49 open("/tmp/hive/55ad7fc1-f79a-4ad8-8e02-26bbeaa86bbc7288010135864174970.pipeout", O_RDWR|O_CREAT|O_EXCL, 0666) = 134
strace.out.118864:04:53:49 mkdir("/tmp/b6dfbf9e-2f7c-4c25-95a1-73c44318ecf4_resources", 0777) = 0
{code}
*Those files were left behind, even days later.*
----
Example files:
{code:java}
# stat /tmp/hive/55ad7fc1-f79a-4ad8-8e02-26bbeaa86bbc7288010135864174970.pipeout
File: ‘/tmp/hive/55ad7fc1-f79a-4ad8-8e02-26bbeaa86bbc7288010135864174970.pipeout’
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: fe09h/65033d Inode: 678 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ hive) Gid: ( 1002/ hadoop)
Access: 2017-12-19 04:53:49.126777260 -0600
Modify: 2017-12-19 04:53:49.126777260 -0600
Change: 2017-12-19 04:53:49.126777260 -0600
Birth: -
# stat /tmp/b6dfbf9e-2f7c-4c25-95a1-73c44318ecf4_resources
File: ‘/tmp/b6dfbf9e-2f7c-4c25-95a1-73c44318ecf4_resources’
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fe09h/65033d Inode: 668 Links: 2
Access: (0700/drwx------) Uid: ( 1000/ hive) Gid: ( 1002/ hadoop)
Access: 2017-12-19 04:57:38.458937635 -0600
Modify: 2017-12-19 04:53:49.062777216 -0600
Change: 2017-12-19 04:53:49.066777218 -0600
Birth: -
{code}
Counts of leftover files and directories:
{code:java}
# find /tmp/ -name '*_resources' | wc -l
68340
# find /tmp/hive -name "*.pipeout" | wc -l
51837
{code}
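The counts above include every match regardless of age. As a rough sketch (not from the original report), the leftovers could be narrowed to entries older than a chosen threshold, which helps separate files of live sessions from abandoned ones. The function name `find_stale` and the one-day default are assumptions, and unlike the `find` commands this only looks at the top level of /tmp and /tmp/hive, where the `strace` output shows the entries being created. It only lists paths and deletes nothing.

```python
import time
from pathlib import Path


def find_stale(tmp_root="/tmp", min_age_days=1.0, now=None):
    """List *_resources entries under tmp_root and *.pipeout files under
    tmp_root/hive whose mtime is older than min_age_days.
    Hypothetical helper; read-only, nothing is removed."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    root = Path(tmp_root)
    candidates = list(root.glob("*_resources"))
    candidates += list((root / "hive").glob("*.pipeout"))
    stale = []
    for p in candidates:
        try:
            if p.lstat().st_mtime < cutoff:
                stale.append(p)
        except OSError:
            # Entry vanished or is unreadable; skip it.
            continue
    return sorted(stale)


if __name__ == "__main__":
    leftovers = find_stale("/tmp", min_age_days=1.0)
    print(f"{len(leftovers)} stale entries")
    for p in leftovers[:10]:
        print(p)
```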