Posted to issues@spark.apache.org by "凭落 (JIRA)" <ji...@apache.org> on 2019/08/13 13:43:00 UTC
[jira] [Created] (SPARK-28712) Spark Structured Streaming with Kafka doesn't really delete temp files in Spark standalone cluster
凭落 created SPARK-28712:
--------------------------
Summary: Spark Structured Streaming with Kafka doesn't really delete temp files in Spark standalone cluster
Key: SPARK-28712
URL: https://issues.apache.org/jira/browse/SPARK-28712
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.4.3
Environment: Red Hat 7
Spark standalone cluster 2.4.3
Kafka 0.10.2.1
Reporter: 凭落
The following folder on the driver
{noformat}
/tmp/temporary-xxxxxxxx
{noformat}
takes up all the space in /tmp after running a Spark Structured Streaming job for a long time.
The space is mainly under the offsets and commits folders. When I check them with
{noformat}
du -sh offsets
du -sh commits
{noformat}
they report more than 600M, but when we list them with
{noformat}
ll -h offsets
ll -h commits
{noformat}
they show only about 400K.
I think this is because the files are deleted while they are still held open by the job, so the space is only released when the job is stopped.
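One way to check whether deleted files are still held open by the driver process (assuming lsof is available on the host; the grep pattern below is just an illustration matching the temp directory name above) is:
{noformat}
lsof +L1 | grep temporary-
{noformat}
Files listed there have been unlinked but are still open, so their space is not freed until the process closes them or exits.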
How can I solve this?
We use
{code}
df.writeStream.trigger(Trigger.ProcessingTime("1 seconds"))
{code}
not
{code}
df.writeStream.trigger(Trigger.Continuous("1 seconds"))
{code}
Is there something wrong here?
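For reference, here is a minimal sketch of the kind of query described above, assuming a Kafka source and a console sink; the topic, bootstrap servers, and checkpoint path are hypothetical placeholders. Setting an explicit checkpointLocation at least keeps the offsets and commits folders off /tmp.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .appName("structured-streaming-temp-files-repro")
  .getOrCreate()

// Kafka source; topic and bootstrap servers are placeholders.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "some_topic")
  .load()

// Micro-batch trigger as in the report. An explicit checkpointLocation
// keeps the offsets/ and commits/ metadata out of /tmp/temporary-*.
val query = df.writeStream
  .format("console")
  .option("checkpointLocation", "/data/checkpoints/some_query")
  .trigger(Trigger.ProcessingTime("1 seconds"))
  .start()

query.awaitTermination()
{code}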