
Posted to issues@spark.apache.org by "凭落 (JIRA)" <ji...@apache.org> on 2019/08/13 13:43:00 UTC

[jira] [Created] (SPARK-28712) spark structured stream with kafka don't really delete temp files in spark standalone cluster

凭落 created SPARK-28712:
--------------------------

             Summary: spark structured stream with kafka don't really delete temp files in spark standalone cluster
                 Key: SPARK-28712
                 URL: https://issues.apache.org/jira/browse/SPARK-28712
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.4.3
         Environment: redhat 7
spark standalone cluster 2.4.3
kafka 0.10.2.1
 
            Reporter: 凭落


The folder on the driver
{noformat}
/tmp/temporary-xxxxxxxx{noformat}
takes up all the space in /tmp after a Spark Structured Streaming job has been running for a long time.

Most of the space is under the offsets and commits folders. When I check them with the du command
{noformat}
du -sh offsets     du -sh commits{noformat}
it reports more than 600M, but when we use the command
{noformat}
ll -h offsets       ll -h commits{noformat}
it reports 400K.
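
One way to narrow down a gap like this is to compare what a plain listing shows with what is actually in the directory. The following is a minimal sketch, not a confirmed diagnosis of this issue: it builds a scratch directory of tiny files, half of them dot-files, and compares the visible entry count, the full entry count, the summed file lengths, and the allocated size reported by du. The file names (including the `.crc` suffix) are hypothetical stand-ins, and the allocated size depends on the filesystem's block size.

```shell
#!/usr/bin/env bash
# Sketch: why `du -s` and a plain `ls -l` / `ll` view can disagree
# on a directory that holds many tiny files.
set -euo pipefail

dir=$(mktemp -d)    # scratch directory standing in for offsets/ or commits/

# 50 visible 10-byte files plus 50 hidden ones (hypothetical names).
for i in $(seq 1 50); do
  printf '0123456789' > "$dir/$i"
  printf '0123456789' > "$dir/.$i.crc"
done

# Entries a plain ls (and therefore ll) lists: dot-files are skipped.
visible=$(ls "$dir" | wc -l | tr -d ' ')
# Entries including dot-files.
all=$(ls -A "$dir" | wc -l | tr -d ' ')
# Sum of file lengths in bytes (visible and hidden files).
apparent_bytes=$(cat "$dir"/* "$dir"/.[!.]* | wc -c | tr -d ' ')
# Space actually allocated, in KiB; each tiny file usually occupies a whole block.
allocated_kb=$(du -sk "$dir" | cut -f1)

echo "visible=$visible all=$all apparent_bytes=$apparent_bytes allocated_kb=$allocated_kb"

rm -rf "$dir"
```

Running `ls -la offsets | wc -l` next to `ll offsets | wc -l` on the real checkpoint directory would show whether hidden files account for part of the difference.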

I think this is because a deleted file is still in use by the job.

The space is released only when the job is stopped.
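
The behaviour described here, a file that is deleted but still held open, can be reproduced in isolation. The sketch below is only an illustration of that general Unix behaviour and does not show that Spark actually does this; the temporary file is a hypothetical stand-in for a checkpoint file.

```shell
#!/usr/bin/env bash
# Sketch: a file removed while a process still has it open keeps its
# disk space until the descriptor is closed.
set -euo pipefail

f=$(mktemp)      # hypothetical stand-in for a checkpoint file
exec 3>"$f"      # keep the file open on descriptor 3
rm "$f"          # unlink it while it is still open

# The path is gone from the directory...
exists_after_rm=no
[ -e "$f" ] && exists_after_rm=yes

# ...but writing through the open descriptor still succeeds,
# so the file's blocks remain allocated.
write_ok=no
if printf 'still writable\n' >&3; then write_ok=yes; fi

exec 3>&-        # closing the descriptor is what lets the space be freed

echo "exists_after_rm=$exists_after_rm write_ok=$write_ok"
```

Space held this way is counted by df but not by du, because du only walks directory entries. On Linux, `lsof +L1` lists files that are open but already unlinked, which is one way to check whether the driver process is holding any.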

How can I solve it?

We use
{code}
import org.apache.spark.sql.streaming.Trigger
df.writeStream.trigger(Trigger.ProcessingTime("1 seconds"))
{code}
not
{code}
df.writeStream.trigger(Trigger.Continuous("1 seconds"))
{code}
Is there something wrong here?


