Posted to issues@flink.apache.org by "zlzhang0122 (Jira)" <ji...@apache.org> on 2021/10/09 04:07:00 UTC

[jira] [Updated] (FLINK-24122) Add support to do clean in history server

     [ https://issues.apache.org/jira/browse/FLINK-24122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zlzhang0122 updated FLINK-24122:
--------------------------------
    Description: 
Currently, the history server can clean up job histories in two ways:
 # If users have configured
{code:java}
historyserver.archive.clean-expired-jobs: true{code}
the history server compares the archive files in HDFS between two consecutive checks, finds the archives that have been deleted, and removes the corresponding local cache files.

 # If users have set
{code:java}
historyserver.archive.retained-jobs:{code}
to a positive number, the history server deletes the oldest archives, both in HDFS and in the local cache, once that number is exceeded.
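The two existing strategies can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the history server's actual code: archives are modelled only as job IDs (with a modification time for the second strategy), and the class and method names (`ArchiveCleanupSketch`, `expiredSinceLastCheck`, `oldestBeyondLimit`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the two existing cleanup strategies; not Flink's code.
final class ArchiveCleanupSketch {

    // Strategy 1 (clean-expired-jobs): job IDs that were in the archive directory
    // at the previous check but are missing now. Their local cache entries are
    // the ones to delete.
    static Set<String> expiredSinceLastCheck(Set<String> previousIds, Set<String> currentIds) {
        Set<String> expired = new HashSet<>(previousIds);
        expired.removeAll(currentIds);
        return expired;
    }

    // Strategy 2 (retained-jobs): given job IDs mapped to their archive
    // modification time in millis, return the oldest IDs beyond the limit.
    static List<String> oldestBeyondLimit(Map<String, Long> modTimeById, int retainedJobs) {
        if (retainedJobs < 0) {
            // Assumption for this sketch: a negative value means "no limit".
            return new ArrayList<>();
        }
        List<String> newestFirst = modTimeById.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        if (newestFirst.size() <= retainedJobs) {
            return new ArrayList<>();
        }
        // Everything after the first retainedJobs (newest) entries is removed.
        return new ArrayList<>(newestFirst.subList(retainedJobs, newestFirst.size()));
    }
}
```

Note that strategy 2 only looks at the count, so a burst of new archives pushes older ones out regardless of how old they are, which is the problem described below.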

However, a suitable value for retained-jobs is difficult to determine.

For example, a user may want to inspect yesterday's jobs, but if many jobs failed today and the total exceeds retained-jobs, yesterday's job histories will already have been deleted. So how about adding a configuration option with a retained time (retained-times) that sets the maximum time a job's history is kept?
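A time-based rule could look roughly like this. It is only a sketch of the idea proposed here: the option exists only as a suggestion in this issue, and `TimeBasedRetentionSketch`, `expiredByAge` and the `maxAge` parameter are hypothetical names. An archive is removed once it is older than `maxAge`, independent of how many other jobs were archived in the meantime.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the proposed time-based retention; the option is only
// suggested in this issue and all names here are illustrative.
final class TimeBasedRetentionSketch {

    // Returns the job IDs whose archive time lies before (now - maxAge),
    // sorted so the result is deterministic.
    static List<String> expiredByAge(Map<String, Instant> archivedAtById, Duration maxAge, Instant now) {
        Instant cutoff = now.minus(maxAge);
        return archivedAtById.entrySet().stream()
                .filter(e -> e.getValue().isBefore(cutoff))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

With a 7-day `maxAge`, yesterday's archives survive even if many jobs fail today, which a count-based limit cannot guarantee.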

  was:
Currently, the history server can clean up job histories in two ways:
 # If users have configured
{code:java}
historyserver.archive.clean-expired-jobs: true{code}
the history server compares the archive files in HDFS between two consecutive checks, finds the archives that have been deleted, and removes the corresponding local cache files.

 # If users have set
{code:java}
historyserver.archive.retained-jobs{code}
to a positive number, the history server deletes the oldest archives, both in HDFS and in the local cache, once that number is exceeded.

However, it cannot clean up job histories that are no longer in HDFS but are still cached in the local filesystem; these files are kept forever unless users delete them manually. Perhaps we could add an option that performs this cleanup when set to true.
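The cleanup suggested here amounts to a set difference between the local cache and the current archive listing. A minimal sketch, assuming both sides can be listed as job IDs; `OrphanedCacheSketch`, `orphanedLocalEntries` and the `cleanOrphans` flag are hypothetical stand-ins for the proposed option, not existing Flink names.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: "cleanOrphans" stands in for the option suggested in
// this issue; it is not an existing Flink configuration key.
final class OrphanedCacheSketch {

    // Returns job IDs that are cached locally but have no archive left in the
    // remote directory (e.g. HDFS). When the option is off, nothing is cleaned.
    static Set<String> orphanedLocalEntries(Set<String> localIds, Set<String> remoteIds, boolean cleanOrphans) {
        Set<String> orphans = new TreeSet<>();
        if (!cleanOrphans) {
            return orphans;
        }
        orphans.addAll(localIds);
        orphans.removeAll(remoteIds);
        return orphans;
    }
}
```

Because it compares the local cache directly with the current remote listing, this check does not rely on having observed the deletion between two successive checks.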


> Add support to do clean in history server
> -----------------------------------------
>
>                 Key: FLINK-24122
>                 URL: https://issues.apache.org/jira/browse/FLINK-24122
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / REST
>    Affects Versions: 1.12.3, 1.13.2
>            Reporter: zlzhang0122
>            Priority: Minor
>             Fix For: 1.14.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)