Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/07/26 02:50:46 UTC

[GitHub] [incubator-doris] BiteTheDDDDt opened a new pull request #6323: [Feature] Support for cleaning the trash actively

BiteTheDDDDt opened a new pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323


   ## Proposed changes
   Add support for actively cleaning the trash.
   Users can run `CLEAN TRASH` to clean the trash on backends.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [X] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [X] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   - [ ] Optimization. Including functional usability improvements and performance improvements.
   - [ ] Dependency. Such as changes related to third-party components.
   - [ ] Other.
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [X] I have created an issue on (Fix #6322) and described the bug/feature there in detail
   - [X] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature works
   - [X] If these changes need document changes, I have updated the document
   - [ ] Any dependent changes have been merged


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r681408803



##########
File path: docs/zh-CN/administrator-guide/operation/disk-capacity.md
##########
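@@ -129,11 +129,11 @@ capacity_min_left_bytes_flood_stage defaults to 1GB.
 
     **This operation will affect [restoring data from the BE recycle bin](./tablet-restore-tool.md).**
 
-    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two cases:
-    * If disk usage has not reached 90% of the **Flood Stage**, expired trash files and expired snapshot files are cleaned up. Some recent files are kept, so data recovery is not affected.
-    * If disk usage has reached 90% of the **Flood Stage**, **all** trash files and expired snapshot files are cleaned up, **which also affects restoring data from the recycle bin**.
+    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All** trash files and expired snapshot files are cleaned up, **which affects restoring data from the recycle bin**.
 
-    If you do not run `CLEAN TRASH` manually, the system will still clean up automatically within a few minutes to tens of minutes.
+    If you do not run `CLEAN TRASH` manually, the system will still clean up automatically within a few minutes to tens of minutes. There are two cases: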

Review comment:
       > same as above
   
   I fixed these problems in 3c899f6.
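
The hunk above distinguishes two cases for the automatic cleanup: below 90% of the Flood Stage only expired trash is removed, while at or above it all trash is removed. A manual clean always removes all trash. A minimal sketch of that decision follows, assuming a hypothetical helper `trash_expire_seconds` and treating the thresholds as fractions of disk capacity. The function name, its parameters and the example values are illustrative assumptions, not the Doris implementation.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual Doris code: picks the trash retention
// time (in seconds) that a sweep should apply.
//   usage            - fraction of the disk in use, 0.0 to 1.0
//   flood_stage      - Flood Stage threshold as a fraction (assumed value)
//   default_expire_s - normal retention time for trash files
//   ignore_guard     - true for a manual clean, false for the periodic sweep
inline int64_t trash_expire_seconds(double usage, double flood_stage,
                                    int64_t default_expire_s, bool ignore_guard) {
    // A manual clean ignores the guard space and removes all trash.
    if (ignore_guard) {
        return 0;
    }
    // The periodic sweep removes all trash only once usage reaches
    // 90% of the Flood Stage; below that, recent files are kept.
    const double guard_space = flood_stage * 0.9;
    return usage >= guard_space ? 0 : default_expire_s;
}
```

A retention time of 0 here stands for "every trash file counts as expired", which is why that branch also affects restoring data from the recycle bin.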






[GitHub] [incubator-doris] morningman commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r684640597



##########
File path: gensrc/thrift/BackendService.thrift
##########
@@ -161,4 +161,5 @@ service BackendService {
 
     TStreamLoadRecordResult get_stream_load_record(1: i64 last_stream_record_time);
 
+    oneway void clean_trash();

Review comment:
       Is it safe to call this multiple times when it is declared `oneway`?
   And is the method `start_trash_sweep()` thread-safe?
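
One common way to make a long-running sweep safe against repeated or concurrent calls is to guard it with a mutex and skip the call when a sweep is already in progress. The sketch below illustrates that pattern only. `TrashSweeper` and its members are hypothetical names, and the thread does not confirm that the final patch works exactly this way.

```cpp
#include <mutex>

// Hypothetical stand-in for the storage engine; all names are illustrative.
class TrashSweeper {
public:
    // Returns true if this call ran the sweep, false if another sweep was
    // already in progress and this call was skipped.
    bool start_trash_sweep() {
        std::unique_lock<std::mutex> guard(_sweep_lock, std::try_to_lock);
        if (!guard.owns_lock()) {
            return false;  // a sweep is already running, so do nothing
        }
        ++_sweeps_run;  // placeholder for the real cleanup work
        return true;    // guard releases the lock on scope exit
    }

    // Only meaningful when no sweep is running concurrently.
    int sweeps_run() const { return _sweeps_run; }

private:
    std::mutex _sweep_lock;
    int _sweeps_run = 0;
};
```

With this shape, a second `oneway` request that arrives while a sweep is running returns immediately instead of starting a second pass over the same directories.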






[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r677152558



##########
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##########
@@ -7109,4 +7113,28 @@ public void onErasePartition(Partition partition) {
             }
         }
     }
+
+    public void cleanTrash(CleanTrashStmt stmt) {
+        List<Backend> backends = stmt.getBackends();
+        for (Backend backend : backends){
+            BackendService.Client client = null;
+            TNetworkAddress address = null;
+            boolean ok = false;
+            try {
+                long start = System.currentTimeMillis();

Review comment:
       > The variable `start` is not used later.
   
   Fixed.






[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r676319331



##########
File path: docs/.vuepress/sidebar/en.js
##########
@@ -451,6 +451,7 @@ module.exports = [
               "SHOW MIGRATIONS",
               "SHOW PLUGINS",
               "SHOW TABLE STATUS",
+              "CLEAN TRASH",

Review comment:
       > You need to add both the guide and the SQL reference.
   
   https://github.com/apache/incubator-doris/pull/6323/commits/c31dd3e5d111da8e8cde3f00bd9367af49c04a96
   I added a description to `/administrator-guide/operation/disk-capacity.md`.
   While doing so, I found that this document has no corresponding English version.






[GitHub] [incubator-doris] nimuyuhan commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
nimuyuhan commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r677132938



##########
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##########
@@ -7109,4 +7113,28 @@ public void onErasePartition(Partition partition) {
             }
         }
     }
+
+    public void cleanTrash(CleanTrashStmt stmt) {
+        List<Backend> backends = stmt.getBackends();
+        for (Backend backend : backends){
+            BackendService.Client client = null;
+            TNetworkAddress address = null;
+            boolean ok = false;
+            try {
+                long start = System.currentTimeMillis();

Review comment:
       The variable `start` is not used later.






[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r681386849



##########
File path: docs/en/administrator-guide/operation/disk-capacity.md
##########
@@ -131,12 +131,13 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o
 
     **This operation will affect [Restore data from BE Recycle Bin](./tablet-restore-tool.md).**
 
-    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two situations as follows: 
+    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**.

Review comment:
       ```suggestion
       If the BE can still be started, you can use `ADMIN CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**.
   ```

##########
File path: docs/zh-CN/administrator-guide/operation/disk-capacity.md
##########
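@@ -129,11 +129,11 @@ capacity_min_left_bytes_flood_stage defaults to 1GB.
 
     **This operation will affect [restoring data from the BE recycle bin](./tablet-restore-tool.md).**
 
-    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two cases:
-    * If disk usage has not reached 90% of the **Flood Stage**, expired trash files and expired snapshot files are cleaned up. Some recent files are kept, so data recovery is not affected.
-    * If disk usage has reached 90% of the **Flood Stage**, **all** trash files and expired snapshot files are cleaned up, **which also affects restoring data from the recycle bin**.
+    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All** trash files and expired snapshot files are cleaned up, **which affects restoring data from the recycle bin**.
 
-    If you do not run `CLEAN TRASH` manually, the system will still clean up automatically within a few minutes to tens of minutes.
+    If you do not run `CLEAN TRASH` manually, the system will still clean up automatically within a few minutes to tens of minutes. There are two cases: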

Review comment:
       same as above

##########
File path: docs/en/administrator-guide/operation/disk-capacity.md
##########
@@ -131,12 +131,13 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o
 
     **This operation will affect [Restore data from BE Recycle Bin](./tablet-restore-tool.md).**
 
-    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two situations as follows: 
+    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **all trash files** and expired snapshot files will be cleaned up, **This will affect the operation of restoring data from the trash bin**.
+
+
+    If you do not manually execute `CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows: 

Review comment:
       ```suggestion
       If you do not manually execute `ADMIN CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes.There are two situations as follows: 
   ```

##########
File path: be/src/service/backend_service.cpp
##########
@@ -340,6 +340,6 @@ void BackendService::get_stream_load_record(TStreamLoadRecordResult& result,
 }
 
 void BackendService::clean_trash() {
-    StorageEngine::instance()->start_trash_sweep(nullptr); // do not update usage
+    StorageEngine::instance()->start_trash_sweep(nullptr, true); // do not update usage, ignore guard_space

Review comment:
       ```suggestion
       StorageEngine::instance()->start_trash_sweep(nullptr, true); // update usage, ignore guard_space
   ```

##########
File path: docs/zh-CN/administrator-guide/operation/disk-capacity.md
##########
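@@ -129,11 +129,11 @@ capacity_min_left_bytes_flood_stage defaults to 1GB.
 
     **This operation will affect [restoring data from the BE recycle bin](./tablet-restore-tool.md).**
 
-    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort)` to actively clean up temporary files. There are two cases:
-    * If disk usage has not reached 90% of the **Flood Stage**, expired trash files and expired snapshot files are cleaned up. Some recent files are kept, so data recovery is not affected.
-    * If disk usage has reached 90% of the **Flood Stage**, **all** trash files and expired snapshot files are cleaned up, **which also affects restoring data from the recycle bin**.
+    If the BE can still be started, you can use `CLEAN TRASH ON(BackendHost:BackendHeartBeatPort);` to actively clean up temporary files. **All** trash files and expired snapshot files are cleaned up, **which affects restoring data from the recycle bin**.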

Review comment:
       same as above






[GitHub] [incubator-doris] github-actions[bot] commented on pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#issuecomment-896850161


   PR approved by at least one committer and no changes requested.




[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r683096713



##########
File path: be/src/service/backend_service.cpp
##########
@@ -339,4 +339,7 @@ void BackendService::get_stream_load_record(TStreamLoadRecordResult& result,
     }
 }
 
+void BackendService::clean_trash() {
+    StorageEngine::instance()->start_trash_sweep(nullptr, true);

Review comment:
       > It may take a very long time to clean the trash, so I suggest using an async call.
   
   I think this is already async, because I use `oneway` to define the function in the thrift file:
   [`gensrc/thrift/BackendService.thrift`](https://github.com/apache/incubator-doris/blob/3c899f690d124d612b0d91949872cd9fb9faab80/gensrc/thrift/BackendService.thrift)
   `oneway void clean_trash();`
   

##########
File path: fe/fe-core/src/main/java/org/apache/doris/analysis/AdminCleanTrashStmt.java
##########
@@ -0,0 +1,73 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.AnalysisException;
+import org.apache.doris.common.ErrorCode;
+import org.apache.doris.common.ErrorReport;
+import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.system.Backend;
+import org.apache.doris.mysql.privilege.PrivPredicate;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class AdminCleanTrashStmt extends DdlStmt {
+    private List<Backend> backends = Lists.newArrayList();
+
+    public AdminCleanTrashStmt(List<String> backends) {
+        ImmutableMap<Long, Backend> backendsInfo = Catalog.getCurrentSystemInfo().getIdToBackend();
+        Map<String, Long> backendsID = new HashMap<String, Long>();
+        for (Backend backend : backendsInfo.values()) {
+            backendsID.put(String.valueOf(backend.getHost()) + ":" + String.valueOf(backend.getHeartbeatPort()), backend.getId());
+        }
+        if (backends == null) {
+            for (Backend backend : backendsInfo.values()) {
+                this.backends.add(backend);
+            }
+        } else {
+            for (String backend : backends) {
+                if (backendsID.get(backend) != null) {
+                    this.backends.add(backendsInfo.get(backendsID.get(backend)));
+                    backendsID.remove(backend); // avoid repetition
+                }
+            }
+        }
+    }
+
+    public List<Backend> getBackends() {
+        return backends;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (!Catalog.getCurrentCatalog().getAuth().checkGlobalPriv(ConnectContext.get(), PrivPredicate.ADMIN)) {
+            ErrorReport.reportAnalysisException(ErrorCode.ERR_SPECIFIC_ACCESS_DENIED_ERROR, "ADMIN");
+        }
+    }
+
+    @Override
+    public RedirectStatus getRedirectStatus() {
+        return RedirectStatus.FORWARD_WITH_SYNC;

Review comment:
       > Do we need to forward this stmt to master?
   
   This does not seem to modify the metadata, so I changed it to NO_FORWARD in 9765d59.
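
For reference, the backend selection performed by the quoted constructor can be restated as a small standalone sketch: a missing list selects every backend, unknown `host:port` strings are skipped silently, and a repeated entry is only selected once. The C++ below is an illustration of that logic only. `select_backends` and its parameters are hypothetical names, and the mapping of a null list to "no backends specified" is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative only: `known` maps "host:heartbeat_port" to a backend id.
// `requested` is nullptr when no backend list was given.
// `known` is taken by value so that erasing entries stays local.
std::vector<int64_t> select_backends(std::map<std::string, int64_t> known,
                                     const std::vector<std::string>* requested) {
    std::vector<int64_t> selected;
    if (requested == nullptr) {
        for (const auto& entry : known) {
            selected.push_back(entry.second);  // no list given: every backend
        }
        return selected;
    }
    for (const std::string& name : *requested) {
        auto it = known.find(name);
        if (it != known.end()) {
            selected.push_back(it->second);
            known.erase(it);  // a repeated name will no longer match
        }
        // Unknown names are skipped silently, as in the quoted constructor.
    }
    return selected;
}
```

One consequence worth noting is that a mistyped `host:port` produces no error: the statement simply targets fewer backends than the user listed.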






[GitHub] [incubator-doris] morningman commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r682783165



##########
File path: fe/fe-core/src/main/java/org/apache/doris/analysis/AdminCleanTrashStmt.java
##########
@@ -0,0 +1,73 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.AnalysisException;
+import org.apache.doris.common.ErrorCode;
+import org.apache.doris.common.ErrorReport;
+import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.system.Backend;
+import org.apache.doris.mysql.privilege.PrivPredicate;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class AdminCleanTrashStmt extends DdlStmt {
+    private List<Backend> backends = Lists.newArrayList();
+
+    public AdminCleanTrashStmt(List<String> backends) {
+        ImmutableMap<Long, Backend> backendsInfo = Catalog.getCurrentSystemInfo().getIdToBackend();
+        Map<String, Long> backendsID = new HashMap<String, Long>();
+        for (Backend backend : backendsInfo.values()) {
+            backendsID.put(String.valueOf(backend.getHost()) + ":" + String.valueOf(backend.getHeartbeatPort()), backend.getId());
+        }
+        if (backends == null) {
+            for (Backend backend : backendsInfo.values()) {
+                this.backends.add(backend);
+            }
+        } else {
+            for (String backend : backends) {
+                if (backendsID.get(backend) != null) {
+                    this.backends.add(backendsInfo.get(backendsID.get(backend)));
+                    backendsID.remove(backend); // avoid repetition
+                }
+            }
+        }
+    }
+
+    public List<Backend> getBackends() {
+        return backends;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (!Catalog.getCurrentCatalog().getAuth().checkGlobalPriv(ConnectContext.get(), PrivPredicate.ADMIN)) {
+            ErrorReport.reportAnalysisException(ErrorCode.ERR_SPECIFIC_ACCESS_DENIED_ERROR, "ADMIN");
+        }
+    }
+
+    @Override
+    public RedirectStatus getRedirectStatus() {
+        return RedirectStatus.FORWARD_WITH_SYNC;

Review comment:
       Do we need to forward this stmt to master?

##########
File path: be/src/service/backend_service.cpp
##########
@@ -339,4 +339,7 @@ void BackendService::get_stream_load_record(TStreamLoadRecordResult& result,
     }
 }
 
+void BackendService::clean_trash() {
+    StorageEngine::instance()->start_trash_sweep(nullptr, true);

Review comment:
       It may take a very long time to clean the trash, so I suggest using an async call.






[GitHub] [incubator-doris] BiteTheDDDDt commented on pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#issuecomment-888045591


   > How do we force a trash sweep?
   
   I made some changes in https://github.com/apache/incubator-doris/pull/6323/commits/08df4ce559f1cd4536e0aed9d68cadd15fb1d6a4




[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r676266112



##########
File path: docs/.vuepress/sidebar/en.js
##########
@@ -451,6 +451,7 @@ module.exports = [
               "SHOW MIGRATIONS",
               "SHOW PLUGINS",
               "SHOW TABLE STATUS",
+              "CLEAN TRASH",

Review comment:
       You need to add both the guide and the SQL reference.






[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#issuecomment-886332546


   Please enrich your commit message.




[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
EmmyMiao87 commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r676260313



##########
File path: fe/fe-core/src/main/java/org/apache/doris/qe/DdlExecutor.java
##########
@@ -237,6 +238,8 @@ public static void execute(Catalog catalog, DdlStmt ddlStmt) throws Exception {
             catalog.getResourceMgr().createResource((CreateResourceStmt) ddlStmt);
         } else if (ddlStmt instanceof DropResourceStmt) {
             catalog.getResourceMgr().dropResource((DropResourceStmt) ddlStmt);
+        } else if(ddlStmt instanceof CleanTrashStmt) {

Review comment:
       ```suggestion
           } else if (ddlStmt instanceof CleanTrashStmt) {
   ```

##########
File path: docs/zh-CN/sql-reference/sql-statements/Administration/CLEAN TRASH.md
##########
@@ -0,0 +1,47 @@
+---
+{
+    "title": "CLEAN TRASH",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CLEAN TRASH
+## description
+    This statement is used to clean up garbage data in the backend.

Review comment:
       Will this statement clean up both trash and snapshots?
   Could cleaning up snapshots affect a snapshot that is being restored?






[GitHub] [incubator-doris] yangzhg merged pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
yangzhg merged pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323


   




[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r686455399



##########
File path: be/src/olap/storage_engine.cpp
##########
@@ -685,6 +693,8 @@ OLAPStatus StorageEngine::_start_trash_sweep(double* usage) {
     // clean unused rowset metas in OlapMeta
     _clean_unused_rowset_metas();
 
+    _trash_sweep_lock.unlock();

Review comment:
       > The method may return before this lock is released.
   > You can use `src/util/mutex.h` to unlock automatically on destruction.
   
   I fixed it by using `unique_lock`.
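
For illustration, the `unique_lock` fix mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the PR: `TrashSweeper`, `SweepStatus`, `sweep`, and `lock_is_free` are hypothetical names, and only `std::mutex` and `std::unique_lock` come from the C++ standard library. It shows that the guard releases the mutex on every return path, including an early return.

```cpp
#include <mutex>

// Hypothetical status type, standing in for whatever the real method returns.
enum class SweepStatus { kOk, kError };

// Hypothetical stand-in for the component that owns the sweep lock.
class TrashSweeper {
public:
    SweepStatus sweep(bool simulate_error) {
        // The guard locks the mutex here and unlocks it in its destructor.
        std::unique_lock<std::mutex> guard(sweep_lock_);
        if (simulate_error) {
            return SweepStatus::kError;  // early return: guard still unlocks
        }
        ++completed_sweeps_;
        return SweepStatus::kOk;  // normal return: guard unlocks here too
    }

    // Probes the mutex without blocking; true means nobody holds it right now.
    bool lock_is_free() {
        std::unique_lock<std::mutex> probe(sweep_lock_, std::try_to_lock);
        return probe.owns_lock();
    }

    int completed_sweeps() const { return completed_sweeps_; }

private:
    std::mutex sweep_lock_;
    int completed_sweeps_ = 0;
};
```

Because the guard's destructor runs on every exit from `sweep`, no explicit `unlock()` call is needed, which avoids the case where the method returns before the lock is released.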






[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r683096713



##########
File path: be/src/service/backend_service.cpp
##########
@@ -339,4 +339,7 @@ void BackendService::get_stream_load_record(TStreamLoadRecordResult& result,
     }
 }
 
+void BackendService::clean_trash() {
+    StorageEngine::instance()->start_trash_sweep(nullptr, true);

Review comment:
       > It may take a very long time to clean the trash, so I suggest using an async call.
   
   I think this is already async, because I use `oneway` to define the function in the Thrift file:
   [`gensrc/thrift/BackendService.thrift`](https://github.com/apache/incubator-doris/blob/3c899f690d124d612b0d91949872cd9fb9faab80/gensrc/thrift/BackendService.thrift)
   `oneway void clean_trash();`
   






[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r676265235



##########
File path: docs/zh-CN/sql-reference/sql-statements/Administration/CLEAN TRASH.md
##########
@@ -0,0 +1,47 @@
+---
+{
+    "title": "CLEAN TRASH",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CLEAN TRASH
+## description
+    This statement is used to clean up garbage data in the backend.

Review comment:
       > Will this statement clean up both trash and snapshots?
   > Will cleaning up snapshots affect a snapshot that is being restored?
   
   Yes, this statement cleans up both trash and snapshots.
   
   It calls `StorageEngine::start_trash_sweep`, which only removes expired data (expiry is defined by `config::snapshot_expire_time_sec` and `config::trash_file_expire_time_sec`).
   This function is also called automatically on a periodic basis, so I think the cleanup is harmless.
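
The "only expired data" rule described above can be illustrated with a small sketch. It is an assumption-laden illustration, not the logic in `StorageEngine`: `TrashEntry`, `select_expired`, and the `>=` comparison are hypothetical, and `expire_sec` merely plays the role of a setting such as `config::trash_file_expire_time_sec`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one item in the trash or snapshot directory.
struct TrashEntry {
    std::string path;
    int64_t created_at_sec;  // creation time, in seconds since some epoch
};

// Returns the paths whose age has reached expire_sec; younger entries are kept.
// The exact boundary (>= versus >) is an assumption made for this sketch.
std::vector<std::string> select_expired(const std::vector<TrashEntry>& entries,
                                        int64_t now_sec, int64_t expire_sec) {
    std::vector<std::string> expired;
    for (const auto& entry : entries) {
        if (now_sec - entry.created_at_sec >= expire_sec) {
            expired.push_back(entry.path);
        }
    }
    return expired;
}
```

Under this rule, running the sweep on demand removes nothing that a later periodic sweep with the same expiry settings would have kept.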
   








[GitHub] [incubator-doris] BiteTheDDDDt commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r683097900



##########
File path: fe/fe-core/src/main/java/org/apache/doris/analysis/AdminCleanTrashStmt.java
##########
@@ -0,0 +1,73 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.AnalysisException;
+import org.apache.doris.common.ErrorCode;
+import org.apache.doris.common.ErrorReport;
+import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.system.Backend;
+import org.apache.doris.mysql.privilege.PrivPredicate;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class AdminCleanTrashStmt extends DdlStmt {
+    private List<Backend> backends = Lists.newArrayList();
+
+    public AdminCleanTrashStmt(List<String> backends) {
+        ImmutableMap<Long, Backend> backendsInfo = Catalog.getCurrentSystemInfo().getIdToBackend();
+        Map<String, Long> backendsID = new HashMap<String, Long>();
+        for (Backend backend : backendsInfo.values()) {
+            backendsID.put(String.valueOf(backend.getHost()) + ":" + String.valueOf(backend.getHeartbeatPort()), backend.getId());
+        }
+        if (backends == null) {
+            for (Backend backend : backendsInfo.values()) {
+                this.backends.add(backend);
+            }
+        } else {
+            for (String backend : backends) {
+                if (backendsID.get(backend) != null) {
+                    this.backends.add(backendsInfo.get(backendsID.get(backend)));
+                    backendsID.remove(backend); // avoid repetition
+                }
+            }
+        }
+    }
+
+    public List<Backend> getBackends() {
+        return backends;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (!Catalog.getCurrentCatalog().getAuth().checkGlobalPriv(ConnectContext.get(), PrivPredicate.ADMIN)) {
+            ErrorReport.reportAnalysisException(ErrorCode.ERR_SPECIFIC_ACCESS_DENIED_ERROR, "ADMIN");
+        }
+    }
+
+    @Override
+    public RedirectStatus getRedirectStatus() {
+        return RedirectStatus.FORWARD_WITH_SYNC;

Review comment:
       > Do we need to forward this stmt to master?
   
   This does not seem to modify the metadata, so I changed it to `NO_FORWARD` in 9765d59.






[GitHub] [incubator-doris] morningman commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r685915863



##########
File path: be/src/olap/storage_engine.cpp
##########
@@ -685,6 +693,8 @@ OLAPStatus StorageEngine::_start_trash_sweep(double* usage) {
     // clean unused rowset metas in OlapMeta
     _clean_unused_rowset_metas();
 
+    _trash_sweep_lock.unlock();

Review comment:
       The method may return before this lock is released.
   You can use `src/util/mutex.h` to unlock automatically on destruction.






[GitHub] [incubator-doris] github-actions[bot] commented on pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#issuecomment-887253209








[GitHub] [incubator-doris] morningman commented on a change in pull request #6323: [Feature] Support for cleaning the trash actively

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #6323:
URL: https://github.com/apache/incubator-doris/pull/6323#discussion_r682783165



##########
File path: fe/fe-core/src/main/java/org/apache/doris/analysis/AdminCleanTrashStmt.java
##########
@@ -0,0 +1,73 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.AnalysisException;
+import org.apache.doris.common.ErrorCode;
+import org.apache.doris.common.ErrorReport;
+import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.system.Backend;
+import org.apache.doris.mysql.privilege.PrivPredicate;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class AdminCleanTrashStmt extends DdlStmt {
+    private List<Backend> backends = Lists.newArrayList();
+
+    public AdminCleanTrashStmt(List<String> backends) {
+        ImmutableMap<Long, Backend> backendsInfo = Catalog.getCurrentSystemInfo().getIdToBackend();
+        Map<String, Long> backendsID = new HashMap<String, Long>();
+        for (Backend backend : backendsInfo.values()) {
+            backendsID.put(String.valueOf(backend.getHost()) + ":" + String.valueOf(backend.getHeartbeatPort()), backend.getId());
+        }
+        if (backends == null) {
+            for (Backend backend : backendsInfo.values()) {
+                this.backends.add(backend);
+            }
+        } else {
+            for (String backend : backends) {
+                if (backendsID.get(backend) != null) {
+                    this.backends.add(backendsInfo.get(backendsID.get(backend)));
+                    backendsID.remove(backend); // avoid repetition
+                }
+            }
+        }
+    }
+
+    public List<Backend> getBackends() {
+        return backends;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (!Catalog.getCurrentCatalog().getAuth().checkGlobalPriv(ConnectContext.get(), PrivPredicate.ADMIN)) {
+            ErrorReport.reportAnalysisException(ErrorCode.ERR_SPECIFIC_ACCESS_DENIED_ERROR, "ADMIN");
+        }
+    }
+
+    @Override
+    public RedirectStatus getRedirectStatus() {
+        return RedirectStatus.FORWARD_WITH_SYNC;

Review comment:
       Do we need to forward this stmt to master?

##########
File path: be/src/service/backend_service.cpp
##########
@@ -339,4 +339,7 @@ void BackendService::get_stream_load_record(TStreamLoadRecordResult& result,
     }
 }
 
+void BackendService::clean_trash() {
+    StorageEngine::instance()->start_trash_sweep(nullptr, true);

Review comment:
       It may take a very long time to clean the trash, so I suggest using an async call.



