Posted to commits@doris.apache.org by "weizhengte (via GitHub)" <gi...@apache.org> on 2023/04/13 14:11:15 UTC

[GitHub] [doris] weizhengte opened a new pull request, #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

weizhengte opened a new pull request, #18653:
URL: https://github.com/apache/doris/pull/18653

   This PR addresses the slow deletion of some statistics. The statistics update field is now stored as a timestamp, which simplifies later calculations. Some parameters are also added so they can be used more easily later in the code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1507798338

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 33.57 seconds
    stream load tsv:          445 seconds loaded 74807831229 Bytes, about 160 MB/s
    stream load json:         24 seconds loaded 2358488459 Bytes, about 93 MB/s
    stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
    stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230414011929_clickbench_pr_128828.html
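
A quick arithmetic check of the "about N MB/s" figures: they are consistent with bytes divided by seconds, truncated, in units of 2^20 bytes. That unit is inferred from the numbers rather than stated in the report, and the class and method names below are hypothetical.

```java
// Hypothetical helper for re-deriving the reported throughput figures.
final class ThroughputSketch {

    // Assumption: the report's "MB" means 2^20 bytes.
    private static final long MIB = 1024L * 1024L;

    // Bytes per second, truncated to whole units of 2^20 bytes.
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / MIB;
    }

    public static void main(String[] args) {
        System.out.println(mibPerSecond(74807831229L, 445)); // tsv:     160
        System.out.println(mibPerSecond(2358488459L, 24));   // json:    93
        System.out.println(mibPerSecond(1101869774L, 59));   // orc:     17
        System.out.println(mibPerSecond(861443392L, 31));    // parquet: 26
    }
}
```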




[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1514055006

   Wait until the sampling statistics PR is merged before making further modifications.




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1168393092


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/StatisticsRepository.java:
##########
@@ -179,15 +205,7 @@ private static void dropStatistics(Long dbId,
 
         Map<String, String> params = new HashMap<>();
         params.put("condition", predicate.toString());
-        StringSubstitutor stringSubstitutor = new StringSubstitutor(params);
-
-        try {
-            String statement = isHistogram ? stringSubstitutor.replace(DROP_TABLE_HISTOGRAM_TEMPLATE) :
-                    stringSubstitutor.replace(DROP_TABLE_STATISTICS_TEMPLATE);
-            StatisticsUtil.execUpdate(statement);
-        } catch (Exception e) {
-            LOG.warn("Drop statistics failed", e);
-        }
+        return params;
     }
 
     private static <T> void buildPredicate(String fieldName, Set<T> fieldValues, StringBuilder predicate) {

Review Comment:
   you're right, thx





[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1166238028


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/StatisticsRepository.java:
##########
@@ -179,15 +205,7 @@ private static void dropStatistics(Long dbId,
 
         Map<String, String> params = new HashMap<>();
         params.put("condition", predicate.toString());
-        StringSubstitutor stringSubstitutor = new StringSubstitutor(params);
-
-        try {
-            String statement = isHistogram ? stringSubstitutor.replace(DROP_TABLE_HISTOGRAM_TEMPLATE) :
-                    stringSubstitutor.replace(DROP_TABLE_STATISTICS_TEMPLATE);
-            StatisticsUtil.execUpdate(statement);
-        } catch (Exception e) {
-            LOG.warn("Drop statistics failed", e);
-        }
+        return params;
     }
 
     private static <T> void buildPredicate(String fieldName, Set<T> fieldValues, StringBuilder predicate) {

Review Comment:
   The generic type doesn't make any sense here.





[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1507043036

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1511772671

   run buildall




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1166232376


##########
fe/fe-common/src/main/java/org/apache/doris/common/Config.java:
##########
@@ -1695,6 +1695,24 @@ public class Config extends ConfigBase {
     @ConfField(mutable = true, masterOnly = true)
     public static int cbo_default_sample_percentage = 10;
 
+    /*
+     * if true, statistics will be updated automatically
+     */
+    @ConfField(mutable = true, masterOnly = true)
+    public static boolean enable_auto_collect_statistics = false;
+
+    /*
+     * collect statistics periodically can be stressful on the cluster
+     */
+    @ConfField(mutable = true, masterOnly = true)
+    public static boolean enable_period_collect_statistics = false;

Review Comment:
   What's the difference between `enable_period_collect_statistics` and `enable_auto_collect_statistics`? Besides, these two options aren't used anywhere.





[GitHub] [doris] weizhengte closed pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte closed pull request #18653: [Optimization](statistics)  optimize Incremental statistics collection and statistics cleaning
URL: https://github.com/apache/doris/pull/18653




[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1512270142

   run p0




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1166228540


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisHelper.java:
##########
@@ -0,0 +1,47 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.statistics;
+
+import org.apache.doris.common.Config;
+import org.apache.doris.common.ThreadPoolManager;
+import org.apache.doris.common.ThreadPoolManager.BlockedPolicy;
+import org.apache.doris.common.util.MasterDaemon;
+
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+
+
+/**
+ * Used to help collect statistics.
+ * TODO add other method
+ */
+public class AnalysisHelper extends MasterDaemon {

Review Comment:
   I think there is no need to add a master daemon for this; it would be better to simply let AnalysisManager do it.



[GitHub] [doris] weizhengte commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1167808271


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/StatisticsRepository.java:
##########
@@ -179,15 +205,7 @@ private static void dropStatistics(Long dbId,
 
         Map<String, String> params = new HashMap<>();
         params.put("condition", predicate.toString());
-        StringSubstitutor stringSubstitutor = new StringSubstitutor(params);
-
-        try {
-            String statement = isHistogram ? stringSubstitutor.replace(DROP_TABLE_HISTOGRAM_TEMPLATE) :
-                    stringSubstitutor.replace(DROP_TABLE_STATISTICS_TEMPLATE);
-            StatisticsUtil.execUpdate(statement);
-        } catch (Exception e) {
-            LOG.warn("Drop statistics failed", e);
-        }
+        return params;
     }
 
     private static <T> void buildPredicate(String fieldName, Set<T> fieldValues, StringBuilder predicate) {

Review Comment:
   `fieldValues` could be `Set<Long> tblIds`, or `Set<String> colNames`... they are all converted in this method.
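
For readers following the exchange about the generic parameter, here is a minimal sketch of how one method can serve both `Set<Long>` and `Set<String>` inputs. It is not the code from the PR: the IN-list shape, the quoting rule, and the column names `tbl_id` and `col_id` are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch; not the actual StatisticsRepository implementation.
final class PredicateSketch {

    // Appends "fieldName IN (v1, v2, ...)" to the predicate, prefixed with
    // " AND " when the predicate already has content. Empty sets add nothing.
    static <T> void buildPredicate(String fieldName, Set<T> fieldValues, StringBuilder predicate) {
        if (fieldValues == null || fieldValues.isEmpty()) {
            return;
        }
        StringJoiner values = new StringJoiner(", ", "(", ")");
        for (T value : fieldValues) {
            // Assumption: numbers are emitted bare, everything else is
            // single-quoted with embedded quotes doubled.
            values.add(value instanceof Number
                    ? value.toString()
                    : "'" + value.toString().replace("'", "''") + "'");
        }
        if (predicate.length() > 0) {
            predicate.append(" AND ");
        }
        predicate.append(fieldName).append(" IN ").append(values);
    }

    public static void main(String[] args) {
        Set<Long> tblIds = new LinkedHashSet<>();
        tblIds.add(10001L);
        Set<String> colNames = new LinkedHashSet<>();
        colNames.add("c1");
        colNames.add("c2");

        StringBuilder predicate = new StringBuilder();
        buildPredicate("tbl_id", tblIds, predicate);
        buildPredicate("col_id", colNames, predicate);
        System.out.println(predicate); // tbl_id IN (10001) AND col_id IN ('c1', 'c2')
    }
}
```

The type parameter here only lets callers pass sets of different element types without casts; a `Set<?>` parameter would work equally well, which may be the point of the earlier review remark.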



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1166227599


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisHelper.java:
##########
@@ -0,0 +1,47 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.statistics;
+
+import org.apache.doris.common.Config;
+import org.apache.doris.common.ThreadPoolManager;
+import org.apache.doris.common.ThreadPoolManager.BlockedPolicy;
+import org.apache.doris.common.util.MasterDaemon;
+
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+
+
+/**
+ * Used to help collect statistics.
+ * TODO add other method
+ */
+public class AnalysisHelper extends MasterDaemon {

Review Comment:
   This class name is a little confusing. By convention, we name a class XXXHelper when it is stateless and just provides some procedural code.
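
As an illustration of the convention described in this comment, a class named `...Helper` would typically look like the following: no fields, no lifecycle, only static procedures. The class and method names are hypothetical and are not part of the PR.

```java
// Hypothetical stateless helper: it cannot be instantiated and holds no
// thread pool, daemon, or other state.
final class QualifiedNameHelper {

    private QualifiedNameHelper() {
        // Utility class; no instances.
    }

    // Pure function: the result depends only on the arguments.
    static String qualify(String dbName, String tblName) {
        return dbName + "." + tblName;
    }
}
```

By contrast, a class that owns an executor and runs as a daemon carries state and a lifecycle, so under this convention a name such as Manager or Executor would describe it better.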





[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1518913733

   run clickbench




[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1507778576

   run p0




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1166242504


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -100,16 +98,24 @@ public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
         Set<String> partitionNames = analyzeStmt.getPartitionNames();
         Map<Long, AnalysisTaskInfo> analysisTaskInfos = new HashMap<>();
         long jobId = Env.getCurrentEnv().getNextId();
+
         // If the analysis is not incremental, need to delete existing statistics.
-        // we cannot collect histograms incrementally and do not support it
+        // we cannot collect histograms incrementally and do not support it.
         if (!analyzeStmt.isIncrement && !analyzeStmt.isHistogram) {
             long dbId = analyzeStmt.getDbId();
             TableIf table = analyzeStmt.getTable();
             Set<Long> tblIds = Sets.newHashSet(table.getId());
             Set<Long> partIds = partitionNames.stream()
                     .map(p -> table.getPartition(p).getId())
                     .collect(Collectors.toSet());
-            StatisticsRepository.dropStatistics(dbId, tblIds, colNames, partIds);
+            try {
+                StatisticsRepository.markStatsAsDeleted(dbId, tblIds, colNames, partIds);
+            } catch (Exception e) {
+                throw new DdlException("Fail to mark statistics as deleted", e);
+            }
+            // Removing statistics maybe be slow and use asynchronous method
+            Env.getCurrentEnv().getAnalysisHelper().asyncExecute(() ->
+                    StatisticsRepository.dropStatistics(dbId, tblIds, colNames, partIds));

Review Comment:
   This is really puzzling. If you really need the deletion here AND you submit an async task to do it, then why is it necessary to update the `column_statistics` table synchronously to mark rows as deleted, instead of letting the async task delete them directly?
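
The alternative raised in this comment, letting the asynchronous task perform the deletion directly with no synchronous marking step, can be sketched as below. This is a hedged illustration using only `java.util.concurrent`; `AsyncDropSketch`, its pool size, and the thread name are hypothetical, and the `Runnable` stands in for a call such as `StatisticsRepository.dropStatistics(dbId, tblIds, colNames, partIds)`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of running a slow cleanup off the caller's thread.
final class AsyncDropSketch {

    // Assumption: a small fixed pool of daemon threads is enough for cleanup.
    private final ExecutorService executor = Executors.newFixedThreadPool(2, runnable -> {
        Thread thread = new Thread(runnable, "stats-cleaner");
        thread.setDaemon(true);
        return thread;
    });

    // Submits the task and returns immediately. A failure is logged instead of
    // propagated, mirroring the "Drop statistics failed" warning in the diff.
    Future<?> asyncExecute(Runnable task) {
        return executor.submit(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("Drop statistics failed: " + e);
            }
        });
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

With this shape the caller does not block on the deletion. Whether readers may observe stale rows until the task finishes is the trade-off the synchronous marking step presumably tries to address.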





[GitHub] [doris] weizhengte closed pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte closed pull request #18653: [Optimization](statistics)  optimize Incremental statistics collection and statistics cleaning
URL: https://github.com/apache/doris/pull/18653




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1166228013


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisHelper.java:
##########
@@ -0,0 +1,47 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.statistics;
+
+import org.apache.doris.common.Config;
+import org.apache.doris.common.ThreadPoolManager;
+import org.apache.doris.common.ThreadPoolManager.BlockedPolicy;
+import org.apache.doris.common.util.MasterDaemon;
+
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+
+
+/**
+ * Used to help collect statistics.
+ * TODO add other method
+ */
+public class AnalysisHelper extends MasterDaemon {
+
+    private final ThreadPoolExecutor executors = ThreadPoolManager.newDaemonThreadPool(
+            Config.statistics_helper_running_task_num,

Review Comment:
   Rename this config option as well.





[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1510256470

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1512270276

   run clickbench




[GitHub] [doris] github-actions[bot] commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1512397558

   PR approved by anyone and no changes requested.




[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1518938838

   run clickbench




[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1518695460

   run buildall


[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1511778257

   run buildall


[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1507050365

   run buildall


[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1518916562

   run buildall


[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize Incremental statistics collection and statistics cleaning

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1518934199

   run buildall


[GitHub] [doris] weizhengte commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1167808271


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/StatisticsRepository.java:
##########
@@ -179,15 +205,7 @@ private static void dropStatistics(Long dbId,
 
         Map<String, String> params = new HashMap<>();
         params.put("condition", predicate.toString());
-        StringSubstitutor stringSubstitutor = new StringSubstitutor(params);
-
-        try {
-            String statement = isHistogram ? stringSubstitutor.replace(DROP_TABLE_HISTOGRAM_TEMPLATE) :
-                    stringSubstitutor.replace(DROP_TABLE_STATISTICS_TEMPLATE);
-            StatisticsUtil.execUpdate(statement);
-        } catch (Exception e) {
-            LOG.warn("Drop statistics failed", e);
-        }
+        return params;
     }
 
     private static <T> void buildPredicate(String fieldName, Set<T> fieldValues, StringBuilder predicate) {

Review Comment:
   `fieldValues` can be a `Set<Long>` of table IDs (`tblIds`), a `Set<String>` of column names (`colNames`), and so on; all of them are converted in this method.
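
To illustrate the point about one generic method handling both `Set<Long>` and `Set<String>`, here is a minimal sketch. The body of `buildPredicate` is not shown in this thread, so everything below is an assumption: the class name `PredicateSketch`, the `IN (...)` output shape, the quoting rule, and the column names `tbl_id` and `col_id` are hypothetical and only show how a single type parameter can cover both cases.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

public class PredicateSketch {
    // Hypothetical stand-in for buildPredicate; the real body is not shown in this thread.
    static <T> void buildPredicate(String fieldName, Set<T> fieldValues, StringBuilder predicate) {
        if (fieldValues == null || fieldValues.isEmpty()) {
            return; // nothing to filter on, leave the predicate untouched
        }
        StringJoiner in = new StringJoiner(", ", fieldName + " IN (", ")");
        for (T value : fieldValues) {
            if (value instanceof Number) {
                // Numeric values (e.g. table IDs) are emitted bare.
                in.add(value.toString());
            } else {
                // Everything else is quoted, with embedded single quotes doubled.
                in.add("'" + value.toString().replace("'", "''") + "'");
            }
        }
        if (predicate.length() > 0) {
            predicate.append(" AND ");
        }
        predicate.append(in);
    }

    public static void main(String[] args) {
        StringBuilder predicate = new StringBuilder();
        Set<Long> tblIds = new LinkedHashSet<>(Arrays.asList(1L, 2L));
        Set<String> colNames = new LinkedHashSet<>(Arrays.asList("a", "b'c"));
        buildPredicate("tbl_id", tblIds, predicate);
        buildPredicate("col_id", colNames, predicate);
        // prints: tbl_id IN (1, 2) AND col_id IN ('a', 'b''c')
        System.out.println(predicate);
    }
}
```

Dispatching on the runtime type keeps the call sites identical for IDs and names; whether the actual method quotes values this way is not confirmed by the diff.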



[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1510521043

   > Further modifications and discussions are needed for this PR before it could be merged
   
   I have removed the code that does not belong in this PR.


[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1507779066

   run clickbench


[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18653:
URL: https://github.com/apache/doris/pull/18653#discussion_r1166233812


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisHelper.java:
##########
@@ -0,0 +1,47 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.statistics;
+
+import org.apache.doris.common.Config;
+import org.apache.doris.common.ThreadPoolManager;
+import org.apache.doris.common.ThreadPoolManager.BlockedPolicy;
+import org.apache.doris.common.util.MasterDaemon;
+
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+
+
+/**
+ * Used to help collect statistics.
+ * TODO add other method
+ */
+public class AnalysisHelper extends MasterDaemon {
+
+    private final ThreadPoolExecutor executors = ThreadPoolManager.newDaemonThreadPool(
+            Config.statistics_helper_running_task_num,
+            Config.statistics_helper_running_task_num, 0,
+            TimeUnit.DAYS, new LinkedBlockingQueue<>(),
+            new BlockedPolicy("AnalysisHelper Executor", Integer.MAX_VALUE),
+            "AnalysisHelper Executor", true);
+
+    public Future<?> asyncExecute(Runnable runnable) {
+        return executors.submit(runnable);
+    }
+}

Review Comment:
   This class seems to be only partially implemented; it doesn't even override `runAfterCatalogReady`. Why commit this unrelated code in this PR?
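
For readers unfamiliar with the hook being discussed, the following is a rough sketch of the pattern, not Doris code. `StubDaemon` is a hypothetical stand-in for `MasterDaemon`, and it assumes the base class's `runAfterCatalogReady` does nothing unless overridden; `CleaningHelper` and its counter are likewise made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonHookSketch {
    // Hypothetical stand-in for MasterDaemon; only the hook pattern is modelled here.
    static class StubDaemon {
        // Invoked once per scheduling cycle; delegates to the overridable hook.
        final void runOneCycle() {
            runAfterCatalogReady();
        }

        // Assumed default: a no-op, which is what a subclass inherits if it does not override.
        protected void runAfterCatalogReady() {
        }
    }

    // A subclass that does override the hook, so each cycle performs some work.
    static class CleaningHelper extends StubDaemon {
        final AtomicInteger cycles = new AtomicInteger();

        @Override
        protected void runAfterCatalogReady() {
            cycles.incrementAndGet(); // placeholder for real periodic work
        }
    }

    public static void main(String[] args) {
        CleaningHelper helper = new CleaningHelper();
        helper.runOneCycle();
        helper.runOneCycle();
        System.out.println(helper.cycles.get()); // prints 2
    }
}
```

Under that assumption, a daemon subclass that only exposes an executor and never overrides the hook would run its periodic cycle without doing anything, which appears to be the concern raised above.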



[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1507418472

   run buildall


[GitHub] [doris] weizhengte commented on pull request #18653: [Optimization](statistics) optimize the problem of slow deletion of statistics .

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18653:
URL: https://github.com/apache/doris/pull/18653#issuecomment-1510300454

   run buildall

