Posted to commits@doris.apache.org by "weizhengte (via GitHub)" <gi...@apache.org> on 2023/04/19 00:49:01 UTC

[GitHub] [doris] weizhengte opened a new pull request, #18801: [Enchancement](statistics) Support sampling collection of statistics

weizhengte opened a new pull request, #18801:
URL: https://github.com/apache/doris/pull/18801

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Has unit tests been added
   * [ ] Has document been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1513966260

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1513963976

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1516175729

   run p0




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515685837

   run p0




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171022538


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -111,42 +147,37 @@ public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
         analysisTaskInfos.values().forEach(taskScheduler::schedule);
     }
 
-    private void persistAnalysisJob(String catalogName, String db, TableName tbl,
-            long jobId) throws DdlException {
+    private void persistAnalysisJob(AnalysisTaskInfoBuilder taskInfoBuilder) throws DdlException {
         try {
-            AnalysisTaskInfo analysisTaskInfo = new AnalysisTaskInfoBuilder().setJobId(
-                            jobId).setTaskId(-1)
-                    .setCatalogName(catalogName).setDbName(db)
-                    .setTblName(tbl.getTbl())
-                    .setJobType(JobType.MANUAL)
-                    .setAnalysisMethod(AnalysisMethod.FULL).setAnalysisType(AnalysisType.INDEX)
-                    .setScheduleType(ScheduleType.ONCE).build();
+            AnalysisTaskInfoBuilder jobInfoBuilder = taskInfoBuilder.deepCopy();
+            AnalysisTaskInfo analysisTaskInfo = jobInfoBuilder.setTaskId(-1).build();
             StatisticsRepository.persistAnalysisTask(analysisTaskInfo);
         } catch (Throwable t) {
             throw new DdlException(t.getMessage(), t);
         }
     }
 
-    private void createTaskForMVIdx(AnalyzeStmt analyzeStmt, String catalogName, String db, TableName tbl,
-            Map<Long, AnalysisTaskInfo> analysisTaskInfos, long jobId) throws DdlException {
-        if (!(analyzeStmt.isWholeTbl && analyzeStmt.getTable().getType().equals(TableType.OLAP))) {
-            return;
+    private void createTaskForMVIdx(TableIf table, AnalysisTaskInfoBuilder taskInfoBuilder,
+            Map<Long, AnalysisTaskInfo> analysisTaskInfos, AnalysisType analysisType) throws DdlException {
+        TableType type = table.getType();
+        if (analysisType != AnalysisType.INDEX || !type.equals(TableType.OLAP)) {
+            return; // not need to collect statistics for materialized view

Review Comment:
   Move the comment above the `return` statement.





[GitHub] [doris] hello-stephen commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1514003708

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.24 seconds
    stream load tsv:          441 seconds loaded 74807831229 Bytes, about 161 MB/s
    stream load json:         22 seconds loaded 2358488459 Bytes, about 102 MB/s
    stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
    stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230419012336_clickbench_pr_130779.html




[GitHub] [doris] weizhengte commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171012809


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -111,42 +147,37 @@ public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
         analysisTaskInfos.values().forEach(taskScheduler::schedule);
     }
 
-    private void persistAnalysisJob(String catalogName, String db, TableName tbl,
-            long jobId) throws DdlException {
+    private void persistAnalysisJob(AnalysisTaskInfoBuilder taskInfoBuilder) throws DdlException {
         try {
-            AnalysisTaskInfo analysisTaskInfo = new AnalysisTaskInfoBuilder().setJobId(
-                            jobId).setTaskId(-1)
-                    .setCatalogName(catalogName).setDbName(db)
-                    .setTblName(tbl.getTbl())
-                    .setJobType(JobType.MANUAL)
-                    .setAnalysisMethod(AnalysisMethod.FULL).setAnalysisType(AnalysisType.INDEX)
-                    .setScheduleType(ScheduleType.ONCE).build();
+            AnalysisTaskInfoBuilder jobInfoBuilder = taskInfoBuilder.deepCopy();
+            AnalysisTaskInfo analysisTaskInfo = jobInfoBuilder.setTaskId(-1).build();
             StatisticsRepository.persistAnalysisTask(analysisTaskInfo);
         } catch (Throwable t) {
             throw new DdlException(t.getMessage(), t);
         }
     }
 
-    private void createTaskForMVIdx(AnalyzeStmt analyzeStmt, String catalogName, String db, TableName tbl,
-            Map<Long, AnalysisTaskInfo> analysisTaskInfos, long jobId) throws DdlException {
-        if (!(analyzeStmt.isWholeTbl && analyzeStmt.getTable().getType().equals(TableType.OLAP))) {
-            return;
+    private void createTaskForMVIdx(TableIf table, AnalysisTaskInfoBuilder taskInfoBuilder,
+            Map<Long, AnalysisTaskInfo> analysisTaskInfos, AnalysisType analysisType) throws DdlException {
+        TableType type = table.getType();
+        if (analysisType != AnalysisType.INDEX || !type.equals(TableType.OLAP)) {
+            return; // not need to collect statistics for materialized view
         }
-        OlapTable olapTable = (OlapTable) analyzeStmt.getTable();
+
+        taskInfoBuilder.setAnalysisType(analysisType);
+        OlapTable olapTable = (OlapTable) table;
+
         try {
-            olapTable.readLock();
+            table.readLock();

Review Comment:
   Yes, you're right. This is something I changed accidentally.





[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1517078995

   run p1




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1170991314


##########
fe/fe-core/src/main/cup/sql_parser.cup:
##########
@@ -2812,30 +2817,45 @@ show_create_reporitory_stmt ::=
 
 // analyze statment
 analyze_stmt ::=
-    KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols opt_properties:properties
-    {:
-        boolean is_whole_tbl = (cols == null);
-        boolean is_histogram = false;
-        boolean is_increment = false;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
-    :}
-    | KW_ANALYZE opt_sync:sync KW_INCREMENTAL KW_TABLE table_name:tbl opt_col_list:cols opt_partition_names:partitionNames opt_properties:properties
-    {:
-        boolean is_whole_tbl = (cols == null);
-        boolean is_histogram = false;
-        boolean is_increment = true;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
-    :}
-    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl KW_UPDATE KW_HISTOGRAM KW_ON ident_list:cols opt_partition_names:partitionNames opt_properties:properties
+    // statistics
+    KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols
+      opt_with_analysis_properties:withAnalysisProperties opt_properties:properties
     {:
-        boolean is_whole_tbl = false;
-        boolean is_histogram = true;
-        boolean is_increment = false;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
+        if (properties == null) {
+            properties = Maps.newHashMap();
+        }
+        for (Map<String, String> property : withAnalysisProperties) {
+            properties.putAll(property);
+        }
+        if (!properties.containsKey("sync")) {
+            properties.put("sync", String.valueOf(sync));
+        }
+        // Rule: If no type is specified, see if there is a specified column
+        if (!properties.containsKey("analysis.type")) {
+            if ((cols == null)) {
+                properties.put("analysis.type", "INDEX");
+            } else {
+                properties.put("analysis.type", "COLUMN");
+            }
+        }
+        RESULT = new AnalyzeStmt(tbl, cols, properties);
     :}
-    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl KW_UPDATE KW_HISTOGRAM
+    // histogram
+    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols KW_UPDATE KW_HISTOGRAM

Review Comment:
   I think `opt_sync:sync` here is redundant now
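
The property-resolution rule in the new `analyze_stmt` action quoted above can be restated as plain Java, which makes it easier to see that an explicit `sync` property wins over the `opt_sync` flag. This is only a sketch under assumptions: `AnalyzePropertiesSketch` and `resolve` are hypothetical names that do not exist in Doris, `java.util.HashMap` stands in for Guava's `Maps.newHashMap()`, and the input map is copied rather than mutated in place as the grammar action does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper mirroring the grammar action in the diff; not part of Doris.
final class AnalyzePropertiesSketch {
    static Map<String, String> resolve(Map<String, String> properties,
            List<Map<String, String>> withAnalysisProperties,
            boolean sync, List<String> cols) {
        // Start from the PROPERTIES clause, if one was given.
        Map<String, String> merged = new HashMap<>();
        if (properties != null) {
            merged.putAll(properties);
        }
        // Entries from the WITH ... clauses override earlier values.
        for (Map<String, String> property : withAnalysisProperties) {
            merged.putAll(property);
        }
        // The opt_sync flag is only a fallback when no "sync" property is present.
        if (!merged.containsKey("sync")) {
            merged.put("sync", String.valueOf(sync));
        }
        // Rule from the diff: no explicit type means INDEX without columns, COLUMN with them.
        if (!merged.containsKey("analysis.type")) {
            merged.put("analysis.type", cols == null ? "INDEX" : "COLUMN");
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> implicit = resolve(null, Collections.emptyList(), true, null);
        System.out.println(implicit.get("sync") + " " + implicit.get("analysis.type")); // true INDEX

        Map<String, String> explicit = resolve(Collections.singletonMap("sync", "false"),
                Collections.emptyList(), true, Arrays.asList("c1"));
        System.out.println(explicit.get("sync") + " " + explicit.get("analysis.type")); // false COLUMN
    }
}
```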





[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171005344


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -111,42 +147,37 @@ public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
         analysisTaskInfos.values().forEach(taskScheduler::schedule);
     }
 
-    private void persistAnalysisJob(String catalogName, String db, TableName tbl,
-            long jobId) throws DdlException {
+    private void persistAnalysisJob(AnalysisTaskInfoBuilder taskInfoBuilder) throws DdlException {
         try {
-            AnalysisTaskInfo analysisTaskInfo = new AnalysisTaskInfoBuilder().setJobId(
-                            jobId).setTaskId(-1)
-                    .setCatalogName(catalogName).setDbName(db)
-                    .setTblName(tbl.getTbl())
-                    .setJobType(JobType.MANUAL)
-                    .setAnalysisMethod(AnalysisMethod.FULL).setAnalysisType(AnalysisType.INDEX)
-                    .setScheduleType(ScheduleType.ONCE).build();
+            AnalysisTaskInfoBuilder jobInfoBuilder = taskInfoBuilder.deepCopy();
+            AnalysisTaskInfo analysisTaskInfo = jobInfoBuilder.setTaskId(-1).build();
             StatisticsRepository.persistAnalysisTask(analysisTaskInfo);
         } catch (Throwable t) {
             throw new DdlException(t.getMessage(), t);
         }
     }
 
-    private void createTaskForMVIdx(AnalyzeStmt analyzeStmt, String catalogName, String db, TableName tbl,
-            Map<Long, AnalysisTaskInfo> analysisTaskInfos, long jobId) throws DdlException {
-        if (!(analyzeStmt.isWholeTbl && analyzeStmt.getTable().getType().equals(TableType.OLAP))) {
-            return;
+    private void createTaskForMVIdx(TableIf table, AnalysisTaskInfoBuilder taskInfoBuilder,
+            Map<Long, AnalysisTaskInfo> analysisTaskInfos, AnalysisType analysisType) throws DdlException {
+        TableType type = table.getType();
+        if (analysisType != AnalysisType.INDEX || !type.equals(TableType.OLAP)) {
+            return; // not need to collect statistics for materialized view
         }
-        OlapTable olapTable = (OlapTable) analyzeStmt.getTable();
+
+        taskInfoBuilder.setAnalysisType(analysisType);
+        OlapTable olapTable = (OlapTable) table;
+
         try {
-            olapTable.readLock();
+            table.readLock();

Review Comment:
   Though `table` and `olapTable` point to the same object, it still looks quite weird to lock on `table` and release the lock on `olapTable`.
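
One way to address this concern is to acquire and release the lock through the same reference inside `try`/`finally`. The sketch below is illustrative only: `LockableTable` and `LockPairingSketch` are hypothetical stand-ins built on `java.util.concurrent.locks.ReentrantReadWriteLock`, not Doris's `TableIf` or `OlapTable`, whose locking internals are not shown in this thread.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a table that exposes readLock()/readUnlock().
class LockableTable {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true);

    void readLock() {
        rwLock.readLock().lock();
    }

    void readUnlock() {
        rwLock.readLock().unlock();
    }

    // Number of read holds owned by the current thread; used here only for the demo.
    int readHolds() {
        return rwLock.getReadHoldCount();
    }
}

final class LockPairingSketch {
    // Lock and unlock through the same variable so the pairing is obvious to a reader.
    static int holdsWhileLocked(LockableTable table) {
        table.readLock();
        try {
            return table.readHolds();
        } finally {
            table.readUnlock();
        }
    }

    public static void main(String[] args) {
        LockableTable table = new LockableTable();
        System.out.println(holdsWhileLocked(table)); // 1
        System.out.println(table.readHolds());       // 0
    }
}
```

Acquiring the lock just before `try` is the usual Java idiom, so that `finally` never releases a lock that was not acquired.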






[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171015284


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -91,16 +91,52 @@ public StatisticsCache getStatisticsCache() {
 
     // Each analyze stmt corresponding to an analysis job.
     public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
+        Map<Long, AnalysisTaskInfo> analysisTaskInfos = new HashMap<>();
+        AnalysisTaskInfoBuilder taskInfoBuilder = new AnalysisTaskInfoBuilder();
+
+        long jobId = Env.getCurrentEnv().getNextId();
         String catalogName = analyzeStmt.getCatalogName();
         String db = analyzeStmt.getDBName();
+        TableIf table = analyzeStmt.getTable();
         TableName tbl = analyzeStmt.getTblName();
         StatisticsUtil.convertTableNameToObjects(tbl);
+        String tblName = tbl.getTbl();
         Set<String> colNames = analyzeStmt.getColumnNames();
-        Map<Long, AnalysisTaskInfo> analysisTaskInfos = new HashMap<>();
-        long jobId = Env.getCurrentEnv().getNextId();
-        createTaskForEachColumns(analyzeStmt, catalogName, db, tbl, colNames, analysisTaskInfos, jobId);
-        createTaskForMVIdx(analyzeStmt, catalogName, db, tbl, analysisTaskInfos, jobId);
-        persistAnalysisJob(catalogName, db, tbl, jobId);
+        int samplePercent = analyzeStmt.getSamplePercent();
+        int sampleRows = analyzeStmt.getSampleRows();
+        AnalysisType analysisType = analyzeStmt.getAnalysisType();
+
+        // set common properties
+        taskInfoBuilder.setJobId(jobId);
+        taskInfoBuilder.setCatalogName(catalogName);
+        taskInfoBuilder.setDbName(db);
+        taskInfoBuilder.setTblName(tblName);
+        taskInfoBuilder.setJobType(JobType.MANUAL);
+        taskInfoBuilder.setState(AnalysisState.PENDING);
+        taskInfoBuilder.setScheduleType(ScheduleType.ONCE);
+
+        if (samplePercent > 0 || sampleRows > 0) {

Review Comment:
   Better to add a method in `AnalyzeStmt` to determine whether this is a sampled analyze.
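
A sketch of what such a helper might look like. `AnalyzeStmtSketch` and `isSampleAnalyze()` are hypothetical names chosen for illustration; only `getSamplePercent()`, `getSampleRows()`, and the `samplePercent > 0 || sampleRows > 0` condition come from the diff above.

```java
// Hypothetical, trimmed-down stand-in for AnalyzeStmt; only the sampling fields are modeled.
final class AnalyzeStmtSketch {
    private final int samplePercent;
    private final int sampleRows;

    AnalyzeStmtSketch(int samplePercent, int sampleRows) {
        this.samplePercent = samplePercent;
        this.sampleRows = sampleRows;
    }

    int getSamplePercent() {
        return samplePercent;
    }

    int getSampleRows() {
        return sampleRows;
    }

    // Same condition as the inline check in createAnalysisJob, given a name.
    boolean isSampleAnalyze() {
        return samplePercent > 0 || sampleRows > 0;
    }

    public static void main(String[] args) {
        System.out.println(new AnalyzeStmtSketch(10, 0).isSampleAnalyze());  // true
        System.out.println(new AnalyzeStmtSketch(0, 1000).isSampleAnalyze()); // true
        System.out.println(new AnalyzeStmtSketch(0, 0).isSampleAnalyze());   // false
    }
}
```

With a helper like this, the call site could read `if (analyzeStmt.isSampleAnalyze())` instead of repeating the two comparisons.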





[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1517074884

   run clickbench




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1514387355

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515804075

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1517188157

   run p1




[GitHub] [doris] weizhengte commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171025272


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -111,42 +147,37 @@ public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
         analysisTaskInfos.values().forEach(taskScheduler::schedule);
     }
 
-    private void persistAnalysisJob(String catalogName, String db, TableName tbl,
-            long jobId) throws DdlException {
+    private void persistAnalysisJob(AnalysisTaskInfoBuilder taskInfoBuilder) throws DdlException {
         try {
-            AnalysisTaskInfo analysisTaskInfo = new AnalysisTaskInfoBuilder().setJobId(
-                            jobId).setTaskId(-1)
-                    .setCatalogName(catalogName).setDbName(db)
-                    .setTblName(tbl.getTbl())
-                    .setJobType(JobType.MANUAL)
-                    .setAnalysisMethod(AnalysisMethod.FULL).setAnalysisType(AnalysisType.INDEX)
-                    .setScheduleType(ScheduleType.ONCE).build();
+            AnalysisTaskInfoBuilder jobInfoBuilder = taskInfoBuilder.deepCopy();
+            AnalysisTaskInfo analysisTaskInfo = jobInfoBuilder.setTaskId(-1).build();
             StatisticsRepository.persistAnalysisTask(analysisTaskInfo);
         } catch (Throwable t) {
             throw new DdlException(t.getMessage(), t);
         }
     }
 
-    private void createTaskForMVIdx(AnalyzeStmt analyzeStmt, String catalogName, String db, TableName tbl,
-            Map<Long, AnalysisTaskInfo> analysisTaskInfos, long jobId) throws DdlException {
-        if (!(analyzeStmt.isWholeTbl && analyzeStmt.getTable().getType().equals(TableType.OLAP))) {
-            return;
+    private void createTaskForMVIdx(TableIf table, AnalysisTaskInfoBuilder taskInfoBuilder,
+            Map<Long, AnalysisTaskInfo> analysisTaskInfos, AnalysisType analysisType) throws DdlException {
+        TableType type = table.getType();
+        if (analysisType != AnalysisType.INDEX || !type.equals(TableType.OLAP)) {
+            return; // not need to collect statistics for materialized view

Review Comment:
   ok





[GitHub] [doris] github-actions[bot] commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515631580

   PR approved by anyone and no changes requested.




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1516158376

   run p0




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515668731

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1513977866

   run buildall




[GitHub] [doris] weizhengte commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171028185


##########
fe/fe-core/src/main/cup/sql_parser.cup:
##########
@@ -2812,30 +2817,45 @@ show_create_reporitory_stmt ::=
 
 // analyze statment
 analyze_stmt ::=
-    KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols opt_properties:properties
-    {:
-        boolean is_whole_tbl = (cols == null);
-        boolean is_histogram = false;
-        boolean is_increment = false;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
-    :}
-    | KW_ANALYZE opt_sync:sync KW_INCREMENTAL KW_TABLE table_name:tbl opt_col_list:cols opt_partition_names:partitionNames opt_properties:properties
-    {:
-        boolean is_whole_tbl = (cols == null);
-        boolean is_histogram = false;
-        boolean is_increment = true;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
-    :}
-    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl KW_UPDATE KW_HISTOGRAM KW_ON ident_list:cols opt_partition_names:partitionNames opt_properties:properties
+    // statistics
+    KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols
+      opt_with_analysis_properties:withAnalysisProperties opt_properties:properties
     {:
-        boolean is_whole_tbl = false;
-        boolean is_histogram = true;
-        boolean is_increment = false;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
+        if (properties == null) {
+            properties = Maps.newHashMap();
+        }
+        for (Map<String, String> property : withAnalysisProperties) {
+            properties.putAll(property);
+        }
+        if (!properties.containsKey("sync")) {
+            properties.put("sync", String.valueOf(sync));
+        }
+        // Rule: If no type is specified, see if there is a specified column
+        if (!properties.containsKey("analysis.type")) {
+            if ((cols == null)) {
+                properties.put("analysis.type", "INDEX");
+            } else {
+                properties.put("analysis.type", "COLUMN");
+            }
+        }
+        RESULT = new AnalyzeStmt(tbl, cols, properties);
     :}
-    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl KW_UPDATE KW_HISTOGRAM
+    // histogram
+    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols KW_UPDATE KW_HISTOGRAM

Review Comment:
   Should `opt_sync:sync` be removed here?





[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515624061

   run p1




[GitHub] [doris] weizhengte closed pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte closed pull request #18801: [Enchancement](statistics) Support sampling collection of statistics
URL: https://github.com/apache/doris/pull/18801




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171019840


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisTaskInfoBuilder.java:
##########
@@ -109,7 +127,29 @@ public AnalysisTaskInfoBuilder setScheduleType(ScheduleType scheduleType) {
     }
 
     public AnalysisTaskInfo build() {
-        return new AnalysisTaskInfo(jobId, taskId, catalogName, dbName, tblName, colName,
-                indexId, jobType, analysisMethod, analysisType, message, lastExecTimeInMs, state, scheduleType);
+        return new AnalysisTaskInfo(jobId, taskId, catalogName, dbName, tblName,
+                colName, indexId, jobType, analysisMethod, analysisType, samplePercent,
+                sampleRows, maxBucketNum, message, lastExecTimeInMs, state, scheduleType);
+    }
+
+    public AnalysisTaskInfoBuilder deepCopy() {

Review Comment:
   Since the fields of this class are just enums and basic types, I think naming it `copy` is enough.
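
For illustration only (not code from this PR): a minimal sketch of why a plain field-by-field `copy` suffices when a builder holds only primitives, strings, and enums. `BuilderCopySketch`, `TaskInfoBuilder`, its reduced field set, and `describe` are hypothetical stand-ins, not the real `AnalysisTaskInfoBuilder`.

```java
// Hypothetical, trimmed-down stand-in for AnalysisTaskInfoBuilder; the field set is illustrative.
final class BuilderCopySketch {
    enum AnalysisType { INDEX, COLUMN }

    static final class TaskInfoBuilder {
        private long jobId;
        private long taskId;
        private String colName;
        private AnalysisType analysisType;
        private int samplePercent;

        TaskInfoBuilder setJobId(long jobId) { this.jobId = jobId; return this; }
        TaskInfoBuilder setTaskId(long taskId) { this.taskId = taskId; return this; }
        TaskInfoBuilder setColName(String colName) { this.colName = colName; return this; }
        TaskInfoBuilder setAnalysisType(AnalysisType type) { this.analysisType = type; return this; }
        TaskInfoBuilder setSamplePercent(int percent) { this.samplePercent = percent; return this; }

        // Field-by-field copy. Primitives are copied by value; String and enum
        // values are immutable, so sharing the references is safe.
        TaskInfoBuilder copy() {
            return new TaskInfoBuilder()
                    .setJobId(jobId)
                    .setTaskId(taskId)
                    .setColName(colName)
                    .setAnalysisType(analysisType)
                    .setSamplePercent(samplePercent);
        }

        // Helper for inspecting the builder state in this sketch.
        String describe() {
            return jobId + "/" + taskId + "/" + colName + "/" + analysisType + "/" + samplePercent;
        }
    }

    public static void main(String[] args) {
        TaskInfoBuilder base = new TaskInfoBuilder()
                .setJobId(1L).setAnalysisType(AnalysisType.COLUMN).setSamplePercent(10);
        // Each task starts from the shared base and changes only its own fields.
        TaskInfoBuilder forColA = base.copy().setTaskId(100L).setColName("a");
        TaskInfoBuilder forColB = base.copy().setTaskId(101L).setColName("b");
        System.out.println(forColA.describe());
        System.out.println(forColB.describe());
        System.out.println(base.describe()); // base is unchanged by the copies
    }
}
```

If a mutable field (for example a collection) were added to such a builder later, the copy would need to clone that field as well.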





[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515651507

   run p0




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1516070194

   run clickbench




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "Kikyou1997 (via GitHub)" <gi...@apache.org>.
Kikyou1997 commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171008579


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -91,16 +91,52 @@ public StatisticsCache getStatisticsCache() {
 
     // Each analyze stmt corresponding to an analysis job.
     public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
+        Map<Long, AnalysisTaskInfo> analysisTaskInfos = new HashMap<>();
+        AnalysisTaskInfoBuilder taskInfoBuilder = new AnalysisTaskInfoBuilder();
+
+        long jobId = Env.getCurrentEnv().getNextId();
         String catalogName = analyzeStmt.getCatalogName();
         String db = analyzeStmt.getDBName();
+        TableIf table = analyzeStmt.getTable();
         TableName tbl = analyzeStmt.getTblName();
         StatisticsUtil.convertTableNameToObjects(tbl);
+        String tblName = tbl.getTbl();
         Set<String> colNames = analyzeStmt.getColumnNames();
-        Map<Long, AnalysisTaskInfo> analysisTaskInfos = new HashMap<>();
-        long jobId = Env.getCurrentEnv().getNextId();
-        createTaskForEachColumns(analyzeStmt, catalogName, db, tbl, colNames, analysisTaskInfos, jobId);
-        createTaskForMVIdx(analyzeStmt, catalogName, db, tbl, analysisTaskInfos, jobId);
-        persistAnalysisJob(catalogName, db, tbl, jobId);
+        int samplePercent = analyzeStmt.getSamplePercent();
+        int sampleRows = analyzeStmt.getSampleRows();
+        AnalysisType analysisType = analyzeStmt.getAnalysisType();
+
+        // set common properties

Review Comment:
   I think the logic for initializing a base taskInfoBuilder could be extracted into a method to keep the `createAnalysisJob` method clean.
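
For illustration only (not code from this PR): a rough sketch of the suggested extraction. `BaseBuilderSketch`, `StmtInfo`, `TaskInfoBuilder`, and `buildCommonTaskInfo` are hypothetical names; the real code reads these values from `AnalyzeStmt` and uses `AnalysisTaskInfoBuilder`.

```java
final class BaseBuilderSketch {
    // Hypothetical stand-in for the values createAnalysisJob reads from AnalyzeStmt.
    static final class StmtInfo {
        final String catalogName;
        final String dbName;
        final String tblName;
        final int samplePercent;
        final int sampleRows;

        StmtInfo(String catalogName, String dbName, String tblName, int samplePercent, int sampleRows) {
            this.catalogName = catalogName;
            this.dbName = dbName;
            this.tblName = tblName;
            this.samplePercent = samplePercent;
            this.sampleRows = sampleRows;
        }
    }

    // Hypothetical, reduced builder holding only the job-wide fields.
    static final class TaskInfoBuilder {
        long jobId;
        String catalogName;
        String dbName;
        String tblName;
        int samplePercent;
        int sampleRows;

        TaskInfoBuilder setJobId(long v) { jobId = v; return this; }
        TaskInfoBuilder setCatalogName(String v) { catalogName = v; return this; }
        TaskInfoBuilder setDbName(String v) { dbName = v; return this; }
        TaskInfoBuilder setTblName(String v) { tblName = v; return this; }
        TaskInfoBuilder setSamplePercent(int v) { samplePercent = v; return this; }
        TaskInfoBuilder setSampleRows(int v) { sampleRows = v; return this; }
    }

    // Extracted helper: everything shared by all tasks of one job is set in one
    // place, so the caller only adds per-task fields afterwards.
    static TaskInfoBuilder buildCommonTaskInfo(StmtInfo stmt, long jobId) {
        return new TaskInfoBuilder()
                .setJobId(jobId)
                .setCatalogName(stmt.catalogName)
                .setDbName(stmt.dbName)
                .setTblName(stmt.tblName)
                .setSamplePercent(stmt.samplePercent)
                .setSampleRows(stmt.sampleRows);
    }

    public static void main(String[] args) {
        StmtInfo stmt = new StmtInfo("internal", "db1", "t1", 10, 0);
        TaskInfoBuilder base = buildCommonTaskInfo(stmt, 42L);
        System.out.println(base.jobId + " " + base.dbName + "." + base.tblName);
    }
}
```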





[GitHub] [doris] weizhengte commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171010742


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java:
##########
@@ -91,16 +91,52 @@ public StatisticsCache getStatisticsCache() {
 
     // Each analyze stmt corresponding to an analysis job.
     public void createAnalysisJob(AnalyzeStmt analyzeStmt) throws DdlException {
+        Map<Long, AnalysisTaskInfo> analysisTaskInfos = new HashMap<>();
+        AnalysisTaskInfoBuilder taskInfoBuilder = new AnalysisTaskInfoBuilder();
+
+        long jobId = Env.getCurrentEnv().getNextId();
         String catalogName = analyzeStmt.getCatalogName();
         String db = analyzeStmt.getDBName();
+        TableIf table = analyzeStmt.getTable();
         TableName tbl = analyzeStmt.getTblName();
         StatisticsUtil.convertTableNameToObjects(tbl);
+        String tblName = tbl.getTbl();
         Set<String> colNames = analyzeStmt.getColumnNames();
-        Map<Long, AnalysisTaskInfo> analysisTaskInfos = new HashMap<>();
-        long jobId = Env.getCurrentEnv().getNextId();
-        createTaskForEachColumns(analyzeStmt, catalogName, db, tbl, colNames, analysisTaskInfos, jobId);
-        createTaskForMVIdx(analyzeStmt, catalogName, db, tbl, analysisTaskInfos, jobId);
-        persistAnalysisJob(catalogName, db, tbl, jobId);
+        int samplePercent = analyzeStmt.getSamplePercent();
+        int sampleRows = analyzeStmt.getSampleRows();
+        AnalysisType analysisType = analyzeStmt.getAnalysisType();
+
+        // set common properties

Review Comment:
   sure





[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1517285303

   This PR has been rebased several times, and the clickbench pipeline keeps getting stuck (Can't connect to MySQL), so I am continuing in a new PR, #18880.




[GitHub] [doris] weizhengte commented on a diff in pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on code in PR #18801:
URL: https://github.com/apache/doris/pull/18801#discussion_r1171028185


##########
fe/fe-core/src/main/cup/sql_parser.cup:
##########
@@ -2812,30 +2817,45 @@ show_create_reporitory_stmt ::=
 
 // analyze statment
 analyze_stmt ::=
-    KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols opt_properties:properties
-    {:
-        boolean is_whole_tbl = (cols == null);
-        boolean is_histogram = false;
-        boolean is_increment = false;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
-    :}
-    | KW_ANALYZE opt_sync:sync KW_INCREMENTAL KW_TABLE table_name:tbl opt_col_list:cols opt_partition_names:partitionNames opt_properties:properties
-    {:
-        boolean is_whole_tbl = (cols == null);
-        boolean is_histogram = false;
-        boolean is_increment = true;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
-    :}
-    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl KW_UPDATE KW_HISTOGRAM KW_ON ident_list:cols opt_partition_names:partitionNames opt_properties:properties
+    // statistics
+    KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols
+      opt_with_analysis_properties:withAnalysisProperties opt_properties:properties
     {:
-        boolean is_whole_tbl = false;
-        boolean is_histogram = true;
-        boolean is_increment = false;
-        RESULT = new AnalyzeStmt(tbl, sync, cols, properties, is_whole_tbl, is_histogram, is_increment);
+        if (properties == null) {
+            properties = Maps.newHashMap();
+        }
+        for (Map<String, String> property : withAnalysisProperties) {
+            properties.putAll(property);
+        }
+        if (!properties.containsKey("sync")) {
+            properties.put("sync", String.valueOf(sync));
+        }
+        // Rule: If no type is specified, see if there is a specified column
+        if (!properties.containsKey("analysis.type")) {
+            if ((cols == null)) {
+                properties.put("analysis.type", "INDEX");
+            } else {
+                properties.put("analysis.type", "COLUMN");
+            }
+        }
+        RESULT = new AnalyzeStmt(tbl, cols, properties);
     :}
-    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl KW_UPDATE KW_HISTOGRAM
+    // histogram
+    | KW_ANALYZE opt_sync:sync KW_TABLE table_name:tbl opt_col_list:cols KW_UPDATE KW_HISTOGRAM

Review Comment:
   I will sort out these syntaxes later, discuss them again, and remove the redundant ones.
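
For illustration only (not code from this PR): the property-defaulting rule in the first `analyze_stmt` alternative quoted above, restated as a standalone Java method so the precedence is easier to see. `AnalyzePropertiesSketch` and `resolveProperties` are hypothetical names, and unlike the grammar action this sketch copies the incoming map instead of mutating it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AnalyzePropertiesSketch {
    // Mirrors the quoted action: WITH-style properties are merged over PROPERTIES,
    // then "sync" and "analysis.type" are filled in only if still absent.
    static Map<String, String> resolveProperties(Map<String, String> properties,
            List<Map<String, String>> withAnalysisProperties, boolean sync, List<String> cols) {
        Map<String, String> resolved = properties == null ? new HashMap<>() : new HashMap<>(properties);
        for (Map<String, String> property : withAnalysisProperties) {
            resolved.putAll(property);
        }
        if (!resolved.containsKey("sync")) {
            resolved.put("sync", String.valueOf(sync));
        }
        // Rule from the diff: with no explicit type, no column list means INDEX,
        // and a column list means COLUMN.
        if (!resolved.containsKey("analysis.type")) {
            resolved.put("analysis.type", cols == null ? "INDEX" : "COLUMN");
        }
        return resolved;
    }

    public static void main(String[] args) {
        // No columns and no explicit type: falls back to INDEX.
        System.out.println(resolveProperties(null, List.of(), true, null));
        // Column list given: falls back to COLUMN.
        System.out.println(resolveProperties(null, List.of(), false, List.of("c1")));
        // An explicit "sync" property takes precedence over the keyword flag.
        System.out.println(resolveProperties(null, List.of(Map.of("sync", "false")), true, null));
    }
}
```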





[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515823518

   run buildall




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515660140

   run p0




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1516151577

   run p0




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1515623580

   run p0




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1516039907

   run clickbench




[GitHub] [doris] weizhengte commented on pull request #18801: [Enchancement](statistics) Support sampling collection of statistics

Posted by "weizhengte (via GitHub)" <gi...@apache.org>.
weizhengte commented on PR #18801:
URL: https://github.com/apache/doris/pull/18801#issuecomment-1517075913

   run arm

