Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/15 22:25:22 UTC

[GitHub] [incubator-doris] xinyiZzz opened a new pull request, #10170: [Enhancement] Support select table sample

xinyiZzz opened a new pull request, #10170:
URL: https://github.com/apache/incubator-doris/pull/10170

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   ### Motivation
   TABLESAMPLE limits the number of rows read from a table in the FROM clause.
   
   It is useful for data inspection, quick verification of SQL correctness, and table statistics collection.
   
   ### Grammar
   ```
   [TABLET (tids)] TABLESAMPLE (n [ROWS | PERCENT]) [REPEATABLE (seek)]
   ```
   
   TABLESAMPLE limits the number of rows read from the table in the FROM clause.
   A number of Tablets are selected pseudo-randomly from the table according to the specified row count or percentage.
   Specifying the same seek value in REPEATABLE returns the same sample again.
   In addition, Tablet IDs can be specified manually.
   Note that this can only be used for OLAP tables.
   
   ### Example
   ```
   SELECT * FROM t1 TABLET(10001) TABLESAMPLE(1000 ROWS) REPEATABLE (2) limit 1000;
   ```
   
   Pseudo-randomly samples about 1000 rows from t1.
   Note that whole Tablets are selected according to the statistics of the table,
   so the total number of rows in the selected Tablets may be greater than 1000.
   To cap the result at 1000 rows, add LIMIT.
   
   ### Design
   First, determine how many rows to sample from each partition according to the number of partitions.
   Then determine the number of Tablets to select from each partition according to the average number of rows per Tablet.
   If seek is not specified, that number of Tablets is selected pseudo-randomly from each partition.
   If seek is specified, Tablets are selected sequentially, starting at the Tablet indexed by seek within the partition.
   Finally, the manually specified Tablet IDs are added to the selected Tablets.
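   
   The first two steps can be sketched as below. This is a minimal illustration, not the code in this PR: the class and method names are hypothetical, and rounding up and clamping to the partition's Tablet count are assumptions. The arithmetic follows the formula in the method's Javadoc, `simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)`.
   
   ```java
   // Hypothetical helper, not part of Doris.
   final class SampleTabletCount {
       // Number of Tablets to pick from one partition.
       static long tabletsPerPartition(long sampleRows, long partitionCount,
                                       long partitionRows, long partitionTablets) {
           if (partitionCount <= 0 || partitionRows <= 0 || partitionTablets <= 0) {
               return 0;
           }
           // Step 1: rows each partition should contribute (rounded up, an assumption).
           long rowsForPartition = ceilDiv(sampleRows, partitionCount);
           // Step 2: average rows per Tablet, then Tablets needed to cover those rows.
           long avgRowsPerTablet = Math.max(partitionRows / partitionTablets, 1);
           long tablets = ceilDiv(rowsForPartition, avgRowsPerTablet);
           // Pick at least one Tablet and never more than the partition holds.
           return Math.min(Math.max(tablets, 1), partitionTablets);
       }
   
       private static long ceilDiv(long a, long b) {
           return (a + b - 1) / b;
       }
   }
   ```
   
   Because whole Tablets are selected, the rows actually read can exceed the requested sample size, which is why the example above adds a limit.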
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morrySnow commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990772731


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -197,17 +205,49 @@ protected TableRef(TableRef other) {
                 lateralViewRefs.add((LateralViewRef) viewRef.clone());
             }
         }
+        if (other.sampleTabletIds.size() != 0) {

Review Comment:
   Why not copy even if the size is 0?
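   
   A minimal sketch of the unconditional copy, using a hypothetical stand-in class reduced to the one field under discussion (the real `TableRef` has many more fields, and the field's actual collection type is not shown in this hunk):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical stand-in for TableRef, not the Doris class.
   final class TableRefSketch {
       private final List<Long> sampleTabletIds;
   
       TableRefSketch(List<Long> sampleTabletIds) {
           this.sampleTabletIds = new ArrayList<>(sampleTabletIds);
       }
   
       // Copy constructor: an empty list simply copies to an empty list,
       // so no size check is needed before copying.
       TableRefSketch(TableRefSketch other) {
           this.sampleTabletIds = new ArrayList<>(other.sampleTabletIds);
       }
   
       List<Long> getSampleTabletIds() {
           return sampleTabletIds;
       }
   }
   ```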



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (sampleValue <= 0 || (isPercent && sampleValue > 100)) {
+            throw new AnalysisException("table sample value must be greater than 0, percent need less than 100.");
+        }
+    }
+
+    @Override
+    public String toSql() {
+        if (sampleValue == null) {
+            return "";
+        }
+        StringBuilder sb = new StringBuilder();
+        sb.append("TABLESAMPLE ( ");
+        sb.append(sampleValue);
+        if (isPercent) {
+            sb.append(" PERCENT ");
+        } else {
+            sb.append(" ROWS ");
+        }
+        sb.append(")");
+        if (seek != 0) {
+            sb.append(" REPEATABLE ");
+            sb.append(seek);

Review Comment:
   Missing parentheses.
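   
   A sketch of `toSql()` output that matches the statement forms listed in the class comment, with the seek wrapped in parentheses. The class name and the static method shape are hypothetical. Note also that the hunk tests `seek != 0` while the two-argument constructor defaults seek to `-1L`; the sketch assumes `-1` means "no REPEATABLE clause", which the thread does not settle.
   
   ```java
   // Hypothetical sketch, not the Doris TableSample class.
   final class TableSampleSqlSketch {
       // Assumption: -1 marks "no REPEATABLE clause", mirroring the constructor default.
       static final long NO_SEEK = -1L;
   
       static String toSql(long sampleValue, boolean isPercent, long seek) {
           StringBuilder sb = new StringBuilder();
           sb.append("TABLESAMPLE (").append(sampleValue);
           sb.append(isPercent ? " PERCENT" : " ROWS").append(")");
           if (seek != NO_SEEK) {
               // Parentheses around the seek, as in "REPEATABLE (123)".
               sb.append(" REPEATABLE (").append(seek).append(")");
           }
           return sb.toString();
       }
   }
   ```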



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -160,6 +161,12 @@ public TableRef(TableName name, String alias, PartitionNames partitionNames, Arr
             hasExplicitAlias = false;
         }
         this.partitionNames = partitionNames;
+        if (sampleTabletIds != null) {
+            this.sampleTabletIds = sampleTabletIds;
+        }
+        if (tableSample != null) {

Review Comment:
   I think zhengte is right: even if tableSample is null, the assignment has no side effect.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {

Review Comment:
   I agree, do it during analysis.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,84 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {

Review Comment:
   Why implement this function in TupleDescriptor? TupleDescriptor is used to describe a tuple, so this is very weird.





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991830292


##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java:
##########
@@ -756,6 +764,82 @@ private void getScanRangeLocations() throws UserException {
         LOG.debug("distribution prune cost: {} ms", (System.currentTimeMillis() - start));
     }
 
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds() throws UserException {
+        if (desc.getTable().getType() != TableIf.TableType.OLAP) {
+            throw new AnalysisException("Sample table type " + desc.getTable().getType() + " is not OLAP");

Review Comment:
   It would be better to check the table type when analyzing the statement.
   
   Also, if `computeSampleTabletIds()` throws an exception here, will it affect normal queries? For example, for normal queries, when querying non-OLAP tables, will `init()` have exceptions? I think it would be better for non-sampling queries or non-OLAP tables to return directly, or to judge whether it is a sampling query first.
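   
   The "return directly" idea could look roughly like this. It is only a sketch under assumptions: the enum, the `isSampleQuery` flag and the method are hypothetical stand-ins, and taking the first n Tablets is a placeholder for the real selection logic.
   
   ```java
   import java.util.List;
   
   // Hypothetical sketch, not the OlapScanNode code.
   final class SampleGuardSketch {
       enum TableType { OLAP, OTHER }
   
       // Returns the Tablets to scan for this query.
       static List<Long> tabletsToScan(TableType type, boolean isSampleQuery,
                                       List<Long> prunedTablets, int sampleTabletCount) {
           // Normal queries and non-OLAP tables return early: nothing to compute, nothing to throw.
           if (!isSampleQuery || type != TableType.OLAP) {
               return prunedTablets;
           }
           // Placeholder selection: keep the first n Tablets.
           int n = Math.max(0, Math.min(sampleTabletCount, prunedTablets.size()));
           return prunedTablets.subList(0, n);
       }
   }
   ```
   
   With the type check done at analysis time, a sampling query on a non-OLAP table would be rejected before it ever reaches this point.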
   
   





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r994432235


##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java:
##########
@@ -756,6 +763,79 @@ private void getScanRangeLocations() throws UserException {
         LOG.debug("distribution prune cost: {} ms", (System.currentTimeMillis() - start));
     }
 
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds() {
+        if (desc.getSampleTabletIds() != null) {

Review Comment:
   done~





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991818187


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   > Can we use `private final Long seek = -1;` instead of assigning a value here?
   
   I didn't pay attention at first and added final
   
   > sry, I meant to give an initial value, which means `private Long seek = -1;`, and remove `this.seek = -1L;`, It looks better. but it doesn't matter much, you can ignore it~
   
   I mean no `final`,  and you can also ignore it





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982542006


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -645,27 +684,6 @@ public String tableRefToDigest() {
         return tableRefToSql();
     }
 
-    @Override
-    public String toSql() {

Review Comment:
   Ditto~





[GitHub] [doris] Kikyou1997 commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
Kikyou1997 commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982172398


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -645,27 +684,6 @@ public String tableRefToDigest() {
         return tableRefToSql();
     }
 
-    @Override
-    public String toSql() {

Review Comment:
   Ditto





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990643132


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   seek is `final`, cannot be reassigned after initialization.
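   
   To illustrate: a `final` field with an initializer (`private final Long seek = -1L;`) cannot be assigned again in the three-argument constructor, so that form would not compile. One way to keep the field `final` and still state the default once is constructor delegation. The class below is a hypothetical sketch, not the PR's code:
   
   ```java
   // Hypothetical sketch, not the Doris TableSample class.
   final class SeekDefaultSketch {
       private static final long DEFAULT_SEEK = -1L;
   
       private final long sampleValue;
       private final boolean isPercent;
       private final long seek; // final: assigned exactly once, in the constructor below
   
       SeekDefaultSketch(boolean isPercent, long sampleValue) {
           // Delegate so the default lives in one place.
           this(isPercent, sampleValue, DEFAULT_SEEK);
       }
   
       SeekDefaultSketch(boolean isPercent, long sampleValue, long seek) {
           this.sampleValue = sampleValue;
           this.isPercent = isPercent;
           this.seek = seek;
       }
   
       long getSampleValue() { return sampleValue; }
   
       boolean isPercent() { return isPercent; }
   
       long getSeek() { return seek; }
   }
   ```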





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982558867


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.

Review Comment:
   The default seek, 0, selects the first n tablets.
   
   If the user wants to sample multiple times, selecting different tablet IDs each time, and wants each sampling result to be reproducible, seek is required.
   
   If the seek is the same, the selected tablets are also the same.
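   
   A minimal sketch of that behavior, assuming seek is used as a starting offset into the partition's tablet list and that selection wraps around at the end of the list (the wrap-around and all names here are assumptions, not taken from the PR):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical sketch, not the PR's selection code.
   final class SeekSelectionSketch {
       // Picks `count` tablets sequentially, starting at offset `seek`.
       static List<Long> pick(List<Long> partitionTablets, int count, long seek) {
           List<Long> picked = new ArrayList<>();
           int size = partitionTablets.size();
           if (size == 0 || count <= 0) {
               return picked;
           }
           // floorMod keeps the start index in [0, size) even for a negative seek.
           int start = (int) Math.floorMod(seek, (long) size);
           int n = Math.min(count, size);
           for (int i = 0; i < n; i++) {
               picked.add(partitionTablets.get((start + i) % size));
           }
           return picked;
       }
   }
   ```
   
   The same seek always yields the same tablets, and seek 0 yields the first n, which matches the description above.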





[GitHub] [doris] github-actions[bot] commented on pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10170:
URL: https://github.com/apache/doris/pull/10170#issuecomment-1261659755

   PR approved by anyone and no changes requested.




[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991830292


##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java:
##########
@@ -756,6 +764,82 @@ private void getScanRangeLocations() throws UserException {
         LOG.debug("distribution prune cost: {} ms", (System.currentTimeMillis() - start));
     }
 
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds() throws UserException {
+        if (desc.getTable().getType() != TableIf.TableType.OLAP) {
+            throw new AnalysisException("Sample table type " + desc.getTable().getType() + " is not OLAP");

Review Comment:
   Move this code to the analysis phase, that is, check the table type in the `analyze()` of `TableSample`.
   
   Also, if `computeSampleTabletIds()` throws an exception here, will it affect normal queries? For example, for normal queries, when querying non-OLAP tables, will `init()` have exceptions? I think it would be better for non-sampling queries or non-OLAP tables to return directly, or to judge whether it is a sampling query first.
   
   





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991268951


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,84 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {

Review Comment:
   done





[GitHub] [doris] Kikyou1997 commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
Kikyou1997 commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982254090


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);
+        } else {
+            sampleRows = Math.max(tableSample.getSampleValues(), 1);

Review Comment:
   Since sample rows is already guaranteed to be a positive integer, I think this `max` call is redundant.
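   
   A small sketch of the distinction, assuming the same validation as `TableSample.analyze()` (value greater than 0, percent at most 100). In the ROWS branch the value is already positive, so `max` adds nothing. In the PERCENT branch the product can truncate to 0 on a small table, so a floor of 1 still has an effect there. The class and method names are hypothetical.
   
   ```java
   // Hypothetical sketch, not the PR's code.
   final class SampleRowsSketch {
       static long sampleRows(long tableRowCount, long sampleValue, boolean isPercent) {
           if (sampleValue <= 0 || (isPercent && sampleValue > 100)) {
               throw new IllegalArgumentException(
                       "sample value must be greater than 0, and a percentage must not exceed 100");
           }
           if (isPercent) {
               // 1 percent of 50 rows is 0.5, which truncates to 0, so keep a floor of 1.
               return Math.max((long) (tableRowCount * (sampleValue / 100.0)), 1L);
           }
           // Already validated as positive: no max needed.
           return sampleValue;
       }
   }
   ```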





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r992321368


##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java:
##########
@@ -756,6 +764,82 @@ private void getScanRangeLocations() throws UserException {
         LOG.debug("distribution prune cost: {} ms", (System.currentTimeMillis() - start));
     }
 
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds() throws UserException {
+        if (desc.getTable().getType() != TableIf.TableType.OLAP) {
+            throw new AnalysisException("Sample table type " + desc.getTable().getType() + " is not OLAP");

Review Comment:
   Good suggestion. I moved the check that the table type is OLAP into `BaseTableRef::analyze`,
   alongside the analysis of the other `FROM`-clause syntax, such as joins and hints.
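
For reference, the formula at the end of the Javadoc above can be worked through with concrete numbers. This is only a minimal sketch under stated assumptions: the names `SampleTabletMath` and `tabletsForPartition` are hypothetical, `simple*` in the comment is read as `sample*`, and the round-up and the clamp to `[1, partitionTabletNums]` are illustrative choices, not necessarily what the PR does.

```java
// Hypothetical helper, not part of the PR: evaluates
// sampleTabletNums = sampleRows / partitionNums / (partitionRows / partitionTabletNums)
final class SampleTabletMath {

    static long tabletsForPartition(long sampleRows, long partitionNums,
                                    long partitionRows, long partitionTabletNums) {
        if (partitionNums <= 0 || partitionTabletNums <= 0) {
            return 0; // nothing to sample from
        }
        // Rows each partition should contribute to the sample.
        long rowsPerPartition = sampleRows / partitionNums;
        // Average rows held by one tablet of this partition (at least 1 to avoid dividing by zero).
        long avgRowsPerTablet = Math.max(partitionRows / partitionTabletNums, 1);
        // Assumption: round up so the selected tablets cover the requested rows.
        long tablets = (rowsPerPartition + avgRowsPerTablet - 1) / avgRowsPerTablet;
        // Assumption: pick at least one tablet and never more than the partition has.
        return Math.min(Math.max(tablets, 1), partitionTabletNums);
    }

    public static void main(String[] args) {
        // 1000 sample rows, 4 partitions, this partition has 10000 rows in 100 tablets.
        System.out.println(tabletsForPartition(1000, 4, 10000, 100)); // prints 3
    }
}
```

With 1000 requested rows over 4 partitions, each partition owes 250 rows; at about 100 rows per tablet that rounds up to 3 tablets, which is why the selected tablets can hold more rows than requested.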



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {

Review Comment:
   Ditto





[GitHub] [doris] xinyiZzz commented on pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on PR #10170:
URL: https://github.com/apache/doris/pull/10170#issuecomment-1273290477

   @morrySnow @weizhengte PTAL~




[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991831818


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {

Review Comment:
   The table in a sampling query should be an OLAP table. Check that here, or wherever the relevant statements are analyzed.





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991268951


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,84 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {

Review Comment:
   Done. I moved `computeSampleTabletIds` to `planner::OlapScanNode`.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {

Review Comment:
   Done. I moved `computeSampleTabletIds` to `planner::OlapScanNode`.





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982555655


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);
+        } else {
+            sampleRows = Math.max(tableSample.getSampleValues(), 1);

Review Comment:
   `sampleValue` may be equal to 0, so the `max` here makes sure the result is at least 1.
   Would it be more reasonable to report a syntax error for values less than 1?
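
To make the two branches under discussion concrete, here is a minimal sketch of the row-count computation, assuming `sampleValue` has already been validated as at least 1. The names `SampleRows` and `compute` are hypothetical and the method is a simplified stand-in for the quoted hunk, not the PR's code. Under that assumption the `max` in the ROWS branch has nothing left to do, while in the PERCENT branch it still guards against a small table producing a fractional value below 1.

```java
// Hypothetical stand-in for the sampleRows computation in the quoted hunk.
final class SampleRows {

    // Assumes sampleValue >= 1 (and <= 100 when isPercent) was checked earlier.
    static long compute(boolean isPercent, long sampleValue, long tableRowCount) {
        if (isPercent) {
            // 1 percent of a 10-row table is 0.1, so the floor of 1 is still needed here.
            return (long) Math.max(tableRowCount * (sampleValue / 100.0), 1);
        }
        // ROWS branch: the value is already a positive integer.
        return sampleValue;
    }

    public static void main(String[] args) {
        System.out.println(compute(true, 50, 1000)); // prints 500
        System.out.println(compute(true, 1, 10));    // prints 1
        System.out.println(compute(false, 500, 1000)); // prints 500
    }
}
```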





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990933566


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -197,17 +205,49 @@ protected TableRef(TableRef other) {
                 lateralViewRefs.add((LateralViewRef) viewRef.clone());
             }
         }
+        if (other.sampleTabletIds.size() != 0) {

Review Comment:
   There was a problem here, so I removed this `if`.





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990726564


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (sampleValue <= 0 || (isPercent && sampleValue > 100)) {
+            throw new AnalysisException("table sample value must be greater than 0");

Review Comment:
   It would be better to clarify the reason for the exception~
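
One way to act on this comment is to split the combined condition so that each failure reports its own reason. The following is a minimal sketch, not the PR's final code: `IllegalArgumentException` stands in for Doris's `AnalysisException` so the example is self-contained, and the class name `TableSampleCheck` and the message wording are hypothetical.

```java
// Hypothetical validation helper; the real check lives in TableSample.analyze().
final class TableSampleCheck {

    static void validate(boolean isPercent, long sampleValue) {
        if (sampleValue <= 0) {
            throw new IllegalArgumentException(
                    "table sample value must be greater than 0, got " + sampleValue);
        }
        if (isPercent && sampleValue > 100) {
            throw new IllegalArgumentException(
                    "table sample percent must be between 1 and 100, got " + sampleValue);
        }
    }

    public static void main(String[] args) {
        validate(false, 1000); // ROWS may exceed 100
        validate(true, 10);    // 10 PERCENT is in range
        try {
            validate(true, 101);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With separate messages, a user who writes `TABLESAMPLE (101 PERCENT)` is no longer told that the value must be greater than 0.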





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990726467


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);
+        } else {
+            sampleRows = Math.max(tableSample.getSampleValues(), 1);
+        }
+
+        // calculate the number of tablets by each partition
+        long avgRowsPerPartition = sampleRows / Math.max(olapTable.getPartitions().size(), 1);
+
+        for (Partition p : olapTable.getPartitions()) {
+            List<Long> ids = p.getBaseIndex().getTabletIdsInOrder();
+
+            if (ids.isEmpty()) {
+                continue;
+            }
+
+            if (p.getBaseIndex().getRowCount() < (avgRowsPerPartition / 2)) {

Review Comment:
   ok





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990726393


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {

Review Comment:
   You can check the type of the table during the analysis phase.





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991818187


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   > Sorry, I meant giving the field an initial value, i.e. `private Long seek = -1L;`, and removing `this.seek = -1L;`. It looks better, but it doesn't matter much, so you can ignore it~
   
   I mean without `final`, and you can also ignore this~





[GitHub] [doris] morrySnow commented on pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
morrySnow commented on PR #10170:
URL: https://github.com/apache/doris/pull/10170#issuecomment-1260614015

   @weizhengte PTAL




[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r983052579


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   Can we use `private final Long seek = -1;` instead of assigning a value here?
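
Note that a `final` field initialized at its declaration cannot be assigned again in the three-argument constructor, so the suggestion as written would require dropping `final`. One alternative that keeps the field `final` and still states the default in a single place is constructor delegation. This is a sketch under assumptions, not the PR's code: `TableSampleSketch` and `NO_SEEK` are hypothetical names, and primitive `long` is used instead of `Long` for brevity.

```java
// Hypothetical variant of TableSample showing constructor delegation.
final class TableSampleSketch {

    // Sentinel meaning "no REPEATABLE seed was given".
    private static final long NO_SEEK = -1L;

    private final long sampleValue;
    private final boolean isPercent;
    private final long seek;

    TableSampleSketch(boolean isPercent, long sampleValue) {
        // Delegate so the default lives in one place and seek can stay final.
        this(isPercent, sampleValue, NO_SEEK);
    }

    TableSampleSketch(boolean isPercent, long sampleValue, long seek) {
        this.sampleValue = sampleValue;
        this.isPercent = isPercent;
        this.seek = seek;
    }

    long getSampleValue() {
        return sampleValue;
    }

    boolean isPercent() {
        return isPercent;
    }

    long getSeek() {
        return seek;
    }

    public static void main(String[] args) {
        System.out.println(new TableSampleSketch(false, 100).getSeek());     // prints -1
        System.out.println(new TableSampleSketch(true, 10, 123).getSeek());  // prints 123
    }
}
```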



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -160,6 +161,12 @@ public TableRef(TableName name, String alias, PartitionNames partitionNames, Arr
             hasExplicitAlias = false;
         }
         this.partitionNames = partitionNames;
+        if (sampleTabletIds != null) {
+            this.sampleTabletIds = sampleTabletIds;
+        }
+        if (tableSample != null) {

Review Comment:
   Is this OK without the if?



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValues() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (sampleValue <= 0) {

Review Comment:
   Can sampleValue be greater than 100 when isPercent is true?
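
The range check this question points toward can be sketched as follows. This is only an illustration: the class `SampleValueCheck` and the method `isValid` are hypothetical names, not part of the Doris code base, and the upper bound of 100 for PERCENT is the assumption implied by the question.

```java
// Hypothetical stand-alone helper; not the actual Doris implementation.
public class SampleValueCheck {

    // Returns true when the value is usable: strictly positive,
    // and at most 100 when it is interpreted as a percentage.
    public static boolean isValid(boolean isPercent, long sampleValue) {
        if (sampleValue <= 0) {
            return false;
        }
        return !isPercent || sampleValue <= 100;
    }

    public static void main(String[] args) {
        System.out.println(isValid(true, 10));    // true
        System.out.println(isValid(true, 150));   // false
        System.out.println(isValid(false, 1000)); // true
    }
}
```

A ROWS value has no upper bound in this sketch, since only a percentage has a natural maximum.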



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValues() {

Review Comment:
   getSampleValues -> getSampleValue?



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {

Review Comment:
   Should we throw an exception to tell the user that other table types cannot be sampled?



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);

Review Comment:
   As mentioned above, is it possible that sampleValue is greater than 100 here?



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);
+        } else {
+            sampleRows = Math.max(tableSample.getSampleValues(), 1);
+        }
+
+        // calculate the number of tablets by each partition
+        long avgRowsPerPartition = sampleRows / Math.max(olapTable.getPartitions().size(), 1);
+
+        for (Partition p : olapTable.getPartitions()) {
+            List<Long> ids = p.getBaseIndex().getTabletIdsInOrder();
+
+            if (ids.isEmpty()) {
+                continue;
+            }
+
+            if (p.getBaseIndex().getRowCount() < (avgRowsPerPartition / 2)) {

Review Comment:
   Why `avgRowsPerPartition / 2`, and will it cause totalRows to be less than sampleRows?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990653641


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {

Review Comment:
   I'm not sure the exception would be caught properly outside this method, but I'll add a log message.





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990652999


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);
+        } else {
+            sampleRows = Math.max(tableSample.getSampleValues(), 1);
+        }
+
+        // calculate the number of tablets by each partition
+        long avgRowsPerPartition = sampleRows / Math.max(olapTable.getPartitions().size(), 1);
+
+        for (Partition p : olapTable.getPartitions()) {
+            List<Long> ids = p.getBaseIndex().getTabletIdsInOrder();
+
+            if (ids.isEmpty()) {
+                continue;
+            }
+
+            if (p.getBaseIndex().getRowCount() < (avgRowsPerPartition / 2)) {

Review Comment:
   Yes, `totalRows < sampleRows` can happen.
   
   Partitions whose row count is less than half the number of rows expected to be sampled per partition (`avgRowsPerPartition / 2`) are skipped. The intent is to sample from fewer partitions and so avoid an uneven distribution of the sampling results.
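
The skip rule discussed here can be sketched in isolation. The class and method names below are hypothetical, and partition row counts are passed in as a plain array rather than read from `OlapTable`; only the `sampleRows / partitionCount` average and the `< avg / 2` test are taken from the quoted diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the partition filter; not the Doris implementation.
public class PartitionSampleSketch {

    // Rows each partition is expected to contribute (integer division, as in the diff).
    public static long avgRowsPerPartition(long sampleRows, int partitionCount) {
        return sampleRows / Math.max(partitionCount, 1);
    }

    // Indexes of the partitions that are NOT skipped by the "less than half" rule.
    public static List<Integer> keptPartitions(long[] partitionRowCounts, long sampleRows) {
        long avg = avgRowsPerPartition(sampleRows, partitionRowCounts.length);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < partitionRowCounts.length; i++) {
            if (partitionRowCounts[i] < avg / 2) {
                continue; // partition is too small relative to its expected share
            }
            kept.add(i);
        }
        return kept;
    }

    public static void main(String[] args) {
        // 300 rows requested over 3 partitions: avg = 100, threshold = 50.
        long[] rows = {10, 400, 500};
        System.out.println(keptPartitions(rows, 300)); // [1, 2]
    }
}
```

In this example the 10-row partition is dropped, which is one way the rows actually hit can end up below the requested sample size.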





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990726668


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   Sorry, I meant giving the field an initial value, i.e. `private Long seek = -1;`, and removing `this.seek = -1L;`. It looks better, but it doesn't matter much, you can ignore it~





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991831818


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {

Review Comment:
   Do we need to check the table type here?





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991818187


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   > Sorry, I meant giving the field an initial value, i.e. `private Long seek = -1;`, and removing `this.seek = -1L;`. It looks better, but it doesn't matter much, you can ignore it~
   
   I mean without `final`, and you can ignore this as well (I didn't pay attention at first and added `final`).
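
For context on the `final` remark: in Java a `final` field that already has an initializer cannot be assigned again in a constructor, so the initializer form only compiles without `final` (and the literal needs the `L` suffix to be assignable to `Long`). One alternative, shown here only as a sketch with the hypothetical class name `SeekDefaultSketch`, keeps the fields `final` and moves the default into a delegating constructor.

```java
// Hypothetical sketch; field names mirror the quoted TableSample class.
public class SeekDefaultSketch {

    private final Long sampleValue;
    private final boolean isPercent;
    private final Long seek;

    // The default seek lives in exactly one place: the delegation below.
    public SeekDefaultSketch(boolean isPercent, Long sampleValue) {
        this(isPercent, sampleValue, -1L);
    }

    public SeekDefaultSketch(boolean isPercent, Long sampleValue, Long seek) {
        this.sampleValue = sampleValue;
        this.isPercent = isPercent;
        this.seek = seek;
    }

    public Long getSampleValue() {
        return sampleValue;
    }

    public boolean isPercent() {
        return isPercent;
    }

    public Long getSeek() {
        return seek;
    }
}
```

Either form behaves the same for callers; the choice is purely stylistic, as the comment itself says.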





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982540299


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -79,55 +79,48 @@
 public class TableRef implements ParseNode, Writable {
     private static final Logger LOG = LogManager.getLogger(TableRef.class);
     protected TableName name;
-    private PartitionNames partitionNames = null;
-
     // Legal aliases of this table ref. Contains the explicit alias as its sole element if
     // there is one. Otherwise, contains the two implicit aliases. Implicit aliases are set
     // in the c'tor of the corresponding resolved table ref (subclasses of TableRef) during
     // analysis. By convention, for table refs with multiple implicit aliases, aliases_[0]
     // contains the fully-qualified implicit alias to ensure that aliases_[0] always
     // uniquely identifies this table ref regardless of whether it has an explicit alias.
     protected String[] aliases;
-
+    protected List<Long> sampleTabletIds = Lists.newArrayList();
     // Indicates whether this table ref is given an explicit alias,
     protected boolean hasExplicitAlias;
-
     protected JoinOperator joinOp;
     protected List<String> usingColNames;
-    private ArrayList<String> joinHints;
-    private ArrayList<String> sortHints;
-    private ArrayList<String> commonHints; //The Hints is set by user

Review Comment:
   ![image](https://user-images.githubusercontent.com/13197424/192817703-713a25f2-5491-41d4-a7d7-06342a6ac2e8.png)
   





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990934807


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (sampleValue <= 0 || (isPercent && sampleValue > 100)) {
+            throw new AnalysisException("table sample value must be greater than 0, percent need less than 100.");
+        }
+    }
+
+    @Override
+    public String toSql() {
+        if (sampleValue == null) {
+            return "";
+        }
+        StringBuilder sb = new StringBuilder();
+        sb.append("TABLESAMPLE ( ");
+        sb.append(sampleValue);
+        if (isPercent) {
+            sb.append(" PERCENT ");
+        } else {
+            sb.append(" ROWS ");
+        }
+        sb.append(")");
+        if (seek != 0) {
+            sb.append(" REPEATABLE ");
+            sb.append(seek);

Review Comment:
   There are no parentheses; the description in the PR comment is wrong.
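
To make the rendered form concrete, the string building in the quoted `toSql()` can be reproduced in a small stand-alone sketch. The class `ToSqlSketch` and the method `render` are hypothetical; the appended fragments and the `seek != 0` test are copied from the diff above.

```java
// Hypothetical sketch mirroring the quoted toSql() body; not the Doris class itself.
public class ToSqlSketch {

    public static String render(long sampleValue, boolean isPercent, long seek) {
        StringBuilder sb = new StringBuilder();
        sb.append("TABLESAMPLE ( ");
        sb.append(sampleValue);
        sb.append(isPercent ? " PERCENT " : " ROWS ");
        sb.append(")");
        if (seek != 0) {
            // The seek is appended bare, without surrounding parentheses.
            sb.append(" REPEATABLE ");
            sb.append(seek);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(100, false, 123)); // TABLESAMPLE ( 100 ROWS ) REPEATABLE 123
        System.out.println(render(10, true, 0));     // TABLESAMPLE ( 10 PERCENT )
    }
}
```

Note that, as quoted, a seek of -1 (the default set by the two-argument constructor) also passes the `seek != 0` test, so it would be rendered as `REPEATABLE -1`.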





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991269203


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {

Review Comment:
   done





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991818187


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   > Sorry, I meant giving the field an initial value, i.e. `private Long seek = -1L;`, and removing `this.seek = -1L;`. It looks better, but it doesn't matter much, so you can ignore it~
   
   I mean without `final`, and you can ignore it~
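
For illustration only: a `final` field cannot have both an initializer and a constructor assignment, which is why the suggestion above requires dropping `final`. One alternative that keeps the field `final` and still states the default once is constructor chaining. The sketch below is not code from the PR; the class name `SeekDefaultSketch` and the constant `NO_SEEK` are hypothetical.

```java
public class SeekDefaultSketch {
    // Assumed sentinel for "REPEATABLE not given", matching the -1L in the quoted diff.
    private static final long NO_SEEK = -1L;

    private final Long sampleValue;
    private final boolean isPercent;
    private final Long seek;

    public SeekDefaultSketch(boolean isPercent, Long sampleValue) {
        // Delegate to the full constructor so the default lives in one place.
        this(isPercent, sampleValue, NO_SEEK);
    }

    public SeekDefaultSketch(boolean isPercent, Long sampleValue, Long seek) {
        this.sampleValue = sampleValue;
        this.isPercent = isPercent;
        this.seek = seek;
    }

    public Long getSeek() {
        return seek;
    }

    public static void main(String[] args) {
        System.out.println(new SeekDefaultSketch(true, 10L).getSeek());
        System.out.println(new SeekDefaultSketch(false, 100L, 123L).getSeek());
    }
}
```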





[GitHub] [doris] Kikyou1997 commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
Kikyou1997 commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982279920


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.

Review Comment:
   In what scenario would `seek` be set manually?
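
As background for the question, the PR description says the REPEATABLE value lets a query return the same sample again. The sketch below contrasts the two selection modes described in the Javadoc above. It is a simplified illustration, not the PR's implementation: `pickTablets` is a hypothetical helper, and treating a negative seek as "not specified" and wrapping around at the end of the list are both assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SeekSelectionSketch {
    // Hypothetical helper: choose up to `count` tablet ids from one partition.
    static List<Long> pickTablets(List<Long> tabletIds, int count, long seek) {
        List<Long> picked = new ArrayList<>();
        int n = tabletIds.size();
        if (n == 0 || count <= 0) {
            return picked;
        }
        int take = Math.min(count, n);
        if (seek < 0) {
            // No seek: shuffle a copy and take a prefix, so each call may differ.
            List<Long> shuffled = new ArrayList<>(tabletIds);
            Collections.shuffle(shuffled);
            picked.addAll(shuffled.subList(0, take));
        } else {
            // Seek given: walk sequentially from the seek position (wrapping around),
            // so the same seek always yields the same tablets.
            int start = (int) (seek % n);
            for (int i = 0; i < take; i++) {
                picked.add(tabletIds.get((start + i) % n));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Long> tablets = Arrays.asList(10001L, 10002L, 10003L, 10004L);
        System.out.println(pickTablets(tablets, 2, 3));  // repeatable
        System.out.println(pickTablets(tablets, 2, -1)); // varies between runs
    }
}
```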





[GitHub] [doris] github-actions[bot] commented on pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10170:
URL: https://github.com/apache/doris/pull/10170#issuecomment-1278408217

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990742236


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (sampleValue <= 0 || (isPercent && sampleValue > 100)) {
+            throw new AnalysisException("table sample value must be greater than 0");

Review Comment:
   Fixed, I forgot~
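
The quoted check rejects two different conditions but reports only one of them. One possible shape for a fix is to report each condition separately. This is a sketch under assumptions: `validate` is a hypothetical helper, and `IllegalArgumentException` stands in for Doris's `AnalysisException` so that the example compiles on its own.

```java
public class SampleValueCheckSketch {
    // Hypothetical stand-in for the range check in TableSample.analyze().
    static void validate(long sampleValue, boolean isPercent) {
        if (sampleValue <= 0) {
            throw new IllegalArgumentException("table sample value must be greater than 0");
        }
        if (isPercent && sampleValue > 100) {
            throw new IllegalArgumentException("table sample percent must not exceed 100");
        }
    }

    public static void main(String[] args) {
        validate(100, true);   // accepted: 100 PERCENT
        validate(1000, false); // accepted: 1000 ROWS
        try {
            validate(101, true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```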





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990649621


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValues() {

Review Comment:
   Fixed.





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990649470


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);

Review Comment:
   As mentioned above
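
To make the formula in the Javadoc above concrete, here is a small worked sketch with made-up numbers. `sampleRowsForPercent` mirrors the quoted percent expression. `sampleTabletNum` is a hypothetical helper, and rounding up to at least one tablet per partition is an assumption rather than something the thread confirms.

```java
public class SampleTabletMath {
    // Rows to sample for a PERCENT clause: a share of the table, but at least 1 row.
    static long sampleRowsForPercent(long tableRows, long percent) {
        return (long) Math.max(tableRows * (percent / 100.0), 1);
    }

    // Tablets to select in one partition:
    // sampleRows / partitionNums / (partitionRows / partitionTabletNums)
    static long sampleTabletNum(long sampleRows, long partitionNums,
                                long partitionRows, long partitionTabletNums) {
        if (partitionNums <= 0 || partitionTabletNums <= 0 || partitionRows <= 0) {
            return 0; // nothing to select from (assumed behaviour)
        }
        double rowsPerPartition = (double) sampleRows / partitionNums;
        double avgRowsPerTablet = (double) partitionRows / partitionTabletNums;
        return Math.max((long) Math.ceil(rowsPerPartition / avgRowsPerTablet), 1);
    }

    public static void main(String[] args) {
        long sampleRows = sampleRowsForPercent(1_000_000L, 25); // 250000 rows
        // 5 partitions -> 50000 rows each; 200000 rows over 10 tablets -> 20000 rows per tablet;
        // 50000 / 20000 = 2.5, rounded up to 3 tablets.
        System.out.println(sampleTabletNum(sampleRows, 5, 200_000L, 10));
    }
}
```

Because whole tablets are selected, the rows actually read can exceed the requested sample, which is why the PR description recommends adding LIMIT when an exact row count is needed.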





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990726382


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -160,6 +161,12 @@ public TableRef(TableName name, String alias, PartitionNames partitionNames, Arr
             hasExplicitAlias = false;
         }
         this.partitionNames = partitionNames;
+        if (sampleTabletIds != null) {
+            this.sampleTabletIds = sampleTabletIds;
+        }
+        if (tableSample != null) {

Review Comment:
   Its initial value already seems to be null (`protected TableSample tableSample = null`), so this null check looks unnecessary.
   





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990929685


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -160,6 +161,12 @@ public TableRef(TableName name, String alias, PartitionNames partitionNames, Arr
             hasExplicitAlias = false;
         }
         this.partitionNames = partitionNames;
+        if (sampleTabletIds != null) {
+            this.sampleTabletIds = sampleTabletIds;
+        }
+        if (tableSample != null) {

Review Comment:
   That's true, I'll fix it.





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990934807


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (sampleValue <= 0 || (isPercent && sampleValue > 100)) {
+            throw new AnalysisException("table sample value must be greater than 0, percent need less than 100.");
+        }
+    }
+
+    @Override
+    public String toSql() {
+        if (sampleValue == null) {
+            return "";
+        }
+        StringBuilder sb = new StringBuilder();
+        sb.append("TABLESAMPLE ( ");
+        sb.append(sampleValue);
+        if (isPercent) {
+            sb.append(" PERCENT ");
+        } else {
+            sb.append(" ROWS ");
+        }
+        sb.append(")");
+        if (seek != 0) {
+            sb.append(" REPEATABLE ");
+            sb.append(seek);

Review Comment:
   Fixed.
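
The thread does not show what exactly was flagged on this line. Purely as an illustration, the sketch below renders a clause that matches the forms listed in the class comment, with the REPEATABLE value in parentheses. It assumes that -1 is the "no seek" sentinel, as in the two-argument constructor. The static `toSql` helper and its parameters are hypothetical.

```java
public class TableSampleSqlSketch {
    // Hypothetical stand-alone version of TableSample.toSql().
    static String toSql(Long sampleValue, boolean isPercent, long seek) {
        if (sampleValue == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("TABLESAMPLE (").append(sampleValue);
        sb.append(isPercent ? " PERCENT)" : " ROWS)");
        if (seek != -1L) {
            // Only emit REPEATABLE when a seek was actually given.
            sb.append(" REPEATABLE (").append(seek).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSql(100L, false, 123L));
        System.out.println(toSql(10L, true, -1L));
    }
}
```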





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991831818


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)R
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {

Review Comment:
   The table type should also be checked here to make sure it is `OLAP`.



##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java:
##########
@@ -756,6 +764,82 @@ private void getScanRangeLocations() throws UserException {
         LOG.debug("distribution prune cost: {} ms", (System.currentTimeMillis() - start));
     }
 
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds() throws UserException {
+        if (desc.getTable().getType() != TableIf.TableType.OLAP) {
+            throw new AnalysisException("Sample table type " + desc.getTable().getType() + " is not OLAP");

Review Comment:
   Please move this check to the analysis phase, i.e. check the table type in `analyze()` of `TableSample`.
   
   Also, if `computeSampleTabletIds()` throws an exception here, will it affect normal queries? For example, will `init()` throw when a normal query reads a non-OLAP table? I think it would be better to return directly for non-sampling queries or non-OLAP tables, or to first check whether the query is a sampling query.
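
A minimal sketch of the split suggested above, under stated assumptions: the analysis step rejects sampling on non-OLAP tables, while the planner step simply skips work for queries that do not sample. The nested `TableType` enum, the method names, and the use of `IllegalArgumentException` in place of `AnalysisException` are all stand-ins so that the example is self-contained.

```java
public class SampleAnalysisSketch {
    // Stand-in for Doris's table type enum.
    enum TableType { OLAP, MYSQL }

    // Analysis phase: only sampling queries are checked, so normal queries never fail here.
    static void analyzeSample(TableType type, Long sampleValue) {
        if (sampleValue == null) {
            return; // not a sampling query
        }
        if (type != TableType.OLAP) {
            throw new IllegalArgumentException("Sample table type " + type + " is not OLAP");
        }
    }

    // Planner phase: decide whether sample tablets need to be computed at all, without throwing.
    static boolean shouldComputeSampleTablets(TableType type, Long sampleValue) {
        return sampleValue != null && type == TableType.OLAP;
    }

    public static void main(String[] args) {
        analyzeSample(TableType.MYSQL, null); // normal query on a non-OLAP table: accepted
        System.out.println(shouldComputeSampleTablets(TableType.MYSQL, null));
        System.out.println(shouldComputeSampleTablets(TableType.OLAP, 10L));
    }
}
```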
   
   





[GitHub] [doris] morrySnow commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
morrySnow commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r994342637


##########
fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java:
##########
@@ -756,6 +763,79 @@ private void getScanRangeLocations() throws UserException {
         LOG.debug("distribution prune cost: {} ms", (System.currentTimeMillis() - start));
     }
 
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows of Tablet,
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, it will be selected sequentially from the seek tablet of the partition.
+     * And add the manually specified Tablet id to the selected Tablet.
+     * simpleTabletNums = simpleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds() {
+        if (desc.getSampleTabletIds() != null) {

Review Comment:
   Why not get the sample info from TableRef directly?





[GitHub] [doris] morrySnow commented on pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
morrySnow commented on PR #10170:
URL: https://github.com/apache/doris/pull/10170#issuecomment-1260613471

   @Kikyou1997 PTAL




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
Kikyou1997 commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982166364


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -79,55 +79,48 @@
 public class TableRef implements ParseNode, Writable {
     private static final Logger LOG = LogManager.getLogger(TableRef.class);
     protected TableName name;
-    private PartitionNames partitionNames = null;
-
     // Legal aliases of this table ref. Contains the explicit alias as its sole element if
     // there is one. Otherwise, contains the two implicit aliases. Implicit aliases are set
     // in the c'tor of the corresponding resolved table ref (subclasses of TableRef) during
     // analysis. By convention, for table refs with multiple implicit aliases, aliases_[0]
     // contains the fully-qualified implicit alias to ensure that aliases_[0] always
     // uniquely identifies this table ref regardless of whether it has an explicit alias.
     protected String[] aliases;
-
+    protected List<Long> sampleTabletIds = Lists.newArrayList();
     // Indicates whether this table ref is given an explicit alias,
     protected boolean hasExplicitAlias;
-
     protected JoinOperator joinOp;
     protected List<String> usingColNames;
-    private ArrayList<String> joinHints;
-    private ArrayList<String> sortHints;
-    private ArrayList<String> commonHints; //The Hints is set by user

Review Comment:
   Why were these lines moved?





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982539965


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -79,55 +79,48 @@
 public class TableRef implements ParseNode, Writable {
     private static final Logger LOG = LogManager.getLogger(TableRef.class);
     protected TableName name;
-    private PartitionNames partitionNames = null;
-
     // Legal aliases of this table ref. Contains the explicit alias as its sole element if
     // there is one. Otherwise, contains the two implicit aliases. Implicit aliases are set
     // in the c'tor of the corresponding resolved table ref (subclasses of TableRef) during
     // analysis. By convention, for table refs with multiple implicit aliases, aliases_[0]
     // contains the fully-qualified implicit alias to ensure that aliases_[0] always
     // uniquely identifies this table ref regardless of whether it has an explicit alias.
     protected String[] aliases;
-
+    protected List<Long> sampleTabletIds = Lists.newArrayList();
     // Indicates whether this table ref is given an explicit alias,
     protected boolean hasExplicitAlias;
-
     protected JoinOperator joinOp;
     protected List<String> usingColNames;
-    private ArrayList<String> joinHints;
-    private ArrayList<String> sortHints;
-    private ArrayList<String> commonHints; //The Hints is set by user

Review Comment:
   They were only repositioned when the code was formatted automatically.





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990641601


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -160,6 +161,12 @@ public TableRef(TableName name, String alias, PartitionNames partitionNames, Arr
             hasExplicitAlias = false;
         }
         this.partitionNames = partitionNames;
+        if (sampleTabletIds != null) {
+            this.sampleTabletIds = sampleTabletIds;
+        }
+        if (tableSample != null) {

Review Comment:
   No. For example, in `SELECT * FROM t1 TABLET(10001) limit 1000;`, tableSample is null.
   ![image](https://user-images.githubusercontent.com/13197424/194709585-61637b2d-1e8d-4bae-ba6c-b5119b8d724d.png)
   





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990726382


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableRef.java:
##########
@@ -160,6 +161,12 @@ public TableRef(TableName name, String alias, PartitionNames partitionNames, Arr
             hasExplicitAlias = false;
         }
         this.partitionNames = partitionNames;
+        if (sampleTabletIds != null) {
+            this.sampleTabletIds = sampleTabletIds;
+        }
+        if (tableSample != null) {

Review Comment:
   Its initial value already seems to be null: `protected TableSample tableSample = null`.
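
To make the point about the two guards concrete, here is a minimal sketch. It assumes the guarded branch only performs the assignment (the quoted hunk is cut off after the `if`), and it uses a hypothetical `TableRefGuardSketch` class with a `String` standing in for `TableSample`; none of these names come from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TableRef; only the two fields under discussion.
final class TableRefGuardSketch {
    // Non-null default: without the guard, a null argument would overwrite it.
    List<Long> sampleTabletIds = new ArrayList<>();
    // Default is already null, so assigning a null argument changes nothing.
    String tableSample = null;

    TableRefGuardSketch(List<Long> sampleTabletIds, String tableSample) {
        if (sampleTabletIds != null) {
            this.sampleTabletIds = sampleTabletIds;
        }
        // Unguarded assignment: null in, null stays.
        this.tableSample = tableSample;
    }
}
```

Under that assumption the guard protects the empty-list default of `sampleTabletIds`, while for `tableSample` the guarded and unguarded forms behave the same.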
   





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991185692


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows per Tablet.
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, Tablets are selected sequentially starting from the seek Tablet of the partition.
+     * Finally, add the manually specified Tablet ids to the selected Tablets.
+     * sampleTabletNums = sampleRows / partitionNums / (partitionRows / partitionTabletNums)
+     */
+    public void computeSampleTabletIds(List<Long> tabletIds, TableSample tableSample) {
+        if (table.getType() != TableType.OLAP) {
+            return;
+        }
+        sampleTabletIds.addAll(tabletIds);
+        if (tableSample == null) {
+            return;
+        }
+        OlapTable olapTable = (OlapTable) table;
+        long sampleRows; // The total number of sample rows
+        long hitRows = 1; // The total number of rows hit by the tablet
+        long totalRows = 0; // The total number of partition rows hit
+        long totalTablet = 0; // The total number of tablets in the hit partition
+        if (tableSample.isPercent()) {
+            sampleRows = (long) Math.max(olapTable.getRowCount() * (tableSample.getSampleValues() / 100.0), 1);
+        } else {
+            sampleRows = Math.max(tableSample.getSampleValues(), 1);
+        }
+
+        // calculate the number of tablets by each partition
+        long avgRowsPerPartition = sampleRows / Math.max(olapTable.getPartitions().size(), 1);
+
+        for (Partition p : olapTable.getPartitions()) {
+            List<Long> ids = p.getBaseIndex().getTabletIdsInOrder();
+
+            if (ids.isEmpty()) {
+                continue;
+            }
+
+            if (p.getBaseIndex().getRowCount() < (avgRowsPerPartition / 2)) {

Review Comment:
   Sorry, totalRows will never be less than sampleRows.
   ![image](https://user-images.githubusercontent.com/13197424/194856737-92eaaa7b-7313-4371-8e9e-3d2f04c3f28a.png)
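
The formula in the Javadoc of the quoted hunk can be worked through with a small sketch. This is not the Doris implementation: `SampleTabletCountSketch` and its method names are hypothetical, and rounding up and clamping to the range [1, tablets in the partition] are assumptions added so the result is always usable.

```java
// Hypothetical, simplified model of the per-partition tablet count.
final class SampleTabletCountSketch {

    /** Total rows to sample, mirroring the PERCENT / ROWS branch in the diff. */
    static long sampleRows(boolean isPercent, long sampleValue, long tableRowCount) {
        if (isPercent) {
            return (long) Math.max(tableRowCount * (sampleValue / 100.0), 1);
        }
        return Math.max(sampleValue, 1);
    }

    /**
     * tablets = sampleRows / partitionCount / (partitionRows / partitionTabletCount),
     * rounded up and clamped (both are assumptions of this sketch).
     */
    static long tabletsForPartition(long sampleRows, int partitionCount,
                                    long partitionRows, int partitionTabletCount) {
        if (partitionTabletCount == 0) {
            return 0; // empty partition: nothing to pick
        }
        long rowsPerPartition = sampleRows / Math.max(partitionCount, 1);
        long avgRowsPerTablet = Math.max(partitionRows / partitionTabletCount, 1);
        long wanted = (rowsPerPartition + avgRowsPerTablet - 1) / avgRowsPerTablet; // ceiling
        return Math.min(Math.max(wanted, 1), partitionTabletCount);
    }
}
```

For example, sampling 100000 rows over 4 partitions gives 25000 rows per partition; a partition with 50000 rows in 10 tablets averages 5000 rows per tablet, so 5 tablets would be selected from it.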





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r990643709


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValues() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {
+        if (sampleValue <= 0) {

Review Comment:
   Fixed. The previous check was unreasonable: `SampleValue > 100` is equivalent to `SampleValue = 100` and samples all tablets.
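
A minimal sketch of the behaviour described in this comment, under the assumption that non-positive values are rejected (as the quoted `sampleValue <= 0` check suggests) and that percentages above 100 are simply treated as 100. The class `SampleValueSketch` and the use of `IllegalArgumentException` in place of `AnalysisException` are hypothetical.

```java
// Hypothetical helper; the PR itself performs its check inside TableSample.analyze().
final class SampleValueSketch {

    /** Returns the percentage actually used for sampling. */
    static long effectivePercent(long sampleValue) {
        if (sampleValue <= 0) {
            // Stand-in for the AnalysisException thrown during analysis.
            throw new IllegalArgumentException("Sample value must be greater than 0");
        }
        // Anything above 100 percent behaves like 100 percent: all tablets are sampled.
        return Math.min(sampleValue, 100);
    }
}
```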





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991197443


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   It is also set in the other constructor.
   ![image](https://user-images.githubusercontent.com/13197424/194859356-03265545-8a66-4d47-b610-8ccb5177cb51.png)





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991818187


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   > Sorry, I meant giving the field an initial value, i.e. `private Long seek = -1L;`, and removing `this.seek = -1L;`. It looks better, but it doesn't matter much; you can ignore it.
   
   To clarify, I meant without `final`.





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991818187


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   > Sorry, I meant giving the field an initial value, i.e. `private Long seek = -1L;`, and removing `this.seek = -1L;`. It looks better, but it doesn't matter much; you can ignore it.
   
   I meant without `final`.
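
The reason `final` matters here: in Java a `final` field that has an initializer cannot be assigned again in a constructor, so the suggestion only compiles once `final` is dropped. As an alternative that is not what the PR did, constructor chaining keeps the fields `final` and still states the default in one place. The class `SeekDefaultSketch` below is a hypothetical illustration.

```java
// Hypothetical alternative: keep `final`, state the default seek once via this(...).
final class SeekDefaultSketch {
    private final long sampleValue;
    private final boolean isPercent;
    private final long seek;

    SeekDefaultSketch(boolean isPercent, long sampleValue) {
        this(isPercent, sampleValue, -1L); // -1 means "no seek specified"
    }

    SeekDefaultSketch(boolean isPercent, long sampleValue, long seek) {
        this.sampleValue = sampleValue;
        this.isPercent = isPercent;
        this.seek = seek;
    }

    long getSampleValue() {
        return sampleValue;
    }

    boolean isPercent() {
        return isPercent;
    }

    long getSeek() {
        return seek;
    }
}
```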





[GitHub] [doris] weizhengte commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
weizhengte commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r991831818


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;
+    }
+
+    public TableSample(boolean isPercent, Long sampleValue, Long seek) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = seek;
+    }
+
+    public TableSample(TableSample other) {
+        this.sampleValue = other.sampleValue;
+        this.isPercent = other.isPercent;
+        this.seek = other.seek;
+    }
+
+    public Long getSampleValue() {
+        return sampleValue;
+    }
+
+    public boolean isPercent() {
+        return isPercent;
+    }
+
+    public Long getSeek() {
+        return seek;
+    }
+
+    @Override
+    public void analyze(Analyzer analyzer) throws AnalysisException {

Review Comment:
   The table in a sampling query must be an OLAP table; check this here or wherever the relevant statements are analyzed.
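
A rough sketch of the kind of check being requested, assuming it runs where both the table type and the sample clause are known. The enum values, the class `OlapSampleCheckSketch`, the exception type, and the error message are all hypothetical stand-ins for the Doris types.

```java
// Hypothetical stand-ins; Doris has its own TableType and AnalysisException.
final class OlapSampleCheckSketch {
    enum TableType { OLAP, MYSQL, VIEW }

    /** Rejects a sample clause on anything other than an OLAP table. */
    static void checkSampleAllowed(TableType type, boolean hasSampleClause) {
        if (hasSampleClause && type != TableType.OLAP) {
            throw new IllegalStateException("TABLESAMPLE is only supported on OLAP tables");
        }
    }
}
```

Failing during analysis makes the restriction visible to the user, whereas the quoted `computeSampleTabletIds` hunk returns silently for non-OLAP tables.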





[GitHub] [doris] xinyiZzz commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r995286683


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TableSample.java:
##########
@@ -0,0 +1,101 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.analysis;
+
+import org.apache.doris.common.AnalysisException;
+
+/*
+ * To represent following stmt:
+ *      TABLESAMPLE (10 PERCENT)
+ *      TABLESAMPLE (100 ROWS)
+ *      TABLESAMPLE (10 PERCENT) REPEATABLE (123)
+ *      TABLESAMPLE (100 ROWS) REPEATABLE (123)
+ *
+ * references:
+ *      https://simplebiinsights.com/sql-server-tablesample-retrieving-random-data-from-sql-server/
+ *      https://sqlrambling.net/2018/01/24/tablesample-basic-examples/
+ */
+public class TableSample implements ParseNode {
+
+    private final Long sampleValue;
+    private final boolean isPercent;
+    private final Long seek;
+
+    public TableSample(boolean isPercent, Long sampleValue) {
+        this.sampleValue = sampleValue;
+        this.isPercent = isPercent;
+        this.seek = -1L;

Review Comment:
   Done.





[GitHub] [doris] morrySnow merged pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
morrySnow merged PR #10170:
URL: https://github.com/apache/doris/pull/10170




[GitHub] [doris] Kikyou1997 commented on a diff in pull request #10170: [Enhancement](optimizer) Support select table sample

Posted by GitBox <gi...@apache.org>.
Kikyou1997 commented on code in PR #10170:
URL: https://github.com/apache/doris/pull/10170#discussion_r982279920


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/TupleDescriptor.java:
##########
@@ -159,6 +164,80 @@ public void setTable(TableIf tbl) {
         table = tbl;
     }
 
+    public Set<Long> getSampleTabletIds() {
+        return sampleTabletIds;
+    }
+
+    /**
+     * First, determine how many rows to sample from each partition according to the number of partitions.
+     * Then determine the number of Tablets to be selected for each partition according to the average number
+     * of rows per Tablet.
+     * If seek is not specified, the specified number of Tablets are pseudo-randomly selected from each partition.
+     * If seek is specified, Tablets are selected sequentially starting from the seek Tablet of the partition.

Review Comment:
   What scenario is `seek` used for?
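
For context, the PR description ties `seek` to `REPEATABLE`, which is meant to return the same sample again. The sketch below illustrates that idea only; it is not the PR's code. It assumes a negative seek means "not specified" (matching the `-1L` default in the diff), that the start position is `seek % size`, and that selection wraps around the end of the list. `SeekSelectionSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of seeded (repeatable) versus random tablet selection.
final class SeekSelectionSketch {

    static List<Long> pick(List<Long> tabletIdsInOrder, int count, long seek, Random random) {
        int size = tabletIdsInOrder.size();
        int n = Math.max(0, Math.min(count, size));
        List<Long> picked = new ArrayList<>(n);
        if (n == 0) {
            return picked;
        }
        if (seek < 0) {
            // No seek: pseudo-random choice, may differ between runs.
            List<Long> shuffled = new ArrayList<>(tabletIdsInOrder);
            Collections.shuffle(shuffled, random);
            picked.addAll(shuffled.subList(0, n));
            return picked;
        }
        // Seek given: walk sequentially from a fixed position, so reruns match.
        int start = (int) (seek % size);
        for (int i = 0; i < n; i++) {
            picked.add(tabletIdsInOrder.get((start + i) % size));
        }
        return picked;
    }
}
```

With the same seek and an unchanged tablet list, every call returns the same tablets, which is what makes a sampled query reproducible.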


