Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/21 11:25:08 UTC

[GitHub] [doris] JackDrogon opened a new pull request, #15250: [feature] Add auto bucket implement

JackDrogon opened a new pull request, #15250:
URL: https://github.com/apache/doris/pull/15250

   # Proposed changes
   
   ## Problem summary
   
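   Users often set an inappropriate number of buckets, which causes a variety of problems. This PR provides a way to set the number of buckets automatically.
   
   ## Implementation approach
   Compute the number of buckets from the data volume.
   For partitioned tables, the bucket count can be derived from the data volume of historical partitions, the number of machines, and the number of disks.
   The main difficulty is choosing the initial bucket count.
   Two approaches are provided:
   1. Derive a bucket count from the number of machines and disks.
   2. Let the user supply an empirical estimate of the data volume, and derive the bucket count from that value.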
   
   ### Detailed design
   1. Create table
   ```
   create table tbl1
   (...)
   [PARTITION BY RANGE(...)]
   DISTRIBUTED BY HASH(k1) BUCKETS 0
   properties(
       ["estimate_partition_size" = "100G"]
   )
   ```
   
   - `BUCKETS 0` means the number of buckets is set automatically.
   - `estimate_partition_size`: optional; the expected initial data volume of a single partition.
   
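   2. Bucket calculation logic
   Initial bucket calculation
   - `estimate_partition_size` not given
   Any estimate here is basically unreliable, so simply pick a fixed number, e.g. 11.
   - `estimate_partition_size` given
   
   Here we assume the given value is the data volume of a single replica in raw text format.
   1. Derive a bucket count N from the data volume.
       First divide the data volume by 5 (assuming a 5:1 compression ratio), then:
       <= 100MB: 1
       <= 1GB: 2
       > 1GB: one bucket per 1GB (rounded up)
   
   2. Derive a bucket count M from the machines and their disks:
       each BE node counts as 1
       each 50GB of disk capacity counts as 1
   
   3. Take min(M, N, 128). If this value is smaller than N and also smaller than the number of machines, use the number of machines instead.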
   
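   Examples (data volume, machines, disk capacity * disks per machine = buckets):
   ```
   1. 100MB, 10 machines, 2T * 3 = 1
   2. 1G, 3 machines, 500GB * 2 = 2
   3. 100G, 3 machines, 500GB * 2 = 60 (this case is based on TPC-H 100, where we use 48 buckets)
   4. 500G, 3 machines, 1T * 1 = 60
   5. 500G, 10 machines, 2T * 3 = 128
   6. 1T, 10 machines, 2T * 3 = 128
   7. 500G, 1 machine, 100TB * 1 = 128
   8. 1TB, 200 machines, 4T * 7 = 200
   ```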
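   The initial-bucket rule above can be condensed into a small Java sketch. It is a hypothetical, self-contained variant of the PR's `AutoBucketUtils`, not the merged code: cluster state is passed in as a list of per-disk available capacities plus a live BE count instead of being read from `Env`, the class name is made up, and M is counted per disk (one bucket per started 50GB) with the same rounding as the PR code, without the separate per-BE term mentioned in the description.

   ```java
   import java.util.List;

   // Hypothetical, self-contained sketch of the initial bucket rule (not Doris code).
   // Cluster state is passed in instead of being read from Env.getCurrentSystemInfo().
   final class InitialBucketSketch {
       static final long MB = 1024L * 1024;
       static final long GB = 1024L * MB;
       static final int MAX_BUCKETS = 128;

       // Step 1: N from the estimated raw (uncompressed, single-replica) partition size.
       static int bucketsBySize(long rawBytes) {
           long compressed = rawBytes / 5; // assume a 5:1 compression ratio
           if (compressed <= 100 * MB) {
               return 1;
           } else if (compressed <= GB) {
               return 2;
           }
           return (int) ((compressed - 1) / GB + 1); // ceil(compressed / 1GB)
       }

       // Step 2: M, one bucket per started 50GB of each disk's available capacity
       // (same rounding as the PR's getBucketsNumByBEDisks; no separate per-BE term).
       static int bucketsByDisks(List<Long> diskCapacities) {
           int m = 0;
           for (long capacity : diskCapacities) {
               m += (int) ((capacity - 1) / (50 * GB) + 1);
           }
           return m;
       }

       // Step 3: min(M, N, 128); if that is below both N and the BE count, use the BE count.
       static int initialBuckets(long rawBytes, List<Long> diskCapacities, int beNum) {
           int n = bucketsBySize(rawBytes);
           int m = bucketsByDisks(diskCapacities);
           int buckets = Math.min(MAX_BUCKETS, Math.min(m, n));
           if (buckets < n && buckets < beNum) {
               buckets = beNum;
           }
           return buckets;
       }
   }
   ```

   Fed with the listed cases, this sketch reproduces the 100MB (1), 1G (2) and 1TB / 200-machine (200) results. With the 5:1 division applied literally, however, the 100G / 3 machines / 500GB * 2 case gives N = 20 and therefore 20 buckets rather than 60, so the example figures appear to use the undivided size for N.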
   
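   Computing buckets for future partitions
   Applies to partitioned tables only.
   Use the exponential moving average of the data volume of up to the 7 preceding partitions as estimate_partition_size.
   The trend of the historical partitions must be checked:
       For example, if each of the previous five partitions is larger than the one before it, the data is growing; in that case the average should not be used, and the trend value should be taken instead.
       Only increasing and decreasing trends are considered; in all other cases, use the average.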
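   A minimal Java sketch of this estimate, modeled on the `ema` / `getNextPartitionSize` helpers in this PR's `DynamicPartitionScheduler` diff but written as a standalone class with hypothetical names. Assumptions: the history is ordered oldest to newest, only the last (up to) 7 entries are used, and the smoothing factor is alpha = 2 / (period + 1) with period 7. Unlike the diff, which checks only for an ascending run among the first entries, this sketch follows the description's "increasing or decreasing" wording and clamps a shrinking trend at zero.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch (not Doris code) of the next-partition size estimate.
   // history is ordered oldest -> newest; sizes are in bytes.
   final class NextPartitionSizeSketch {
       static final int WINDOW = 7; // use at most the 7 most recent partitions

       // Exponential moving average, alpha = 2 / (period + 1), seeded with the first value.
       static double ema(List<Long> values, int period) {
           double alpha = 2.0 / (period + 1);
           double ema = values.get(0);
           for (int i = 1; i < values.size(); i++) {
               ema = alpha * values.get(i) + (1 - alpha) * ema;
           }
           return ema;
       }

       static long nextPartitionSize(List<Long> history) {
           if (history.isEmpty()) {
               throw new IllegalArgumentException("need at least one historical partition");
           }
           List<Long> recent = history.subList(Math.max(0, history.size() - WINDOW), history.size());
           if (recent.size() < 2) {
               return recent.get(0);
           }
           boolean ascending = true;
           boolean descending = true;
           for (int i = 1; i < recent.size(); i++) {
               if (recent.get(i) < recent.get(i - 1)) {
                   ascending = false;
               }
               if (recent.get(i) > recent.get(i - 1)) {
                   descending = false;
               }
           }
           if (ascending || descending) {
               // Monotonic history: extrapolate from the newest size by the smoothed delta.
               List<Long> deltas = new ArrayList<>();
               for (int i = 1; i < recent.size(); i++) {
                   deltas.add(recent.get(i) - recent.get(i - 1));
               }
               long next = recent.get(recent.size() - 1) + Math.round(ema(deltas, WINDOW));
               return Math.max(0L, next); // a shrinking trend must not predict a negative size
           }
           // No clear trend: use the smoothed average of the sizes themselves.
           return Math.round(ema(recent, WINDOW));
       }
   }
   ```

   For example, a history of 100, 200, 300 is growing, so the estimate is 300 plus the smoothed delta of 100, i.e. 400; a mixed history such as 100, 300, 200 falls back to the EMA of the sizes.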
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1056015723


##########
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java:
##########
@@ -140,6 +142,69 @@ private Map<String, String> createDefaultRuntimeInfo() {
         return defaultRuntimeInfo;
     }
 
+    // exponential moving average
+    private static long ema(ArrayList<Long> history, int period) {
+        double alpha = 2.0 / (period + 1);
+        double ema = history.get(0);
+        for (int i = 1; i < history.size(); i++) {
+            ema = alpha * history.get(i) + (1 - alpha) * ema;
+        }
+        return (long) ema;
+    }
+
+    private static long getNextPartitionSize(ArrayList<Long> historyPartitionsSize) {
+        if (historyPartitionsSize.size() < 2) {
+            return historyPartitionsSize.get(0);
+        }
+
+        int size = historyPartitionsSize.size() > 7 ? 7 : historyPartitionsSize.size();
+
+        boolean isAscending = true;
+        for (int i = 1; i < size; i++) {
+            if (historyPartitionsSize.get(i) < historyPartitionsSize.get(i - 1)) {
+                isAscending = false;
+                break;
+            }
+        }
+
+        if (isAscending) {
+            ArrayList<Long> historyDeltaSize = Lists.newArrayList();
+            for (int i = 1; i < size; i++) {
+                historyDeltaSize.add(historyPartitionsSize.get(i) - historyPartitionsSize.get(i - 1));
+            }
+            return historyPartitionsSize.get(size - 1) + ema(historyDeltaSize, 7);
+        } else {
+            return ema(historyPartitionsSize, 7);
+        }
+    }
+
+    private static int getBucketsNum(DynamicPartitionProperty property, OlapTable table) {
+        if (!table.isAutoBucket()) {
+            return property.getBuckets();
+        }
+
+        // auto bucket
+        List<Partition> partitions = Lists.newArrayList(table.getPartitions());
+        if (partitions.size() == 0) {
+            return property.getBuckets();
+        }
+
+        Collections.sort(partitions, new Comparator<Partition>() {
+            @Override
+            public int compare(Partition p1, Partition p2) {
+                return (int) (p1.getId() - p2.getId());
+            }
+        });
+        ArrayList<Long> parititonsSize = Lists.newArrayList();
+        for (Partition partition : table.getPartitions()) {
+            parititonsSize.add(partition.getDataSize());
+        }
+
+        // * 5 for uncompressed data
+        long uncompressedPartionSize = getNextPartitionSize(parititonsSize) * 5;

Review Comment:
   `AutoBucketUtils.getBucketsNum` accepts an uncompressed partition size, hence the `* 5` applied to the estimated (compressed) size.
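   The caller side of this hunk can be sketched with plain Java types. This is a hypothetical standalone version (`PartitionStat` stands in for Doris' `Partition`; all names are assumptions) that differs from the diff in two places: it orders partitions with `Comparator.comparingLong`, because casting `p1.getId() - p2.getId()` to `int` can flip the sign once IDs differ by more than `Integer.MAX_VALUE`, and it reads sizes from the sorted copy, whereas the diff sorts `partitions` but then loops over `table.getPartitions()`.

   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;

   // Hypothetical sketch: collect per-partition sizes in ID order before
   // estimating the next partition. PartitionStat stands in for Doris' Partition.
   final class PartitionSizeOrderSketch {
       static final class PartitionStat {
           final long id;
           final long dataSize; // compressed bytes on disk

           PartitionStat(long id, long dataSize) {
               this.id = id;
               this.dataSize = dataSize;
           }
       }

       // Sizes ordered oldest -> newest by partition ID.
       static List<Long> sizesInIdOrder(List<PartitionStat> partitions) {
           List<PartitionStat> sorted = new ArrayList<>(partitions);
           // comparingLong uses Long.compare, so large ID gaps cannot overflow.
           sorted.sort(Comparator.comparingLong((PartitionStat p) -> p.id));
           List<Long> sizes = new ArrayList<>();
           for (PartitionStat p : sorted) { // iterate the sorted copy, not the original list
               sizes.add(p.dataSize);
           }
           return sizes;
       }

       // The bucket helper expects uncompressed bytes; assume a 5:1 compression ratio.
       static long toUncompressed(long compressedBytes) {
           return compressedBytes * 5;
       }
   }
   ```

   Ordering by ID assumes partition IDs increase with creation time, which is the same assumption the PR's comparator makes.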





[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1057086725


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/DistributionDesc.java:
##########
@@ -52,27 +70,10 @@ public DistributionInfo toDistributionInfo(List<Column> columns) throws DdlExcep
         throw new NotImplementedException();
     }
 
-    public static DistributionDesc read(DataInput in) throws IOException {
-        DistributionInfoType type = DistributionInfoType.valueOf(Text.readString(in));
-        if (type == DistributionInfoType.HASH) {
-            DistributionDesc desc = new HashDistributionDesc();
-            desc.readFields(in);
-            return desc;
-        } else if (type == DistributionInfoType.RANDOM) {
-            DistributionDesc desc = new RandomDistributionDesc();
-            desc.readFields(in);
-            return desc;
-        } else {
-            throw new IOException("Unknown distribution type: " + type);
-        }
-    }
-
     @Override
     public void write(DataOutput out) throws IOException {
         Text.writeString(out, type.name());
-    }
-
-    public void readFields(DataInput in) throws IOException {
-        throw new NotImplementedException();
+        out.writeInt(numBucket);

Review Comment:
   I checked all the code: the FE does not use the DistributionDesc write/read methods; it uses the DistributionInfo write/read functions. So I did not add autoBucket to those functions. Instead, I store it as a table property and load it in OlapTable's read via markAutoBucket.





[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1056019591


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java:
##########
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common.util;
+
+import org.apache.doris.catalog.DiskInfo;
+import org.apache.doris.catalog.DiskInfo.DiskState;
+import org.apache.doris.catalog.Env;
+import org.apache.doris.system.Backend;
+import org.apache.doris.system.SystemInfoService;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+public class AutoBucketUtils {
+    private static Logger logger = LogManager.getLogger(AutoBucketUtils.class);
+
+    private static final long SIZE_100MB = 100 * 1024 * 1024L;
+    private static final long SIZE_1GB = 1 * 1024 * 1024 * 1024L;
+
+    private static int getBENum() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int activeBENum = 0;
+        for (Backend backend : backends.values()) {
+            if (backend.isAlive()) {
+                ++activeBENum;
+            }
+        }
+        return activeBENum;
+    }
+
+    private static int getBucketsNumByBEDisks() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int buckets = 0;
+        for (Backend backend : backends.values()) {
+            if (!backend.isLoadAvailable()) {

Review Comment:
   If a backend is not load-available, it is not treated as a machine that can take on data.





[GitHub] [doris] morningman commented on a diff in pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1057079041


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/CreateTableStmt.java:
##########
@@ -94,6 +96,32 @@ public class CreateTableStmt extends DdlStmt {
         engineNames.add("jdbc");
     }
 
+    // if auto bucket auto bucket enable, rewrite distribution bucket num &&
+    // set properties[PropertyAnalyzer.PROPERTIES_AUTO_BUCKET] = "true"
+    private static Map<String, String> maybeRewriteByAutoBucket(DistributionDesc distributionDesc,
+            Map<String, String> properties) throws AnalysisException {
+        if (distributionDesc == null || !distributionDesc.isAutoBucket()) {
+            return properties;
+        }
+
+        // auto bucket is enable
+        Map<String, String> newProperties = properties;
+        if (newProperties == null) {
+            newProperties = new HashMap<String, String>();
+        }
+        newProperties.put(PropertyAnalyzer.PROPERTIES_AUTO_BUCKET, "true");
+
+        if (!newProperties.containsKey(PropertyAnalyzer.PROPERTIES_ESTIMATE_PARTITION_SIZE)) {
+            distributionDesc.setBuckets(11);

Review Comment:
   Define this `11` in `FeConstants.java`
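   Following this suggestion, the rewrite could read the default from a named constant. A hypothetical sketch: the constant name `DEFAULT_AUTO_BUCKET_NUM` and the helper methods are assumptions, the property keys are `auto_bucket` (from `PropertyAnalyzer` in this PR) and `estimate_partition_size` (from the PR description), and the estimate-to-buckets conversion is left to a caller-supplied function rather than guessed.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.ToIntFunction;

   // Hypothetical sketch of the BUCKETS 0 rewrite with the default moved into a named constant.
   final class AutoBucketRewriteSketch {
       // Hypothetical name; the review suggests defining this value in FeConstants.java.
       static final int DEFAULT_AUTO_BUCKET_NUM = 11;
       static final String PROPERTIES_AUTO_BUCKET = "auto_bucket";
       static final String PROPERTIES_ESTIMATE_PARTITION_SIZE = "estimate_partition_size";

       // Returns a copy of the properties with auto_bucket=true added
       // (the PR mutates the caller's map instead; copying is this sketch's choice).
       static Map<String, String> markAutoBucket(Map<String, String> properties) {
           Map<String, String> result = properties == null
                   ? new HashMap<String, String>()
                   : new HashMap<String, String>(properties);
           result.put(PROPERTIES_AUTO_BUCKET, "true");
           return result;
       }

       // With no estimate there is nothing to size from, so use the default constant;
       // otherwise delegate to a caller-supplied estimate -> bucket-count function.
       static int initialBuckets(Map<String, String> properties, ToIntFunction<String> fromEstimate) {
           String estimate = properties.get(PROPERTIES_ESTIMATE_PARTITION_SIZE);
           return estimate == null ? DEFAULT_AUTO_BUCKET_NUM : fromEstimate.applyAsInt(estimate);
       }
   }
   ```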



##########
fe/fe-core/src/main/java/org/apache/doris/catalog/DistributionInfo.java:
##########
@@ -40,12 +40,28 @@ public enum DistributionInfoType {
     @SerializedName(value = "type")
     protected DistributionInfoType type;
 
+    @SerializedName(value = "bucketNum")
+    protected int bucketNum;

Review Comment:
   Same suggestions as `DistributionDesc`



##########
fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java:
##########
@@ -1278,6 +1277,9 @@ public void readFields(DataInput in) throws IOException {
         if (in.readBoolean()) {
             tableProperty = TableProperty.read(in);
         }
+        if (isAutoBucket()) {
+            defaultDistributionInfo.markAutoBucket();

Review Comment:
   You have already read `defaultDistributionInfo` at line 1243, so there is no need to mark auto bucket again.



##########
fe/fe-core/src/main/java/org/apache/doris/analysis/DistributionDesc.java:
##########
@@ -52,27 +70,10 @@ public DistributionInfo toDistributionInfo(List<Column> columns) throws DdlExcep
         throw new NotImplementedException();
     }
 
-    public static DistributionDesc read(DataInput in) throws IOException {
-        DistributionInfoType type = DistributionInfoType.valueOf(Text.readString(in));
-        if (type == DistributionInfoType.HASH) {
-            DistributionDesc desc = new HashDistributionDesc();
-            desc.readFields(in);
-            return desc;
-        } else if (type == DistributionInfoType.RANDOM) {
-            DistributionDesc desc = new RandomDistributionDesc();
-            desc.readFields(in);
-            return desc;
-        } else {
-            throw new IOException("Unknown distribution type: " + type);
-        }
-    }
-
     @Override
     public void write(DataOutput out) throws IOException {
         Text.writeString(out, type.name());
-    }
-
-    public void readFields(DataInput in) throws IOException {
-        throw new NotImplementedException();
+        out.writeInt(numBucket);

Review Comment:
   You cannot change the `write` method like this; it will make the metadata incompatible.
   I suggest not modifying the `write` and `read` methods of `DistributionDesc` and its derived classes,
   and also leaving `numBucket` and `autoBucket` in the derived classes.
   You can add `getBuckets()` and `setBuckets()` methods and override them in the derived classes.
   This may duplicate some code, but it will not break metadata compatibility.
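   The layout suggested here can be sketched as follows. All class names are hypothetical stand-ins for `DistributionDesc`, `HashDistributionDesc` and `RandomDistributionDesc`: the base class only declares accessors, each derived class keeps its own fields (so their existing `write`/`read` code would not need to change), and generic code such as the auto-bucket rewrite goes through the accessors.

   ```java
   // Hypothetical sketch of the suggested layout (class names are made up).
   abstract class DescSketch {
       abstract int getBuckets();

       abstract void setBuckets(int buckets);

       abstract boolean isAutoBucket();
   }

   final class HashDescSketch extends DescSketch {
       private int numBucket;            // fields stay in the derived class,
       private final boolean autoBucket; // as they were before the change

       HashDescSketch(int numBucket, boolean autoBucket) {
           this.numBucket = numBucket;
           this.autoBucket = autoBucket;
       }

       @Override int getBuckets() { return numBucket; }
       @Override void setBuckets(int buckets) { this.numBucket = buckets; }
       @Override boolean isAutoBucket() { return autoBucket; }
   }

   final class RandomDescSketch extends DescSketch {
       private int numBucket;
       private final boolean autoBucket;

       RandomDescSketch(int numBucket, boolean autoBucket) {
           this.numBucket = numBucket;
           this.autoBucket = autoBucket;
       }

       @Override int getBuckets() { return numBucket; }
       @Override void setBuckets(int buckets) { this.numBucket = buckets; }
       @Override boolean isAutoBucket() { return autoBucket; }
   }

   final class DescRewriter {
       // Generic code only uses the base-class accessors.
       static void applyBuckets(DescSketch desc, int buckets) {
           if (desc.isAutoBucket()) {
               desc.setBuckets(buckets);
           }
       }
   }
   ```

   The duplication the reviewer mentions is visible in the two identical derived classes; the benefit is that neither class's on-disk format has to move fields into the base class.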
   
   





[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1056017197


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java:
##########
@@ -93,6 +93,9 @@ public class PropertyAnalyzer {
 
     public static final String PROPERTIES_INMEMORY = "in_memory";
 
+    public static final String PROPERTIES_AUTO_BUCKET = "auto_bucket";

Review Comment:
   This property does not come from the CREATE TABLE statement syntax; it is generated during the rewrite.





[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1057089331


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java:
##########
@@ -1278,6 +1277,9 @@ public void readFields(DataInput in) throws IOException {
         if (in.readBoolean()) {
             tableProperty = TableProperty.read(in);
         }
+        if (isAutoBucket()) {
+            defaultDistributionInfo.markAutoBucket();

Review Comment:
   For compatibility with `DistributionInfo` persistence, I store autoBucket in a table property.
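   The approach described in this reply (persisting the flag as a table property and re-applying it after the table is read) can be sketched as below. Everything here is a hypothetical stand-in (`DistributionInfoStub`, `restore`); only the `auto_bucket` key and the `markAutoBucket` step come from the PR.

   ```java
   import java.util.Map;

   // Hypothetical sketch: the auto-bucket flag lives in the table properties
   // ("auto_bucket" = "true") instead of in DistributionInfo's serialized fields.
   final class AutoBucketPersistenceSketch {
       static final String PROPERTIES_AUTO_BUCKET = "auto_bucket";

       // Stand-in for DistributionInfo; the flag itself is never written out here.
       static final class DistributionInfoStub {
           boolean autoBucket;

           void markAutoBucket() {
               autoBucket = true;
           }
       }

       // After the table properties have been read, re-apply the flag,
       // in the spirit of the PR's `if (isAutoBucket()) markAutoBucket()` step.
       static DistributionInfoStub restore(Map<String, String> tableProperties) {
           DistributionInfoStub info = new DistributionInfoStub();
           if (Boolean.parseBoolean(tableProperties.getOrDefault(PROPERTIES_AUTO_BUCKET, "false"))) {
               info.markAutoBucket();
           }
           return info;
       }
   }
   ```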





[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1056016421


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java:
##########
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common.util;
+
+import org.apache.doris.catalog.DiskInfo;
+import org.apache.doris.catalog.DiskInfo.DiskState;
+import org.apache.doris.catalog.Env;
+import org.apache.doris.system.Backend;
+import org.apache.doris.system.SystemInfoService;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+public class AutoBucketUtils {
+    private static Logger logger = LogManager.getLogger(AutoBucketUtils.class);
+
+    private static final long SIZE_100MB = 100 * 1024 * 1024L;
+    private static final long SIZE_1GB = 1 * 1024 * 1024 * 1024L;
+
+    private static int getBENum() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int activeBENum = 0;
+        for (Backend backend : backends.values()) {
+            if (backend.isAlive()) {
+                ++activeBENum;
+            }
+        }
+        return activeBENum;
+    }
+
+    private static int getBucketsNumByBEDisks() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int buckets = 0;
+        for (Backend backend : backends.values()) {
+            if (!backend.isLoadAvailable()) {
+                break;
+            }
+
+            ImmutableMap<String, DiskInfo> disks = backend.getDisks();
+            for (DiskInfo diskInfo : disks.values()) {
+                if (diskInfo.getState() == DiskState.ONLINE && diskInfo.hasPathHash()) {
+                    buckets += (diskInfo.getAvailableCapacityB() - 1) / (50 * SIZE_1GB) + 1;
+                }
+            }
+        }
+        return buckets;
+    }
+
+    private static int convertParitionSizeToBucketsNum(long partitionSize) {
+        partitionSize /= 5; // for compression 5:1
+
+        // <= 100MB, 1 bucket
+        // <= 1GB, 2 buckets
+        // > 1GB, round to (size / 1G)
+        if (partitionSize <= SIZE_100MB) {
+            return 1;
+        } else if (partitionSize <= SIZE_1GB) {
+            return 2;
+        } else {
+            return (int) ((partitionSize - 1) / SIZE_1GB + 1);
+        }
+    }
+
+    public static int getBucketsNum(long partitionSize) {
+        int bucketsNumByPartitionSize = convertParitionSizeToBucketsNum(partitionSize);
+        int bucketsNumByBE = getBucketsNumByBEDisks();
+        int bucketsNum = Math.min(128, Math.min(bucketsNumByPartitionSize, bucketsNumByBE));
+        int beNum = getBENum();
+        logger.info("AutoBucketsUtil: bucketsNumByPartitionSize {}, bucketsNumByBE {}, bucketsNum {}, beNum {}",

Review Comment:
   I originally logged this at debug level, but I checked and sys_log_level starts from INFO. CREATE TABLE is not a frequent operation, so this is not a hot path.





[GitHub] [doris] perid007 commented on pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by "perid007 (via GitHub)" <gi...@apache.org>.
perid007 commented on PR #15250:
URL: https://github.com/apache/doris/pull/15250#issuecomment-1557050140

   > How can it support Colocation Join?
   
   +1, same question.




[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1056016539


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java:
##########
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common.util;
+
+import org.apache.doris.catalog.DiskInfo;
+import org.apache.doris.catalog.DiskInfo.DiskState;
+import org.apache.doris.catalog.Env;
+import org.apache.doris.system.Backend;
+import org.apache.doris.system.SystemInfoService;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+public class AutoBucketUtils {

Review Comment:
   OK, I will add some unit tests.





[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1057087402


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/DistributionInfo.java:
##########
@@ -40,12 +40,28 @@ public enum DistributionInfoType {
     @SerializedName(value = "type")
     protected DistributionInfoType type;
 
+    @SerializedName(value = "bucketNum")
+    protected int bucketNum;

Review Comment:
   I only moved it from HashDistributionInfo and RandomDistributionInfo to the base class DistributionInfo. The read/write methods remain compatible; I tested with both the old FE and this one.





[GitHub] [doris] github-actions[bot] commented on pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15250:
URL: https://github.com/apache/doris/pull/15250#issuecomment-1383549220

   PR approved by anyone and no changes requested.




[GitHub] [doris] hello-stephen commented on pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15250:
URL: https://github.com/apache/doris/pull/15250#issuecomment-1361453488

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 34.67 seconds
    load time: 634 seconds
    storage size: 17123613938 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221221150654_clickbench_pr_66506.html




[GitHub] [doris] morningman commented on a diff in pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1062560575


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java:
##########
@@ -0,0 +1,98 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common.util;
+
+import org.apache.doris.catalog.DiskInfo;
+import org.apache.doris.catalog.DiskInfo.DiskState;
+import org.apache.doris.catalog.Env;
+import org.apache.doris.system.Backend;
+import org.apache.doris.system.SystemInfoService;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+public class AutoBucketUtils {
+    private static Logger logger = LogManager.getLogger(AutoBucketUtils.class);
+
+    static final long SIZE_100MB = 100 * 1024 * 1024L;
+    static final long SIZE_1GB = 1 * 1024 * 1024 * 1024L;
+    static final long SIZE_1TB = 1024 * SIZE_1GB;
+
+    private static int getBENum() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int activeBENum = 0;
+        for (Backend backend : backends.values()) {
+            if (backend.isAlive()) {
+                ++activeBENum;
+            }
+        }
+        return activeBENum;
+    }
+
+    private static int getBucketsNumByBEDisks() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int buckets = 0;
+        for (Backend backend : backends.values()) {
+            if (!backend.isLoadAvailable()) {
+                break;

Review Comment:
   ```suggestion
                   continue;
   ```
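To make the suggestion concrete, here is a minimal sketch of the disk-based count in both variants. `BackendStub` and `DiskBucketCount` are hypothetical stand-ins for `Backend`/`DiskInfo`. They model only load availability and the available bytes of online disks. The per-disk term `(avail - 1) / (50 GB) + 1` is copied from the quoted loop.

With `break`, the first backend that is not load-available silently drops every backend iterated after it, so the result depends on iteration order. With `continue`, only that one backend is skipped.

```java
import java.util.List;

// Hypothetical stand-in for a Doris Backend: only the fields this sketch needs.
final class BackendStub {
    final boolean loadAvailable;
    final List<Long> onlineDiskAvailBytes; // available bytes of each ONLINE disk

    BackendStub(boolean loadAvailable, List<Long> onlineDiskAvailBytes) {
        this.loadAvailable = loadAvailable;
        this.onlineDiskAvailBytes = onlineDiskAvailBytes;
    }
}

final class DiskBucketCount {
    static final long SIZE_1GB = 1024L * 1024 * 1024;
    // One bucket per started 50 GB of available disk space, as in the quoted loop.
    static final long BYTES_PER_BUCKET = 50 * SIZE_1GB;

    // useContinue = true models the suggested `continue`; false models the original `break`.
    static int count(List<BackendStub> backends, boolean useContinue) {
        int buckets = 0;
        for (BackendStub be : backends) {
            if (!be.loadAvailable) {
                if (useContinue) {
                    continue; // skip just this backend
                }
                break; // also drops every backend iterated after this one
            }
            for (long avail : be.onlineDiskAvailBytes) {
                buckets += (avail - 1) / BYTES_PER_BUCKET + 1; // ceil(avail / 50 GB) for avail > 0
            }
        }
        return buckets;
    }
}
```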



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15250:
URL: https://github.com/apache/doris/pull/15250#issuecomment-1383549192

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] morningman merged pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
morningman merged PR #15250:
URL: https://github.com/apache/doris/pull/15250




[GitHub] [doris] morningman commented on a diff in pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1055504998


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java:
##########
@@ -93,6 +93,9 @@ public class PropertyAnalyzer {
 
     public static final String PROPERTIES_INMEMORY = "in_memory";
 
+    public static final String PROPERTIES_AUTO_BUCKET = "auto_bucket";

Review Comment:
   I didn't see this property in the example you gave.



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java:
##########
@@ -0,0 +1,97 @@
+public class AutoBucketUtils {
+    private static Logger logger = LogManager.getLogger(AutoBucketUtils.class);
+
+    private static final long SIZE_100MB = 100 * 1024 * 1024L;
+    private static final long SIZE_1GB = 1 * 1024 * 1024 * 1024L;
+
+    private static int getBENum() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int activeBENum = 0;
+        for (Backend backend : backends.values()) {
+            if (backend.isAlive()) {
+                ++activeBENum;
+            }
+        }
+        return activeBENum;
+    }
+
+    private static int getBucketsNumByBEDisks() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int buckets = 0;
+        for (Backend backend : backends.values()) {
+            if (!backend.isLoadAvailable()) {

Review Comment:
   Why check `isLoadAvailable` here?



##########
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java:
##########
@@ -140,6 +142,69 @@ private Map<String, String> createDefaultRuntimeInfo() {
         return defaultRuntimeInfo;
     }
 
+    // exponential moving average
+    private static long ema(ArrayList<Long> history, int period) {
+        double alpha = 2.0 / (period + 1);
+        double ema = history.get(0);
+        for (int i = 1; i < history.size(); i++) {
+            ema = alpha * history.get(i) + (1 - alpha) * ema;
+        }
+        return (long) ema;
+    }
+
+    private static long getNextPartitionSize(ArrayList<Long> historyPartitionsSize) {
+        if (historyPartitionsSize.size() < 2) {
+            return historyPartitionsSize.get(0);
+        }
+
+        int size = historyPartitionsSize.size() > 7 ? 7 : historyPartitionsSize.size();
+
+        boolean isAscending = true;
+        for (int i = 1; i < size; i++) {
+            if (historyPartitionsSize.get(i) < historyPartitionsSize.get(i - 1)) {
+                isAscending = false;
+                break;
+            }
+        }
+
+        if (isAscending) {
+            ArrayList<Long> historyDeltaSize = Lists.newArrayList();
+            for (int i = 1; i < size; i++) {
+                historyDeltaSize.add(historyPartitionsSize.get(i) - historyPartitionsSize.get(i - 1));
+            }
+            return historyPartitionsSize.get(size - 1) + ema(historyDeltaSize, 7);
+        } else {
+            return ema(historyPartitionsSize, 7);
+        }
+    }
+
+    private static int getBucketsNum(DynamicPartitionProperty property, OlapTable table) {
+        if (!table.isAutoBucket()) {
+            return property.getBuckets();
+        }
+
+        // auto bucket
+        List<Partition> partitions = Lists.newArrayList(table.getPartitions());
+        if (partitions.size() == 0) {
+            return property.getBuckets();
+        }
+
+        Collections.sort(partitions, new Comparator<Partition>() {
+            @Override
+            public int compare(Partition p1, Partition p2) {
+                return (int) (p1.getId() - p2.getId());
+            }
+        });
+        ArrayList<Long> parititonsSize = Lists.newArrayList();
+        for (Partition partition : table.getPartitions()) {
+            parititonsSize.add(partition.getDataSize());
+        }
+
+        // * 5 for uncompressed data
+        long uncompressedPartionSize = getNextPartitionSize(parititonsSize) * 5;

Review Comment:
   No need for `* 5`; we can just use the compressed data size to calculate the bucket number.
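Here is a sketch of what the comment implies. It assumes the `* 5` value is handed to `AutoBucketUtils.getBucketsNum` (that call is cut off in the quote), whose `convertParitionSizeToBucketsNum` immediately divides by 5 again.

Since `x * 5 / 5 == x` for non-overflowing longs, the round trip is a no-op. The same thresholds can therefore be applied to the compressed size directly:

- <= 100MB: 1 bucket
- <= 1GB: 2 buckets
- otherwise: one bucket per started GB

`SizeBuckets` and its methods are hypothetical names.

```java
final class SizeBuckets {
    static final long SIZE_100MB = 100L * 1024 * 1024;
    static final long SIZE_1GB = 1024L * 1024 * 1024;

    // Same thresholds as convertParitionSizeToBucketsNum, applied to the
    // compressed size directly: no "* 5" in the caller, no "/ 5" here.
    static int bucketsForCompressedSize(long compressedBytes) {
        if (compressedBytes <= SIZE_100MB) {
            return 1;
        } else if (compressedBytes <= SIZE_1GB) {
            return 2;
        }
        // ceil(compressedBytes / 1 GB) using integer arithmetic
        return (int) ((compressedBytes - 1) / SIZE_1GB + 1);
    }

    // The quoted path: the caller multiplies by 5 and the callee divides by 5.
    static int bucketsViaRoundTrip(long compressedBytes) {
        long partitionSize = compressedBytes * 5; // caller: "* 5 for uncompressed data"
        partitionSize /= 5;                       // callee: "for compression 5:1"
        return bucketsForCompressedSize(partitionSize);
    }
}
```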



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java:
##########
@@ -0,0 +1,97 @@
+public class AutoBucketUtils {

Review Comment:
   Need to write unit tests for this class.



##########
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java:
##########
@@ -140,6 +142,69 @@ private Map<String, String> createDefaultRuntimeInfo() {
         return defaultRuntimeInfo;
     }
 
+        Collections.sort(partitions, new Comparator<Partition>() {
+            @Override
+            public int compare(Partition p1, Partition p2) {
+                return (int) (p1.getId() - p2.getId());

Review Comment:
   Also, it looks like you sort `partitions` but never use it?
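This is a minimal sketch of sorting and then actually reading from the sorted copy, using a hypothetical `PartitionStub`.

It also swaps the `(int) (p1.getId() - p2.getId())` comparator for `Comparator.comparingLong`. Casting a `long` difference to `int` keeps only the low 32 bits, so two ids 2^31 apart compare in the wrong order. Ids that far apart may never occur in practice, but the long-safe comparator costs nothing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical minimal partition with only the fields used here.
final class PartitionStub {
    final long id;
    final long dataSize;

    PartitionStub(long id, long dataSize) {
        this.id = id;
        this.dataSize = dataSize;
    }
}

final class SortedSizes {
    // Returns data sizes in ascending id order, read from the sorted copy.
    static List<Long> sizesInIdOrder(List<PartitionStub> partitions) {
        List<PartitionStub> sorted = new ArrayList<>(partitions);
        // Long-safe comparison; (int) (p1.id - p2.id) can truncate the difference and flip its sign.
        sorted.sort(Comparator.comparingLong((PartitionStub p) -> p.id));
        List<Long> sizes = new ArrayList<>();
        for (PartitionStub p : sorted) { // iterate the sorted list, not the original collection
            sizes.add(p.dataSize);
        }
        return sizes;
    }
}
```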



##########
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java:
##########
@@ -140,6 +142,69 @@ private Map<String, String> createDefaultRuntimeInfo() {
         return defaultRuntimeInfo;
     }
 
+        Collections.sort(partitions, new Comparator<Partition>() {
+            @Override
+            public int compare(Partition p1, Partition p2) {
+                return (int) (p1.getId() - p2.getId());

Review Comment:
   You cannot rely on the partition id to find the latest partition.
   Use the partition range value instead.
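Here is a sketch of ordering by range instead of by id. It assumes each partition can expose a comparable lower bound of its range. `RangedPartition` and its `LocalDate` key are hypothetical stand-ins; in Doris the range would come from the table's partition info.

An id only reflects creation order. For example, a partition added later for an earlier date range would otherwise be mistaken for the latest one. The helper returns the sizes of the last `n` partitions, oldest first.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical partition view: creation-order id, range lower bound, data size.
final class RangedPartition {
    final long id;
    final LocalDate lowerBound; // stand-in for a date-typed range key
    final long dataSize;

    RangedPartition(long id, LocalDate lowerBound, long dataSize) {
        this.id = id;
        this.lowerBound = lowerBound;
        this.dataSize = dataSize;
    }
}

final class LatestPartitions {
    // Data sizes of the last n partitions in range order (oldest first).
    static List<Long> lastNSizesByRange(List<RangedPartition> partitions, int n) {
        List<RangedPartition> sorted = new ArrayList<>(partitions);
        sorted.sort(Comparator.comparing((RangedPartition p) -> p.lowerBound));
        int from = Math.max(0, sorted.size() - n);
        List<Long> sizes = new ArrayList<>();
        for (RangedPartition p : sorted.subList(from, sorted.size())) {
            sizes.add(p.dataSize);
        }
        return sizes;
    }
}
```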



##########
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java:
##########
@@ -0,0 +1,97 @@
+public class AutoBucketUtils {
+    private static Logger logger = LogManager.getLogger(AutoBucketUtils.class);
+
+    private static final long SIZE_100MB = 100 * 1024 * 1024L;
+    private static final long SIZE_1GB = 1 * 1024 * 1024 * 1024L;
+
+    private static int getBENum() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int activeBENum = 0;
+        for (Backend backend : backends.values()) {
+            if (backend.isAlive()) {
+                ++activeBENum;
+            }
+        }
+        return activeBENum;
+    }
+
+    private static int getBucketsNumByBEDisks() {
+        SystemInfoService infoService = Env.getCurrentSystemInfo();
+        ImmutableMap<Long, Backend> backends = infoService.getBackendsInCluster(null);
+
+        int buckets = 0;
+        for (Backend backend : backends.values()) {
+            if (!backend.isLoadAvailable()) {
+                break;
+            }
+
+            ImmutableMap<String, DiskInfo> disks = backend.getDisks();
+            for (DiskInfo diskInfo : disks.values()) {
+                if (diskInfo.getState() == DiskState.ONLINE && diskInfo.hasPathHash()) {
+                    buckets += (diskInfo.getAvailableCapacityB() - 1) / (50 * SIZE_1GB) + 1;
+                }
+            }
+        }
+        return buckets;
+    }
+
+    private static int convertParitionSizeToBucketsNum(long partitionSize) {
+        partitionSize /= 5; // for compression 5:1
+
+        // <= 100MB, 1 bucket
+        // <= 1GB, 2 buckets
+        // > 1GB, round to (size / 1G)
+        if (partitionSize <= SIZE_100MB) {
+            return 1;
+        } else if (partitionSize <= SIZE_1GB) {
+            return 2;
+        } else {
+            return (int) ((partitionSize - 1) / SIZE_1GB + 1);
+        }
+    }
+
+    public static int getBucketsNum(long partitionSize) {
+        int bucketsNumByPartitionSize = convertParitionSizeToBucketsNum(partitionSize);
+        int bucketsNumByBE = getBucketsNumByBEDisks();
+        int bucketsNum = Math.min(128, Math.min(bucketsNumByPartitionSize, bucketsNumByBE));
+        int beNum = getBENum();
+        logger.info("AutoBucketsUtil: bucketsNumByPartitionSize {}, bucketsNumByBE {}, bucketsNum {}, beNum {}",

Review Comment:
   Too many logs. Remove some or change them to debug level.
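One way to address both this comment and the earlier request for unit tests is to keep the arithmetic in a pure function. The call site can then log its inputs once at debug level, for example with log4j2's `logger.debug(String, Object...)`.

The sketch below covers only the part of `getBucketsNum` visible in the quote, `min(128, min(bySize, byDisks))`. The rest of the method, including how `beNum` is used, is cut off and not modeled. `BucketCombine.cappedBuckets` is a hypothetical name.

```java
final class BucketCombine {
    // Upper bound used in the quoted getBucketsNum.
    static final int MAX_BUCKETS = 128;

    // Pure form of the visible rule: both estimates are parameters instead of
    // being read from Env, so the rule can be tested without a running cluster.
    static int cappedBuckets(int bucketsNumByPartitionSize, int bucketsNumByBE) {
        return Math.min(MAX_BUCKETS, Math.min(bucketsNumByPartitionSize, bucketsNumByBE));
    }
}
```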





[GitHub] [doris] morningman commented on pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
morningman commented on PR #15250:
URL: https://github.com/apache/doris/pull/15250#issuecomment-1362885350

   It would be better to use `BUCKETS AUTO` instead of `BUCKETS 0`.
   You can still return `0` internally.
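Here is a hypothetical helper illustrating the suggestion: the SQL surface accepts `AUTO`, while the internal representation keeps `0` as the auto-bucket sentinel.

In Doris the change itself would belong in the SQL grammar rather than in a helper like `BucketsClause.parseBuckets`. Rejecting an explicit `0` once `AUTO` exists is an assumption of this sketch, not something the review asks for.

```java
final class BucketsClause {
    // Internal sentinel for "choose the bucket count automatically",
    // matching the existing BUCKETS 0 convention.
    static final int AUTO_BUCKETS = 0;

    // Hypothetical helper: maps the token after BUCKETS to an internal count.
    static int parseBuckets(String token) {
        if ("AUTO".equalsIgnoreCase(token)) {
            return AUTO_BUCKETS;
        }
        int n = Integer.parseInt(token); // throws NumberFormatException for non-numeric input
        if (n <= 0) {
            // Assumption of this sketch: with AUTO available, 0 is no longer accepted in SQL.
            throw new IllegalArgumentException("BUCKETS must be a positive integer or AUTO: " + token);
        }
        return n;
    }
}
```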




[GitHub] [doris] JackDrogon commented on pull request #15250: [feature] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on PR #15250:
URL: https://github.com/apache/doris/pull/15250#issuecomment-1362378275

   Need to add the auto bucket and `estimate_partition_size` settings to the `SHOW CREATE TABLE` output.




[GitHub] [doris] JackDrogon commented on a diff in pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by GitBox <gi...@apache.org>.
JackDrogon commented on code in PR #15250:
URL: https://github.com/apache/doris/pull/15250#discussion_r1057035333


##########
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java:
##########
@@ -140,6 +142,69 @@ private Map<String, String> createDefaultRuntimeInfo() {
         return defaultRuntimeInfo;
     }
 
+        Collections.sort(partitions, new Comparator<Partition>() {
+            @Override
+            public int compare(Partition p1, Partition p2) {
+                return (int) (p1.getId() - p2.getId());

Review Comment:
   `partitions` is sorted to get the sizes of the last n historical partitions.
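For reference, here is a sketch of the forecast that reads the last (up to) 7 sizes from a list sorted oldest first, as the reply describes. `PartitionSizeForecast` is a hypothetical name, and `ema` mirrors the quoted formula (alpha = 2 / (period + 1), seeded with the first value).

The quoted `getNextPartitionSize` behaves differently in two ways:

- It indexes from 0, so it reads the first (oldest) up to 7 entries rather than the last ones.
- Its non-ascending branch averages the whole list.

The sketch applies the same last-7 window to both branches, which is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSizeForecast {
    // Exponential moving average, alpha = 2 / (period + 1), seeded with the first value.
    static long ema(List<Long> history, int period) {
        double alpha = 2.0 / (period + 1);
        double ema = history.get(0);
        for (int i = 1; i < history.size(); i++) {
            ema = alpha * history.get(i) + (1 - alpha) * ema;
        }
        return (long) ema;
    }

    // sizesOldestFirst must be non-empty; only the LAST up-to-7 entries are used.
    static long nextPartitionSize(List<Long> sizesOldestFirst) {
        int n = sizesOldestFirst.size();
        List<Long> window = sizesOldestFirst.subList(Math.max(0, n - 7), n);
        if (window.size() < 2) {
            return window.get(0);
        }
        boolean ascending = true;
        for (int i = 1; i < window.size(); i++) {
            if (window.get(i) < window.get(i - 1)) {
                ascending = false;
                break;
            }
        }
        if (!ascending) {
            return ema(window, 7);
        }
        // Growing data: extrapolate the newest size by the EMA of the deltas.
        List<Long> deltas = new ArrayList<>();
        for (int i = 1; i < window.size(); i++) {
            deltas.add(window.get(i) - window.get(i - 1));
        }
        return window.get(window.size() - 1) + ema(deltas, 7);
    }
}
```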





[GitHub] [doris] enterwhat commented on pull request #15250: (improvement)[bucket] Add auto bucket implement

Posted by "enterwhat (via GitHub)" <gi...@apache.org>.
enterwhat commented on PR #15250:
URL: https://github.com/apache/doris/pull/15250#issuecomment-1434152430

   How can it support Colocation Join?

