Posted to commits@doris.apache.org by mo...@apache.org on 2022/04/11 06:03:41 UTC

[incubator-doris] branch dev-1.0.1 updated (075f9e6420 -> 4ba2fdb96e)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a change to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git


    from 075f9e6420 [fix](ut)(memory-leak) Fix be asan ut failed and hdfs file reader memory leak (#8905)
     new 788bdba3a3 [improvement](join) update broadcast join cost algorithm (#8695)
     new 01b8ef64cd [Bug] Fix some bugs(rewrite rule/symbol transport) of like predicate (#8770)
     new a44f304826 [fix](routine load) Routine load task doesn't reallocate when previous BE is down. (#8824)
     new d003cc7524 [feature](function)(vectorized) Support all geolocation functions on vectorized engine (#8846)
     new d3b051a442 [fix](vectorized) core dump on ST_AsText (#8870)
     new 9c8d005abc [fix] check disk capacity before writing data (#8887)
     new 4f0172d89c [fix](join) Fix error bucket num get in bucket shuffle join in dynamic partition (#8891)
     new a4697fd76c [fix] fix the problem that using tsan to compile,BE will stack overflow when start (#8904)
     new 4ba2fdb96e [fix](error-code) replace invalid format specifier (#8940)

The 9 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/CMakeLists.txt                                  |   3 +-
 be/src/geo/geo_functions.cpp                       |  18 --
 be/src/geo/geo_functions.h                         |  20 ++
 be/src/olap/compaction.cpp                         |   1 +
 be/src/olap/data_dir.cpp                           |   1 -
 be/src/olap/delta_writer.cpp                       |   1 +
 be/src/olap/push_handler.cpp                       |   2 +
 be/src/olap/rowset/beta_rowset_writer.cpp          |   2 +-
 be/src/olap/rowset/rowset_writer_context.h         |   5 +
 be/src/olap/rowset/segment_v2/segment_writer.cpp   |  11 +-
 be/src/olap/rowset/segment_v2/segment_writer.h     |   4 +-
 be/src/olap/schema_change.cpp                      |   3 +
 be/src/olap/tablet.cpp                             |  16 +-
 be/src/olap/tablet_manager.cpp                     |   1 +
 be/src/service/CMakeLists.txt                      |   2 +-
 be/src/vec/functions/functions_geo.cpp             | 354 ++++++++++++++++++++-
 be/src/vec/functions/functions_geo.h               |  22 +-
 be/test/vec/function/function_geo_test.cpp         | 197 ++++++++++++
 build.sh                                           |   3 -
 .../vectorized-execution-engine.md                 |   9 +-
 docs/en/getting-started/advance-usage.md           |  22 +-
 .../vectorized-execution-engine.md                 |   9 +-
 docs/zh-CN/getting-started/advance-usage.md        |  14 +-
 .../java/org/apache/doris/analysis/Analyzer.java   |  21 +-
 .../org/apache/doris/analysis/LikePredicate.java   |  12 +-
 .../java/org/apache/doris/catalog/FunctionSet.java |  10 +-
 .../org/apache/doris/catalog/ScalarFunction.java   |  18 +-
 .../java/org/apache/doris/common/ErrorCode.java    |  90 +++---
 .../doris/load/routineload/RoutineLoadManager.java |  36 ++-
 .../apache/doris/planner/DistributedPlanner.java   |  29 +-
 .../org/apache/doris/planner/OlapScanNode.java     |   7 +-
 .../main/java/org/apache/doris/qe/Coordinator.java |   4 +-
 .../java/org/apache/doris/qe/SessionVariable.java  |   8 +
 .../doris/rewrite/RewriteLikePredicateRule.java    |  50 ---
 .../doris/planner/DistributedPlannerTest.java      |  24 ++
 .../java/org/apache/doris/qe/CoordinatorTest.java  |   2 +
 gensrc/script/doris_builtins_functions.py          |  19 +-
 .../sql/bucket_shuffle_join.out}                   |   5 +-
 .../apache/doris/regression/util/SuiteInfo.groovy  |  33 --
 .../suites/join/ddl/test_bucket_shuffle_join.sql   |  16 +
 .../suites/join/sql/bucket_shuffle_join.sql        |   1 +
 41 files changed, 844 insertions(+), 261 deletions(-)
 delete mode 100644 fe/fe-core/src/main/java/org/apache/doris/rewrite/RewriteLikePredicateRule.java
 copy regression-test/data/{correctness/test_select_constant.out => join/sql/bucket_shuffle_join.out} (59%)
 delete mode 100644 regression-test/framework/src/main/groovy/org/apache/doris/regression/util/SuiteInfo.groovy
 create mode 100644 regression-test/suites/join/ddl/test_bucket_shuffle_join.sql
 create mode 100644 regression-test/suites/join/sql/bucket_shuffle_join.sql


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[incubator-doris] 01/09: [improvement](join) update broadcast join cost algorithm (#8695)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 788bdba3a30bffe34431bbd33a071113ae6b8a3d
Author: morrySnow <10...@users.noreply.github.com>
AuthorDate: Sat Apr 9 19:00:27 2022 +0800

    [improvement](join) update broadcast join cost algorithm (#8695)
    
    The broadcast join cost is currently estimated from the compressed data size,
    so the actual memory used may be significantly larger than estimated.
    This patch:
    1. applies a compression ratio (set to 5 based on experience) to the broadcast join cost estimate.
    2. adds a new session variable `auto_broadcast_join_threshold` to limit the fraction of execution memory a broadcast join may use; the default value is 0.8.
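
    For illustration only, a minimal, self-contained Java sketch (hypothetical class and method names, not the actual Doris Analyzer/DistributedPlanner code) of how the threshold fraction is clamped and turned into a per-node byte budget that the hash-table estimate is compared against:

```
public class BroadcastThresholdSketch {

    // clamp the configured fraction the same way the patch does:
    // values > 1 are treated as 1, values <= 0 disable broadcast join entirely
    static long autoBroadcastBudgetBytes(long perNodeMemLimitBytes, double thresholdFraction) {
        if (thresholdFraction > 1) {
            thresholdFraction = 1.0;
        } else if (thresholdFraction <= 0) {
            thresholdFraction = -1.0; // a negative budget means broadcast is never chosen
        }
        return (long) (perNodeMemLimitBytes * thresholdFraction);
    }

    // broadcast join is kept only if the estimated hash table fits the budget
    static boolean chooseBroadcast(long estimatedHashTableBytes, long budgetBytes) {
        return estimatedHashTableBytes <= budgetBytes;
    }

    public static void main(String[] args) {
        long execMemLimit = 2L * 1024 * 1024 * 1024;               // e.g. 2 GB exec mem limit per node
        long budget = autoBroadcastBudgetBytes(execMemLimit, 0.8); // default threshold 0.8
        System.out.println("broadcast budget bytes = " + budget);
        System.out.println("500 MB hash table -> broadcast? "
                + chooseBroadcast(500L * 1024 * 1024, budget));
        System.out.println("3 GB hash table   -> broadcast? "
                + chooseBroadcast(3L * 1024 * 1024 * 1024, budget));
    }
}
```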
---
 docs/en/getting-started/advance-usage.md           | 22 ++++++++++------
 docs/zh-CN/getting-started/advance-usage.md        | 14 ++++++++---
 .../java/org/apache/doris/analysis/Analyzer.java   | 19 +++++++++++++-
 .../apache/doris/planner/DistributedPlanner.java   | 29 ++++++++++------------
 .../org/apache/doris/planner/OlapScanNode.java     |  5 +++-
 .../java/org/apache/doris/qe/SessionVariable.java  |  8 ++++++
 .../doris/planner/DistributedPlannerTest.java      | 24 ++++++++++++++++++
 7 files changed, 91 insertions(+), 30 deletions(-)

diff --git a/docs/en/getting-started/advance-usage.md b/docs/en/getting-started/advance-usage.md
index 5429b53c39..cfaa54dffe 100644
--- a/docs/en/getting-started/advance-usage.md
+++ b/docs/en/getting-started/advance-usage.md
@@ -179,8 +179,8 @@ mysql> SHOW VARIABLES LIKE "%mem_limit%";
 1 row in set (0.00 sec)
 ```
 
->* The above modification is session level and is only valid within the current connection session. Disconnecting and reconnecting will change back to the default value.
->* If you need to modify the global variable, you can set it as follows: `SET GLOBAL exec_mem_limit = 8589934592;` When the setup is complete, disconnect the session and log in again, and the parameters will take effect permanently.
+> * The above modification is session level and is only valid within the current connection session. Disconnecting and reconnecting will change back to the default value.
+> * If you need to modify the global variable, you can set it as follows: `SET GLOBAL exec_mem_limit = 8589934592;` When the setup is complete, disconnect the session and log in again, and the parameters will take effect permanently.
 
 ### 2.2 Query timeout
 
@@ -202,18 +202,24 @@ Modify the timeout to 1 minute:
 
 `SET query timeout =60;`
 
->* The current timeout check interval is 5 seconds, so timeouts less than 5 seconds are not very accurate.
->* The above modifications are also session level. Global validity can be modified by `SET GLOBAL`.
+> * The current timeout check interval is 5 seconds, so timeouts less than 5 seconds are not very accurate.
+> * The above modifications are also session level. Global validity can be modified by `SET GLOBAL`.
 
 ### 2.3 Broadcast/Shuffle Join
 
-By default, the system implements Join by conditionally filtering small tables, broadcasting them to the nodes where the large tables are located, forming a memory Hash table, and then streaming out the data of the large tables Hash Join. However, if the amount of data filtered by small tables cannot be put into memory, Join will not be able to complete at this time. The usual error should be caused by memory overrun first.
+The system implements the Join operator in two ways:
 
-If you encounter the above situation, it is recommended to use Shuffle Join explicitly, also known as Partitioned Join. That is, small and large tables are Hash according to Join's key, and then distributed Join. This memory consumption is allocated to all computing nodes in the cluster.
+Broadcast join: the right-hand table is filtered by its conditions and broadcast to the nodes where the large table is located to build an in-memory Hash table, and the data of the large table is then streamed through for the Hash Join.
 
-Doris will try to use Broadcast Join first. If small tables are too large to broadcasting, Doris will switch to Shuffle Join automatically. Note that if you use Broadcast Join explicitly in this case, Doris will still switch to Shuffle Join automatically.
+Shuffle join: both tables are hashed according to the Join key and the Join is performed in a distributed way, so the memory consumption is spread across all computing nodes in the cluster.
 
-Use Broadcast Join (default):
+Broadcast join performs better when the right-hand table is small; otherwise shuffle join performs better.
+
+Doris will try to use Broadcast Join first. You can also specify explicitly how each join operator is implemented. The configurable parameter `auto_broadcast_join_threshold` sets the maximum fraction of execution memory that can be used to build the hash table for a broadcast join; meaningful values range from 0 to 1, and the default is 0.8. When a broadcast join would use more memory than this limit, the system uses shuffle join instead.
+
+You can turn off broadcast join by setting `auto_broadcast_join_threshold` to zero or a negative value.
+
+Choose the join implementation automatically (default):
 
 ```
 mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2;
diff --git a/docs/zh-CN/getting-started/advance-usage.md b/docs/zh-CN/getting-started/advance-usage.md
index c32ab0ed14..8be1eb5ee2 100644
--- a/docs/zh-CN/getting-started/advance-usage.md
+++ b/docs/zh-CN/getting-started/advance-usage.md
@@ -207,13 +207,19 @@ mysql> SHOW VARIABLES LIKE "%query_timeout%";
 
 ### 2.3 Broadcast/Shuffle Join
 
-系统默认实现 Join 的方式,是将小表进行条件过滤后,将其广播到大表所在的各个节点上,形成一个内存 Hash 表,然后流式读出大表的数据进行Hash Join。但是如果当小表过滤后的数据量无法放入内存的话,此时 Join 将无法完成,通常的报错应该是首先造成内存超限。
+系统提供了两种Join的实现方式,broadcast join和shuffle join(partitioned Join)。
 
-如果遇到上述情况,建议显式指定 Shuffle Join,也被称作 Partitioned Join。即将小表和大表都按照 Join 的 key 进行 Hash,然后进行分布式的 Join。这个对内存的消耗就会分摊到集群的所有计算节点上。
+broadcast join是指将小表进行条件过滤后,将其广播到大表所在的各个节点上,形成一个内存 Hash 表,然后流式读出大表的数据进行Hash Join。
 
-Doris会自动尝试进行 Broadcast Join,如果预估小表过大则会自动切换至 Shuffle Join。注意,如果此时显式指定了 Broadcast Join 也会自动切换至 Shuffle Join。
+shuffle join是指将小表和大表都按照 Join 的 key 进行 Hash,然后进行分布式的 Join。
 
-使用 Broadcast Join(默认):
+当小表的数据量较小时,broadcast join拥有更好的性能。反之,则shuffle join拥有更好的性能。
+
+系统会自动尝试进行 Broadcast Join,也可以显式指定每个join算子的实现方式。系统提供了可配置的参数`auto_broadcast_join_threshold`,指定使用broadcast join时,hash table使用的内存占整体执行内存比例的上限,取值范围为0到1,默认值为0.8。当系统计算hash table使用的内存会超过此限制时,会自动转换为使用shuffle join。
+
+当`auto_broadcast_join_threshold`被设置为小于等于0时,所有的join都将使用shuffle join。
+
+自动选择join方式(默认):
 
 ```
 mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2;
diff --git a/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java b/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java
index 8b5ebab306..2addbbde7f 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java
@@ -179,6 +179,10 @@ public class Analyzer {
     public List<RuntimeFilter> getAssignedRuntimeFilter() { return assignedRuntimeFilters; }
     public void clearAssignedRuntimeFilters() { assignedRuntimeFilters.clear(); }
 
+    public long getAutoBroadcastJoinThreshold() {
+        return globalState.autoBroadcastJoinThreshold;
+    }
+
     // state shared between all objects of an Analyzer tree
     // TODO: Many maps here contain properties about tuples, e.g., whether
     // a tuple is outer/semi joined, etc. Remove the maps in favor of making
@@ -292,6 +296,8 @@ public class Analyzer {
 
         private final ExprRewriter mvExprRewriter;
 
+        private final long autoBroadcastJoinThreshold;
+
         public GlobalState(Catalog catalog, ConnectContext context) {
             this.catalog = catalog;
             this.context = context;
@@ -325,8 +331,19 @@ public class Analyzer {
             mvRewriteRules.add(HLLHashToSlotRefRule.INSTANCE);
             mvRewriteRules.add(CountFieldToSum.INSTANCE);
             mvExprRewriter = new ExprRewriter(mvRewriteRules);
+
+            // compute max exec mem could be used for broadcast join
+            long perNodeMemLimit = context.getSessionVariable().getMaxExecMemByte();
+            double autoBroadcastJoinThresholdPercentage = context.getSessionVariable().autoBroadcastJoinThreshold;
+            if (autoBroadcastJoinThresholdPercentage > 1) {
+                autoBroadcastJoinThresholdPercentage = 1.0;
+            } else if (autoBroadcastJoinThresholdPercentage <= 0) {
+                autoBroadcastJoinThresholdPercentage = -1.0;
+            }
+            autoBroadcastJoinThreshold = (long)(perNodeMemLimit * autoBroadcastJoinThresholdPercentage);
         }
-    };
+    }
+
     private final GlobalState globalState;
 
     // An analyzer stores analysis state for a single select block. A select block can be
diff --git a/fe/fe-core/src/main/java/org/apache/doris/planner/DistributedPlanner.java b/fe/fe-core/src/main/java/org/apache/doris/planner/DistributedPlanner.java
index 90dda46638..5d7d30aec2 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/planner/DistributedPlanner.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/planner/DistributedPlanner.java
@@ -96,12 +96,10 @@ public class DistributedPlanner {
             Preconditions.checkState(!queryStmt.hasOffset());
             isPartitioned = true;
         }
-        long perNodeMemLimit = ctx_.getQueryOptions().mem_limit;
         if (LOG.isDebugEnabled()) {
             LOG.debug("create plan fragments");
-            LOG.debug("memlimit=" + Long.toString(perNodeMemLimit));
         }
-        createPlanFragments(singleNodePlan, isPartitioned, perNodeMemLimit, fragments);
+        createPlanFragments(singleNodePlan, isPartitioned, fragments);
         return fragments;
     }
 
@@ -181,8 +179,7 @@ public class DistributedPlanner {
      * partitioned; the partition function is derived from the inputs.
      */
     private PlanFragment createPlanFragments(
-            PlanNode root, boolean isPartitioned,
-            long perNodeMemLimit, ArrayList<PlanFragment> fragments) throws UserException {
+            PlanNode root, boolean isPartitioned, ArrayList<PlanFragment> fragments) throws UserException {
         ArrayList<PlanFragment> childFragments = Lists.newArrayList();
         for (PlanNode child : root.getChildren()) {
             // allow child fragments to be partitioned, unless they contain a limit clause
@@ -193,7 +190,7 @@ public class DistributedPlanner {
             // TODO()
             // if (root instanceof SubplanNode && child == root.getChild(1)) continue;
             childFragments.add(
-                    createPlanFragments(child, childIsPartitioned, perNodeMemLimit, fragments));
+                    createPlanFragments(child, childIsPartitioned, fragments));
         }
 
         PlanFragment result = null;
@@ -204,8 +201,8 @@ public class DistributedPlanner {
             result = createTableFunctionFragment(root, childFragments.get(0));
         } else if (root instanceof HashJoinNode) {
             Preconditions.checkState(childFragments.size() == 2);
-            result = createHashJoinFragment((HashJoinNode) root, childFragments.get(1),
-                    childFragments.get(0), perNodeMemLimit, fragments);
+            result = createHashJoinFragment((HashJoinNode) root,
+                    childFragments.get(1), childFragments.get(0), fragments);
         } else if (root instanceof CrossJoinNode) {
             result = createCrossJoinFragment((CrossJoinNode) root, childFragments.get(1),
                     childFragments.get(0));
@@ -306,9 +303,9 @@ public class DistributedPlanner {
      * This function is mainly used to choose the most suitable distributed method for the 'node',
      * and transform it into PlanFragment.
      */
-    private PlanFragment createHashJoinFragment(HashJoinNode node, PlanFragment rightChildFragment,
-                                                PlanFragment leftChildFragment, long perNodeMemLimit,
-                                                ArrayList<PlanFragment> fragments)
+    private PlanFragment createHashJoinFragment(
+            HashJoinNode node, PlanFragment rightChildFragment,
+            PlanFragment leftChildFragment, ArrayList<PlanFragment> fragments)
             throws UserException {
         List<String> reason = Lists.newArrayList();
         if (canColocateJoin(node, leftChildFragment, rightChildFragment, reason)) {
@@ -352,16 +349,16 @@ public class DistributedPlanner {
         // - or if it's cheaper and we weren't explicitly told to do a partitioned join
         // - and we're not doing a full or right outer join (those require the left-hand
         //   side to be partitioned for correctness)
-        // - and the expected size of the hash tbl doesn't exceed perNodeMemLimit
+        // - and the expected size of the hash tbl doesn't exceed autoBroadcastThreshold
         // we set partition join as default when broadcast join cost equals partition join cost
+
         if (node.getJoinOp() != JoinOperator.RIGHT_OUTER_JOIN && node.getJoinOp() != JoinOperator.FULL_OUTER_JOIN) {
             if (node.getInnerRef().isBroadcastJoin()) {
                 // respect user join hint
                 doBroadcast = true;
-            } else if (!node.getInnerRef().isPartitionJoin()
-                    && joinCostEvaluation.isBroadcastCostSmaller()
-                    && (perNodeMemLimit == 0
-                    || joinCostEvaluation.constructHashTableSpace() <= perNodeMemLimit)) {
+            } else if (!node.getInnerRef().isPartitionJoin() && joinCostEvaluation.isBroadcastCostSmaller()
+                    && joinCostEvaluation.constructHashTableSpace()
+                    <= ctx_.getRootAnalyzer().getAutoBroadcastJoinThreshold()) {
                 doBroadcast = true;
             } else {
                 doBroadcast = false;
diff --git a/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java b/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java
index 13ccc90407..6310a7aaf8 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java
@@ -89,6 +89,9 @@ import java.util.stream.Collectors;
 public class OlapScanNode extends ScanNode {
     private static final Logger LOG = LogManager.getLogger(OlapScanNode.class);
 
+    // average compression ratio in doris storage engine
+    private final static int COMPRESSION_RATIO = 5;
+
     private List<TScanRangeLocations> result = new ArrayList<>();
     /*
      * When the field value is ON, the storage engine can return the data directly without pre-aggregation.
@@ -376,7 +379,7 @@ public class OlapScanNode extends ScanNode {
     public void computeStats(Analyzer analyzer) {
         super.computeStats(analyzer);
         if (cardinality > 0) {
-            avgRowSize = totalBytes / (float) cardinality;
+            avgRowSize = totalBytes / (float) cardinality * COMPRESSION_RATIO;
             capCardinalityAtLimit();
         }
         // when node scan has no data, cardinality should be 0 instead of a invalid value after computeStats()
diff --git a/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java b/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java
index e7725eccbb..20b1187e84 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java
@@ -176,6 +176,8 @@ public class SessionVariable implements Serializable, Writable {
 
     public static final String BLOCK_ENCRYPTION_MODE = "block_encryption_mode";
 
+    public static final String AUTO_BROADCAST_JOIN_THRESHOLD = "auto_broadcast_join_threshold";
+
     public static final String ENABLE_PROJECTION = "enable_projection";
 
     // session origin value
@@ -431,6 +433,12 @@ public class SessionVariable implements Serializable, Writable {
     @VariableMgr.VarAttr(name = BLOCK_ENCRYPTION_MODE)
     private String blockEncryptionMode = "";
 
+    // the maximum fraction of execution memory that can be used to build the hash table
+    // for a broadcast join. Setting this value to 0 or a negative number disables broadcast join.
+    // Default value is 0.8
+    @VariableMgr.VarAttr(name = AUTO_BROADCAST_JOIN_THRESHOLD)
+    public double autoBroadcastJoinThreshold = 0.8;
+  
     @VariableMgr.VarAttr(name = ENABLE_PROJECTION)
     private boolean enableProjection = false;
 
diff --git a/fe/fe-core/src/test/java/org/apache/doris/planner/DistributedPlannerTest.java b/fe/fe-core/src/test/java/org/apache/doris/planner/DistributedPlannerTest.java
index 0d304e992c..94b68b295d 100644
--- a/fe/fe-core/src/test/java/org/apache/doris/planner/DistributedPlannerTest.java
+++ b/fe/fe-core/src/test/java/org/apache/doris/planner/DistributedPlannerTest.java
@@ -147,4 +147,28 @@ public class DistributedPlannerTest {
         plan = planner.getExplainString(fragments, new ExplainOptions(false, false));
         Assert.assertEquals(1, StringUtils.countMatches(plan, "INNER JOIN (PARTITIONED)"));
     }
+
+    @Test
+    public void testBroadcastJoinCostThreshold() throws Exception {
+        String sql = "explain select * from db1.tbl1 join db1.tbl2 on tbl1.k1 = tbl2.k3";
+        StmtExecutor stmtExecutor = new StmtExecutor(ctx, sql);
+        stmtExecutor.execute();
+        Planner planner = stmtExecutor.planner();
+        List<PlanFragment> fragments = planner.getFragments();
+        String plan = planner.getExplainString(fragments, new ExplainOptions(false, false));
+        Assert.assertEquals(1, StringUtils.countMatches(plan, "INNER JOIN (BROADCAST)"));
+
+        double originThreshold = ctx.getSessionVariable().autoBroadcastJoinThreshold;
+        try {
+            ctx.getSessionVariable().autoBroadcastJoinThreshold = -1.0;
+            stmtExecutor = new StmtExecutor(ctx, sql);
+            stmtExecutor.execute();
+            planner = stmtExecutor.planner();
+            fragments = planner.getFragments();
+            plan = planner.getExplainString(fragments, new ExplainOptions(false, false));
+            Assert.assertEquals(1, StringUtils.countMatches(plan, "INNER JOIN (PARTITIONED)"));
+        } finally {
+            ctx.getSessionVariable().autoBroadcastJoinThreshold = originThreshold;
+        }
+    }
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[incubator-doris] 07/09: [fix](join) Fix error bucket num get in bucket shuffle join in dynamic partition (#8891)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 4f0172d89c673789297748a05a94538ff976605c
Author: HappenLee <ha...@hotmail.com>
AuthorDate: Sat Apr 9 19:11:44 2022 +0800

    [fix](join) Fix error bucket num get in bucket shuffle join in dynamic partition (#8891)
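
    For illustration only, a small Java sketch (hypothetical helper, not the Coordinator code itself) of the idea behind this fix: with dynamic partitions the table-level default bucket number can differ from the bucket number of the single partition a bucket shuffle join actually hits, so the scan node's total tablet count for that partition is the correct value to use:

```
public class BucketNumSketch {

    // before the fix: the bucket num came from the table's default distribution info
    static int bucketNumFromTableDefault(int defaultBucketNum) {
        return defaultBucketNum;
    }

    // after the fix: bucket shuffle join only applies when exactly one partition is hit,
    // so the scan node's total tablet count equals that partition's bucket count
    static int bucketNumFromHitPartition(long totalTabletsNumOfHitPartition) {
        return (int) totalTabletsNumOfHitPartition;
    }

    public static void main(String[] args) {
        int tableDefaultBuckets = 10;  // e.g. DISTRIBUTED BY HASH(`id`) BUCKETS 10
        long hitPartitionTablets = 16; // a dynamically created partition may use a different bucket count
        System.out.println("old bucket num = " + bucketNumFromTableDefault(tableDefaultBuckets));
        System.out.println("new bucket num = " + bucketNumFromHitPartition(hitPartitionTablets));
    }
}
```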
---
 be/src/service/CMakeLists.txt                      |  2 +-
 build.sh                                           |  3 --
 .../org/apache/doris/planner/OlapScanNode.java     |  2 ++
 .../main/java/org/apache/doris/qe/Coordinator.java |  4 ++-
 .../java/org/apache/doris/qe/CoordinatorTest.java  |  2 ++
 .../data/join/sql/bucket_shuffle_join.out          |  5 ++++
 .../apache/doris/regression/util/SuiteInfo.groovy  | 33 ----------------------
 .../suites/join/ddl/test_bucket_shuffle_join.sql   | 16 +++++++++++
 .../suites/join/sql/bucket_shuffle_join.sql        |  1 +
 9 files changed, 30 insertions(+), 38 deletions(-)

diff --git a/be/src/service/CMakeLists.txt b/be/src/service/CMakeLists.txt
index fecc5b4153..fb07d2cca7 100644
--- a/be/src/service/CMakeLists.txt
+++ b/be/src/service/CMakeLists.txt
@@ -44,7 +44,7 @@ if (${MAKE_TEST} STREQUAL "OFF")
     install(DIRECTORY DESTINATION ${OUTPUT_DIR}/lib/)
     install(TARGETS palo_be DESTINATION ${OUTPUT_DIR}/lib/)
 
-    if (${STRIP_DEBUG_INFO} STREQUAL "ON")
+    if ("${STRIP_DEBUG_INFO}" STREQUAL "ON")
         add_custom_command(TARGET palo_be POST_BUILD
             COMMAND ${CMAKE_OBJCOPY} --only-keep-debug $<TARGET_FILE:palo_be> $<TARGET_FILE:palo_be>.dbg
             COMMAND ${CMAKE_STRIP} --strip-debug --strip-unneeded $<TARGET_FILE:palo_be>
diff --git a/build.sh b/build.sh
index 16c7b792b0..f5be3971fb 100755
--- a/build.sh
+++ b/build.sh
@@ -293,9 +293,6 @@ if [ ${BUILD_FE} -eq 1 -o ${BUILD_SPARK_DPP} -eq 1 ]; then
 fi
 
 function build_ui() {
-    # check NPM env here, not in env.sh.
-    # Because UI should be considered a non-essential component at runtime.
-    # Only when the compilation is required, check the relevant compilation environment.
     NPM=npm
     if ! ${NPM} --version; then
         echo "Error: npm is not found"
diff --git a/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java b/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java
index 6310a7aaf8..0ba2e7a693 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java
@@ -172,6 +172,8 @@ public class OlapScanNode extends ScanNode {
         setCanTurnOnPreAggr(false);
     }
 
+    public long getTotalTabletsNum() { return totalTabletsNum; }
+
     public boolean getForceOpenPreAgg() {
         return forceOpenPreAgg;
     }
diff --git a/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java b/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
index 7e03688b37..502b3bfa39 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
@@ -1786,7 +1786,9 @@ public class Coordinator {
         private void computeScanRangeAssignmentByBucket(
                 final OlapScanNode scanNode, ImmutableMap<Long, Backend> idToBackend, Map<TNetworkAddress, Long> addressToBackendID) throws Exception {
             if (!fragmentIdToSeqToAddressMap.containsKey(scanNode.getFragmentId())) {
-                fragmentIdToBucketNumMap.put(scanNode.getFragmentId(), scanNode.getOlapTable().getDefaultDistributionInfo().getBucketNum());
+                // Bucket shuffle join is only chosen when exactly one partition is hit, so totalTabletsNum is the
+                // tablet count of that partition and is the correct bucket num for bucket shuffle join
+                fragmentIdToBucketNumMap.put(scanNode.getFragmentId(), (int)scanNode.getTotalTabletsNum());
                 fragmentIdToSeqToAddressMap.put(scanNode.getFragmentId(), new HashedMap());
                 fragmentIdBucketSeqToScanRangeMap.put(scanNode.getFragmentId(), new BucketSeqToScanRange());
                 fragmentIdToBuckendIdBucketCountMap.put(scanNode.getFragmentId(), new HashMap<>());
diff --git a/fe/fe-core/src/test/java/org/apache/doris/qe/CoordinatorTest.java b/fe/fe-core/src/test/java/org/apache/doris/qe/CoordinatorTest.java
index fb86df6e33..e355f6c247 100644
--- a/fe/fe-core/src/test/java/org/apache/doris/qe/CoordinatorTest.java
+++ b/fe/fe-core/src/test/java/org/apache/doris/qe/CoordinatorTest.java
@@ -235,6 +235,7 @@ public class CoordinatorTest extends Coordinator {
         }
 
         Deencapsulation.setField(olapScanNode, "bucketSeq2locations", bucketseq2localtion);
+        Deencapsulation.setField(olapScanNode, "totalTabletsNum", 66);
         olapScanNode.setFragment(new PlanFragment(planFragmentId, olapScanNode,
                 new DataPartition(TPartitionType.UNPARTITIONED)));
 
@@ -357,6 +358,7 @@ public class CoordinatorTest extends Coordinator {
         }
 
         Deencapsulation.setField(olapScanNode, "bucketSeq2locations", bucketseq2localtion);
+        Deencapsulation.setField(olapScanNode, "totalTabletsNum", 66);
         olapScanNode.setFragment(new PlanFragment(planFragmentId, olapScanNode,
                 new DataPartition(TPartitionType.UNPARTITIONED)));
 
diff --git a/regression-test/data/join/sql/bucket_shuffle_join.out b/regression-test/data/join/sql/bucket_shuffle_join.out
new file mode 100644
index 0000000000..87f57761ba
--- /dev/null
+++ b/regression-test/data/join/sql/bucket_shuffle_join.out
@@ -0,0 +1,5 @@
+-- This file is automatically generated. You should know what you did if you want to edit this
+-- !bucket_shuffle_join --
+1	2021-12-01T00:00
+2	2021-12-01T00:00
+
diff --git a/regression-test/framework/src/main/groovy/org/apache/doris/regression/util/SuiteInfo.groovy b/regression-test/framework/src/main/groovy/org/apache/doris/regression/util/SuiteInfo.groovy
deleted file mode 100644
index 589d5b882c..0000000000
--- a/regression-test/framework/src/main/groovy/org/apache/doris/regression/util/SuiteInfo.groovy
+++ /dev/null
@@ -1,33 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-package org.apache.doris.regression.util
-
-import groovy.transform.CompileStatic
-
-@CompileStatic
-class SuiteInfo {
-    File file
-    String group
-    String suiteName
-
-    SuiteInfo(File file, String group, String suiteName) {
-        this.file = file
-        this.group = group
-        this.suiteName = suiteName
-    }
-}
diff --git a/regression-test/suites/join/ddl/test_bucket_shuffle_join.sql b/regression-test/suites/join/ddl/test_bucket_shuffle_join.sql
new file mode 100644
index 0000000000..7d461bf86a
--- /dev/null
+++ b/regression-test/suites/join/ddl/test_bucket_shuffle_join.sql
@@ -0,0 +1,16 @@
+CREATE TABLE `test_bucket_shuffle_join` (
+  `id` int(11) NOT NULL COMMENT "",
+  `rectime` datetime NOT NULL COMMENT ""
+) ENGINE=OLAP
+UNIQUE KEY(`id`, `rectime`)
+COMMENT "olap"
+PARTITION BY RANGE(`rectime`)
+(
+PARTITION p202111 VALUES [('2021-11-01 00:00:00'), ('2021-12-01 00:00:00')))
+DISTRIBUTED BY HASH(`id`) BUCKETS 10
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1",
+"in_memory" = "false",
+"storage_format" = "V2"
+)
+
diff --git a/regression-test/suites/join/sql/bucket_shuffle_join.sql b/regression-test/suites/join/sql/bucket_shuffle_join.sql
new file mode 100644
index 0000000000..807613e2e4
--- /dev/null
+++ b/regression-test/suites/join/sql/bucket_shuffle_join.sql
@@ -0,0 +1 @@
+select * from test_bucket_shuffle_join where rectime="2021-12-01 00:00:00" and id in (select k1 from test_join where k1 in (1,2))


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[incubator-doris] 03/09: [fix](routine load) Routine load task doesn't reallocate when previous BE is down. (#8824)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit a44f304826ef1d0a1e134bb8734446133405016c
Author: Henry2SS <45...@users.noreply.github.com>
AuthorDate: Sat Apr 9 19:02:55 2022 +0800

    [fix](routine load) Routine load task doesn't reallocate when previous BE is down. (#8824)
    
    If the previous BE is not alive, another available BE should be assigned instead.
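
    For illustration only, a simplified Java sketch (hypothetical types and method names, not the RoutineLoadManager API) of the new allocation rule: the previous BE is reused only if it is still in the cluster, alive for load, and has an idle task slot; otherwise another available BE is chosen:

```
import java.util.List;
import java.util.Map;

public class RoutineLoadBePickSketch {

    static long pickBackend(long previousBeId,
                            List<Long> beIdsInCluster,
                            Map<Long, Boolean> loadAvailable,
                            Map<Long, Integer> idleSlots) {
        // reuse the previous BE only if it is still in the cluster, alive for load,
        // and has at least one idle task slot
        if (previousBeId != -1L && beIdsInCluster.contains(previousBeId)
                && loadAvailable.getOrDefault(previousBeId, false)
                && idleSlots.getOrDefault(previousBeId, 0) > 0) {
            return previousBeId;
        }
        // otherwise fall back to the load-available BE with the most idle slots
        long bestBe = -1L;
        int bestIdle = 0;
        for (long beId : beIdsInCluster) {
            if (!loadAvailable.getOrDefault(beId, false)) {
                continue;
            }
            int idle = idleSlots.getOrDefault(beId, 0);
            if (idle > bestIdle) {
                bestIdle = idle;
                bestBe = beId;
            }
        }
        return bestBe;
    }

    public static void main(String[] args) {
        // previous BE 1001 is down, so the task should move to BE 1002
        System.out.println(pickBackend(1001L,
                List.of(1001L, 1002L),
                Map.of(1001L, false, 1002L, true),
                Map.of(1001L, 3, 1002L, 5)));
    }
}
```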
---
 .../doris/load/routineload/RoutineLoadManager.java | 36 ++++++++++++----------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadManager.java b/fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadManager.java
index ed6e2eaf81..2d6c13adab 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadManager.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/load/routineload/RoutineLoadManager.java
@@ -41,6 +41,7 @@ import org.apache.doris.mysql.privilege.PrivPredicate;
 import org.apache.doris.persist.AlterRoutineLoadJobOperationLog;
 import org.apache.doris.persist.RoutineLoadOperation;
 import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.system.Backend;
 
 import com.google.common.base.Preconditions;
 import com.google.common.collect.Lists;
@@ -410,36 +411,39 @@ public class RoutineLoadManager implements Writable {
     // check if the specified BE is available for running task
     // return true if it is available. return false if otherwise.
     // throw exception if unrecoverable errors happen.
-    public long getAvailableBeForTask(long previoudBeId, String clusterName) throws LoadException {
+    public long getAvailableBeForTask(long previousBeId, String clusterName) throws LoadException {
         List<Long> beIdsInCluster = Catalog.getCurrentSystemInfo().getClusterBackendIds(clusterName, true);
         if (beIdsInCluster == null) {
             throw new LoadException("The " + clusterName + " has been deleted");
         }
 
-        if (previoudBeId != -1L && !beIdsInCluster.contains(previoudBeId)) {
-            return -1L;
-        }
-
         // check if be has idle slot
         readLock();
         try {
             Map<Long, Integer> beIdToConcurrentTasks = getBeCurrentTasksNumMap();
+
             // 1. Find if the given BE id has available slots
-            if (previoudBeId != -1L) {
-                int idleTaskNum = 0;
-                if (!beIdToMaxConcurrentTasks.containsKey(previoudBeId)) {
-                    idleTaskNum = 0;
-                } else if (beIdToConcurrentTasks.containsKey(previoudBeId)) {
-                    idleTaskNum = beIdToMaxConcurrentTasks.get(previoudBeId) - beIdToConcurrentTasks.get(previoudBeId);
-                } else {
-                    idleTaskNum = Config.max_routine_load_task_num_per_be;
-                }
-                if (idleTaskNum > 0) {
-                    return previoudBeId;
+            if (previousBeId != -1L && beIdsInCluster.contains(previousBeId)) {
+                // get the previousBackend info
+                Backend previousBackend = Catalog.getCurrentSystemInfo().getBackend(previousBeId);
+                // check previousBackend is not null && load available
+                if (previousBackend != null && previousBackend.isLoadAvailable()) {
+                    int idleTaskNum = 0;
+                    if (!beIdToMaxConcurrentTasks.containsKey(previousBeId)) {
+                        idleTaskNum = 0;
+                    } else if (beIdToConcurrentTasks.containsKey(previousBeId)) {
+                        idleTaskNum = beIdToMaxConcurrentTasks.get(previousBeId) - beIdToConcurrentTasks.get(previousBeId);
+                    } else {
+                        idleTaskNum = beIdToMaxConcurrentTasks.get(previousBeId);
+                    }
+                    if (idleTaskNum > 0) {
+                        return previousBeId;
+                    }
                 }
             }
 
             // 2. The given BE id does not have available slots, find a BE with min tasks
+            // 3. The previous BE is not in the cluster or is not load-available, find a new BE with min tasks
             int idleTaskNum = 0;
             long resultBeId = -1L;
             int maxIdleSlotNum = 0;


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[incubator-doris] 05/09: [fix](vectorized) core dump on ST_AsText (#8870)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit d3b051a442aa8dd113495a280cc4b465050d24e4
Author: morningman <mo...@163.com>
AuthorDate: Mon Apr 11 11:42:09 2022 +0800

    [fix](vectorized) core dump on ST_AsText (#8870)
---
 be/src/vec/functions/functions_geo.cpp | 36 ++++++++++++++++++++--------------
 be/src/vec/functions/functions_geo.h   |  1 -
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/be/src/vec/functions/functions_geo.cpp b/be/src/vec/functions/functions_geo.cpp
index ba4c370506..51b9cd5311 100644
--- a/be/src/vec/functions/functions_geo.cpp
+++ b/be/src/vec/functions/functions_geo.cpp
@@ -31,9 +31,12 @@ struct StPoint {
     static const size_t NUM_ARGS = 2;
     static Status execute(Block& block, const ColumnNumbers& arguments, size_t result) {
         DCHECK_EQ(arguments.size(), 2);
-        auto return_type = block.get_data_type(result);
-        auto column_x = block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
-        auto column_y = block.get_by_position(arguments[1]).column->convert_to_full_column_if_const();
+        auto return_type = remove_nullable(block.get_data_type(result));
+
+        auto column_x =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        auto column_y =
+                block.get_by_position(arguments[1]).column->convert_to_full_column_if_const();
 
         const auto size = column_x->size();
 
@@ -67,14 +70,15 @@ struct StAsWktName {
     static constexpr auto NAME = "st_aswkt";
 };
 
-template<typename FunctionName>
+template <typename FunctionName>
 struct StAsText {
     static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = FunctionName::NAME;
     static const size_t NUM_ARGS = 1;
-    static Status execute(Block& block, const ColumnNumbers& arguments,size_t result) {
+    static Status execute(Block& block, const ColumnNumbers& arguments, size_t result) {
         DCHECK_EQ(arguments.size(), 1);
-        auto return_type = block.get_data_type(result);
+        auto return_type = remove_nullable(block.get_data_type(result));
+
         auto input = block.get_by_position(arguments[0]).column;
 
         auto size = input->size();
@@ -106,9 +110,10 @@ struct StX {
     static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = "st_x";
     static const size_t NUM_ARGS = 1;
-    static Status execute(Block& block, const ColumnNumbers& arguments,size_t result) {
+    static Status execute(Block& block, const ColumnNumbers& arguments, size_t result) {
         DCHECK_EQ(arguments.size(), 1);
-        auto return_type = block.get_data_type(result);
+        auto return_type = remove_nullable(block.get_data_type(result));
+
         auto input = block.get_by_position(arguments[0]).column;
 
         auto size = input->size();
@@ -140,9 +145,10 @@ struct StY {
     static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = "st_y";
     static const size_t NUM_ARGS = 1;
-    static Status execute(Block& block, const ColumnNumbers& arguments,size_t result) {
+    static Status execute(Block& block, const ColumnNumbers& arguments, size_t result) {
         DCHECK_EQ(arguments.size(), 1);
-        auto return_type = block.get_data_type(result);
+        auto return_type = remove_nullable(block.get_data_type(result));
+
         auto input = block.get_by_position(arguments[0]).column;
 
         auto size = input->size();
@@ -176,7 +182,8 @@ struct StDistanceSphere {
     static const size_t NUM_ARGS = 4;
     static Status execute(Block& block, const ColumnNumbers& arguments, size_t result) {
         DCHECK_EQ(arguments.size(), 4);
-        auto return_type = block.get_data_type(result);
+        auto return_type = remove_nullable(block.get_data_type(result));
+
         auto x_lng = block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
         auto x_lat = block.get_by_position(arguments[1]).column->convert_to_full_column_if_const();
         auto y_lng = block.get_by_position(arguments[2]).column->convert_to_full_column_if_const();
@@ -189,16 +196,15 @@ struct StDistanceSphere {
         res = ColumnNullable::create(return_type->create_column(), ColumnUInt8::create());
 
         for (int row = 0; row < size; ++row) {
-            double distance;
+            double distance = 0;
             if (!GeoPoint::ComputeDistance(x_lng->operator[](row).get<Float64>(),
                                            x_lat->operator[](row).get<Float64>(),
                                            y_lng->operator[](row).get<Float64>(),
-                                           y_lat->operator[](row).get<Float64>(),
-                                           &distance)) {
+                                           y_lat->operator[](row).get<Float64>(), &distance)) {
                 res->insert_data(nullptr, 0);
                 continue;
             }
-            res->insert_data(const_cast<const char*>((char*) &distance), 0);
+            res->insert_data(const_cast<const char*>((char*)&distance), 0);
         }
 
         block.replace_by_position(result, std::move(res));
diff --git a/be/src/vec/functions/functions_geo.h b/be/src/vec/functions/functions_geo.h
index d18b808c3f..3002c55a10 100644
--- a/be/src/vec/functions/functions_geo.h
+++ b/be/src/vec/functions/functions_geo.h
@@ -42,7 +42,6 @@ public:
 
     Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
                         size_t result, size_t input_rows_count) override {
-        return Impl::execute(block, arguments, result);
         if constexpr (Impl::NEED_CONTEXT) {
             return Impl::execute(context, block, arguments, result);
         } else {


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[incubator-doris] 02/09: [Bug] Fix some bugs(rewrite rule/symbol transport) of like predicate (#8770)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 01b8ef64cdd0d51c91525be7fc221e7db0c71de2
Author: morningman <mo...@163.com>
AuthorDate: Mon Apr 11 11:40:23 2022 +0800

    [Bug] Fix some bugs(rewrite rule/symbol transport) of like predicate (#8770)
---
 .../java/org/apache/doris/analysis/Analyzer.java   |  2 -
 .../org/apache/doris/analysis/LikePredicate.java   | 12 +++---
 .../java/org/apache/doris/catalog/FunctionSet.java | 10 +++--
 .../org/apache/doris/catalog/ScalarFunction.java   | 18 +++++---
 .../doris/rewrite/RewriteLikePredicateRule.java    | 50 ----------------------
 5 files changed, 25 insertions(+), 67 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java b/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java
index 2addbbde7f..e46831100d 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java
@@ -49,7 +49,6 @@ import org.apache.doris.rewrite.RewriteAliasFunctionRule;
 import org.apache.doris.rewrite.RewriteBinaryPredicatesRule;
 import org.apache.doris.rewrite.RewriteEncryptKeyRule;
 import org.apache.doris.rewrite.RewriteFromUnixTimeRule;
-import org.apache.doris.rewrite.RewriteLikePredicateRule;
 import org.apache.doris.rewrite.RewriteDateLiteralRule;
 import org.apache.doris.rewrite.mvrewrite.CountDistinctToBitmap;
 import org.apache.doris.rewrite.mvrewrite.CountDistinctToBitmapOrHLLRule;
@@ -317,7 +316,6 @@ public class Analyzer {
             rules.add(RewriteDateLiteralRule.INSTANCE);
             rules.add(RewriteEncryptKeyRule.INSTANCE);
             rules.add(RewriteAliasFunctionRule.INSTANCE);
-            rules.add(RewriteLikePredicateRule.INSTANCE);
             List<ExprRewriteRule> onceRules = Lists.newArrayList();
             onceRules.add(ExtractCommonFactorsRule.INSTANCE);
             onceRules.add(InferFiltersRule.INSTANCE);
diff --git a/fe/fe-core/src/main/java/org/apache/doris/analysis/LikePredicate.java b/fe/fe-core/src/main/java/org/apache/doris/analysis/LikePredicate.java
index 97d8f2afdd..8d72c2a112 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/analysis/LikePredicate.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/analysis/LikePredicate.java
@@ -113,14 +113,16 @@ public class LikePredicate extends Predicate {
     @Override
     public void analyzeImpl(Analyzer analyzer) throws AnalysisException {
         super.analyzeImpl(analyzer);
-        if (!getChild(0).getType().isStringType() && !getChild(0).getType().isFixedPointType()
-                && !getChild(0).getType().isNull()) {
+        if (getChild(0).getType().isObjectStored()) {
             throw new AnalysisException(
-              "left operand of " + op.toString() + " must be of type STRING or FIXED_POINT_TYPE: " + toSql());
+                    "left operand of " + op.toString() + " must not be Bitmap or HLL: " + toSql());
         }
         if (!getChild(1).getType().isStringType() && !getChild(1).getType().isNull()) {
-            throw new AnalysisException(
-              "right operand of " + op.toString() + " must be of type STRING: " + toSql());
+            throw new AnalysisException("right operand of " + op.toString() + " must be of type STRING: " + toSql());
+        }
+
+        if (!getChild(0).getType().isStringType()) {
+            uncheckedCastChild(Type.VARCHAR, 0);
         }
 
         fn = getBuiltinFunction(analyzer, op.toString(),
diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java
index 9ec3cfef5c..60912c9bdf 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java
@@ -1201,10 +1201,12 @@ public class FunctionSet<min_initIN9doris_udf12DecimalV2ValEEEvPNS2_15FunctionCo
             vecFns = Lists.newArrayList();
             vectorizedFunctions.put(fn.functionName(), vecFns);
         }
-        ScalarFunction scalarFunction = (ScalarFunction)fn;
-        vecFns.add(ScalarFunction.createVecBuiltin(scalarFunction.functionName(), scalarFunction.getSymbolName(),
-                Lists.newArrayList(scalarFunction.getArgs()), scalarFunction.hasVarArgs(),
-                scalarFunction.getReturnType(), scalarFunction.isUserVisible(), scalarFunction.getNullableMode()));
+        ScalarFunction scalarFunction = (ScalarFunction) fn;
+        vecFns.add(ScalarFunction.createVecBuiltin(scalarFunction.functionName(), scalarFunction.getPrepareFnSymbol(),
+                        scalarFunction.getSymbolName(), scalarFunction.getCloseFnSymbol(),
+                        Lists.newArrayList(scalarFunction.getArgs()), scalarFunction.hasVarArgs(),
+                        scalarFunction.getReturnType(), scalarFunction.isUserVisible(),
+                        scalarFunction.getNullableMode()));
     }
 
 
diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarFunction.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarFunction.java
index 66f0a6252b..d9c4b0252e 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarFunction.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/ScalarFunction.java
@@ -284,16 +284,22 @@ public class ScalarFunction extends Function {
 
     public static ScalarFunction createVecBuiltinOperator(
             String name, String symbol, ArrayList<Type> argTypes, Type retType, NullableMode nullableMode) {
-        return createVecBuiltin(name, symbol, argTypes, false, retType, false, nullableMode);
+        return createVecBuiltin(name, null, symbol, null, argTypes, false, retType, false, nullableMode);
     }
 
     //TODO: This method should not be here, move to other place in the future
-    public static ScalarFunction createVecBuiltin(
-            String name, String symbol, ArrayList<Type> argTypes,
-            boolean hasVarArgs, Type retType, boolean userVisible, NullableMode nullableMode) {
-        ScalarFunction fn = new ScalarFunction(
-                new FunctionName(name), argTypes, retType, hasVarArgs, userVisible, true);
+    public static ScalarFunction createVecBuiltin(String name, String prepareFnSymbolBName, String symbol,
+            String closeFnSymbolName, ArrayList<Type> argTypes, boolean hasVarArgs, Type retType, boolean userVisible,
+            NullableMode nullableMode) {
+        ScalarFunction fn = new ScalarFunction(new FunctionName(name), argTypes, retType, hasVarArgs, userVisible,
+                true);
+        if (prepareFnSymbolBName != null) {
+            fn.prepareFnSymbol = prepareFnSymbolBName;
+        }
         fn.symbolName = symbol;
+        if (closeFnSymbolName != null) {
+            fn.closeFnSymbol = closeFnSymbolName;
+        }
         fn.nullableMode = nullableMode;
         return fn;
     }
diff --git a/fe/fe-core/src/main/java/org/apache/doris/rewrite/RewriteLikePredicateRule.java b/fe/fe-core/src/main/java/org/apache/doris/rewrite/RewriteLikePredicateRule.java
deleted file mode 100644
index c9679e8a33..0000000000
--- a/fe/fe-core/src/main/java/org/apache/doris/rewrite/RewriteLikePredicateRule.java
+++ /dev/null
@@ -1,50 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-package org.apache.doris.rewrite;
-
-import org.apache.doris.analysis.Analyzer;
-import org.apache.doris.analysis.CastExpr;
-import org.apache.doris.analysis.Expr;
-import org.apache.doris.analysis.LikePredicate;
-import org.apache.doris.analysis.SlotRef;
-import org.apache.doris.analysis.TypeDef;
-import org.apache.doris.catalog.Type;
-import org.apache.doris.common.AnalysisException;
-
-/**
- * Rewrite `int` to `string` in like predicate
- * in order to support `int` in like predicate, same as MySQL
- */
-public class RewriteLikePredicateRule implements ExprRewriteRule {
-    public static RewriteLikePredicateRule INSTANCE = new RewriteLikePredicateRule();
-
-    @Override
-    public Expr apply(Expr expr, Analyzer analyzer, ExprRewriter.ClauseType clauseType) throws AnalysisException {
-        if (expr instanceof LikePredicate) {
-            Expr leftChild = expr.getChild(0);
-            if (leftChild instanceof SlotRef) {
-                Type type = leftChild.getType();
-                if (type.isFixedPointType()) {
-                    return new LikePredicate(((LikePredicate) expr).getOp(),
-                            new CastExpr(new TypeDef(Type.VARCHAR), leftChild), expr.getChild(1));
-                }
-            }
-        }
-        return expr;
-    }
-}


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[incubator-doris] 04/09: [feature](function)(vectorized) Support all geolocation functions on vectorized engine (#8846)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit d003cc7524520b4c1d4532683f6d07998737d946
Author: Gabriel <ga...@gmail.com>
AuthorDate: Mon Apr 11 09:36:53 2022 +0800

    [feature](function)(vectorized) Support all geolocation functions on vectorized engine (#8846)
---
 be/src/geo/geo_functions.cpp                       |  18 --
 be/src/geo/geo_functions.h                         |  20 ++
 be/src/vec/functions/functions_geo.cpp             | 318 +++++++++++++++++++++
 be/src/vec/functions/functions_geo.h               |  21 ++
 be/test/vec/function/function_geo_test.cpp         | 197 +++++++++++++
 .../vectorized-execution-engine.md                 |   9 +-
 .../vectorized-execution-engine.md                 |   9 +-
 gensrc/script/doris_builtins_functions.py          |  19 +-
 8 files changed, 574 insertions(+), 37 deletions(-)

diff --git a/be/src/geo/geo_functions.cpp b/be/src/geo/geo_functions.cpp
index 4baf9245cc..33ea5ca0c0 100644
--- a/be/src/geo/geo_functions.cpp
+++ b/be/src/geo/geo_functions.cpp
@@ -106,14 +106,6 @@ StringVal GeoFunctions::st_as_wkt(doris_udf::FunctionContext* ctx,
     return result;
 }
 
-struct StConstructState {
-    StConstructState() : is_null(false) {}
-    ~StConstructState() {}
-
-    bool is_null;
-    std::string encoded_buf;
-};
-
 void GeoFunctions::st_from_wkt_close(FunctionContext* ctx,
                                      FunctionContext::FunctionStateScope scope) {
     if (scope != FunctionContext::FRAGMENT_LOCAL) {
@@ -229,16 +221,6 @@ doris_udf::StringVal GeoFunctions::st_circle(FunctionContext* ctx, const DoubleV
     }
 }
 
-struct StContainsState {
-    StContainsState() : is_null(false), shapes{nullptr, nullptr} {}
-    ~StContainsState() {
-        delete shapes[0];
-        delete shapes[1];
-    }
-    bool is_null;
-    GeoShape* shapes[2];
-};
-
 void GeoFunctions::st_contains_prepare(doris_udf::FunctionContext* ctx,
                                        doris_udf::FunctionContext::FunctionStateScope scope) {
     if (scope != FunctionContext::FRAGMENT_LOCAL) {
diff --git a/be/src/geo/geo_functions.h b/be/src/geo/geo_functions.h
index ef1e896e2f..90f7d6bccb 100644
--- a/be/src/geo/geo_functions.h
+++ b/be/src/geo/geo_functions.h
@@ -18,6 +18,7 @@
 #pragma once
 
 #include "geo/geo_common.h"
+#include "geo/geo_types.h"
 #include "udf/udf.h"
 
 namespace doris {
@@ -107,4 +108,23 @@ public:
                                   doris_udf::FunctionContext::FunctionStateScope);
 };
 
+struct StConstructState {
+    StConstructState() : is_null(false) {}
+    ~StConstructState() {}
+
+    bool is_null;
+    std::string encoded_buf;
+};
+
+
+struct StContainsState {
+    StContainsState() : is_null(false), shapes{nullptr, nullptr} {}
+    ~StContainsState() {
+        delete shapes[0];
+        delete shapes[1];
+    }
+    bool is_null;
+    GeoShape* shapes[2];
+};
+
 } // namespace doris
diff --git a/be/src/vec/functions/functions_geo.cpp b/be/src/vec/functions/functions_geo.cpp
index 3b8adfc878..ba4c370506 100644
--- a/be/src/vec/functions/functions_geo.cpp
+++ b/be/src/vec/functions/functions_geo.cpp
@@ -18,12 +18,15 @@
 #include "vec/functions/functions_geo.h"
 
 #include "geo/geo_types.h"
+#include "geo/geo_functions.h"
 #include "gutil/strings/substitute.h"
+#include "vec/columns/column_const.h"
 #include "vec/functions/simple_function_factory.h"
 
 namespace doris::vectorized {
 
 struct StPoint {
+    static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = "st_point";
     static const size_t NUM_ARGS = 2;
     static Status execute(Block& block, const ColumnNumbers& arguments, size_t result) {
@@ -66,6 +69,7 @@ struct StAsWktName {
 
 template<typename FunctionName>
 struct StAsText {
+    static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = FunctionName::NAME;
     static const size_t NUM_ARGS = 1;
     static Status execute(Block& block, const ColumnNumbers& arguments,size_t result) {
@@ -99,6 +103,7 @@ struct StAsText {
 };
 
 struct StX {
+    static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = "st_x";
     static const size_t NUM_ARGS = 1;
     static Status execute(Block& block, const ColumnNumbers& arguments,size_t result) {
@@ -132,6 +137,7 @@ struct StX {
 };
 
 struct StY {
+    static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = "st_y";
     static const size_t NUM_ARGS = 1;
     static Status execute(Block& block, const ColumnNumbers& arguments,size_t result) {
@@ -165,6 +171,7 @@ struct StY {
 };
 
 struct StDistanceSphere {
+    static constexpr auto NEED_CONTEXT = false;
     static constexpr auto NAME = "st_distance_sphere";
     static const size_t NUM_ARGS = 4;
     static Status execute(Block& block, const ColumnNumbers& arguments, size_t result) {
@@ -199,6 +206,308 @@ struct StDistanceSphere {
     }
 };
 
+struct StCircle {
+    static constexpr auto NEED_CONTEXT = true;
+    static constexpr auto NAME = "st_circle";
+    static const size_t NUM_ARGS = 3;
+    static Status execute(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                          size_t result) {
+        DCHECK_EQ(arguments.size(), 3);
+        auto return_type = remove_nullable(block.get_data_type(result));
+        auto center_lng = block.get_by_position(arguments[0])
+                                  .column->convert_to_full_column_if_const();
+        auto center_lat = block.get_by_position(arguments[1])
+                                  .column->convert_to_full_column_if_const();
+        auto radius = block.get_by_position(arguments[2])
+                              .column->convert_to_full_column_if_const();
+
+        const auto size = center_lng->size();
+
+        MutableColumnPtr res = nullptr;
+        auto null_type = std::reinterpret_pointer_cast<const DataTypeNullable>(return_type);
+        res = ColumnNullable::create(return_type->create_column(), ColumnUInt8::create());
+
+        StConstructState* state =
+                (StConstructState*) context->get_function_state(FunctionContext::FRAGMENT_LOCAL);
+        if (state == nullptr) {
+            GeoCircle circle;
+            std::string buf;
+            for (int row = 0; row < size; ++row) {
+                auto lng_value = center_lng->get_float64(row);
+                auto lat_value = center_lat->get_float64(row);
+                auto radius_value = radius->get_float64(row);
+
+                auto value = circle.init(lng_value, lat_value, radius_value);
+                if (value != GEO_PARSE_OK) {
+                    res->insert_data(nullptr, 0);
+                    continue;
+                }
+                buf.clear();
+                circle.encode_to(&buf);
+                res->insert_data(buf.data(), buf.size());
+            }
+            block.replace_by_position(result, std::move(res));
+        } else {
+            if (state->is_null) {
+                res->insert_data(nullptr, 0);
+                block.replace_by_position(result, ColumnConst::create(std::move(res), size));
+            } else {
+                res->insert_data(state->encoded_buf.data(), state->encoded_buf.size());
+                block.replace_by_position(result, ColumnConst::create(std::move(res), size));
+            }
+        }
+        return Status::OK();
+    }
+
+    static Status prepare(FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+        if (scope != FunctionContext::FRAGMENT_LOCAL) {
+            return Status::OK();
+        }
+
+        if (!context->is_arg_constant(0) || !context->is_arg_constant(1)
+            || !context->is_arg_constant(2)) {
+            return Status::OK();
+        }
+
+        auto state = new StConstructState();
+        DoubleVal* lng = reinterpret_cast<DoubleVal*>(context->get_constant_arg(0));
+        DoubleVal* lat = reinterpret_cast<DoubleVal*>(context->get_constant_arg(1));
+        DoubleVal* radius = reinterpret_cast<DoubleVal*>(context->get_constant_arg(2));
+        if (lng->is_null || lat->is_null || radius->is_null) {
+            state->is_null = true;
+        } else {
+            std::unique_ptr<GeoCircle> circle(new GeoCircle());
+
+            auto res = circle->init(lng->val, lat->val, radius->val);
+            if (res != GEO_PARSE_OK) {
+                state->is_null = true;
+            } else {
+                circle->encode_to(&state->encoded_buf);
+            }
+        }
+        context->set_function_state(scope, state);
+
+        return Status::OK();
+    }
+
+    static Status close(FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+        if (scope != FunctionContext::FRAGMENT_LOCAL) {
+            return Status::OK();
+        }
+        StConstructState* state = reinterpret_cast<StConstructState*>(
+                context->get_function_state(scope));
+        delete state;
+        return Status::OK();
+    }
+};
+
+struct StContains {
+    static constexpr auto NEED_CONTEXT = true;
+    static constexpr auto NAME = "st_contains";
+    static const size_t NUM_ARGS = 2;
+    static Status execute(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                          size_t result) {
+        DCHECK_EQ(arguments.size(), 2);
+        auto return_type = remove_nullable(block.get_data_type(result));
+        auto shape1 = block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        auto shape2 = block.get_by_position(arguments[1]).column->convert_to_full_column_if_const();
+
+        const auto size = shape1->size();
+        auto null_type = std::reinterpret_pointer_cast<const DataTypeNullable>(return_type);
+        auto res = ColumnNullable::create(return_type->create_column(), ColumnUInt8::create());
+
+        StContainsState* state = (StContainsState*) context->get_function_state(FunctionContext::FRAGMENT_LOCAL);
+        if (state != nullptr && state->is_null) {
+            res->insert_data(nullptr, 0);
+            block.replace_by_position(result, ColumnConst::create(std::move(res), size));
+            return Status::OK();
+        }
+
+        StContainsState local_state;
+        int i;
+        GeoShape* shapes[2] = {nullptr, nullptr};
+        for (int row = 0; row < size; ++row) {
+            auto lhs_value = shape1->get_data_at(row);
+            auto rhs_value = shape2->get_data_at(row);
+            StringRef* strs[2] = {&lhs_value, &rhs_value};
+            for (i = 0; i < 2; ++i) {
+                if (state != nullptr && state->shapes[i] != nullptr) {
+                    shapes[i] = state->shapes[i];
+                } else {
+                    shapes[i] = local_state.shapes[i] = GeoShape::from_encoded(strs[i]->data, strs[i]->size);
+                    if (shapes[i] == nullptr) {
+                        res->insert_data(nullptr, 0);
+                        break;
+                    }
+                }
+            }
+
+            if (i == 2) {
+                auto contains_value = shapes[0]->contains(shapes[1]);
+                res->insert_data(const_cast<const char*>((char*)&contains_value), 0);
+            }
+        }
+        block.replace_by_position(result, std::move(res));
+        return Status::OK();
+    }
+
+    static Status prepare(FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+        if (scope != FunctionContext::FRAGMENT_LOCAL) {
+            return Status::OK();
+        }
+
+        if (!context->is_arg_constant(0) && !context->is_arg_constant(1)) {
+            return Status::OK();
+        }
+
+        auto contains_ctx = new StContainsState();
+        for (int i = 0; !contains_ctx->is_null && i < 2; ++i) {
+            if (context->is_arg_constant(i)) {
+                StringVal* str = reinterpret_cast<StringVal*>(context->get_constant_arg(i));
+                if (str->is_null) {
+                    contains_ctx->is_null = true;
+                } else {
+                    contains_ctx->shapes[i] = GeoShape::from_encoded(str->ptr, str->len);
+                    if (contains_ctx->shapes[i] == nullptr) {
+                        contains_ctx->is_null = true;
+                    }
+                }
+            }
+        }
+
+        context->set_function_state(scope, contains_ctx);
+        return Status::OK();
+    }
+
+    static Status close(FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+        if (scope != FunctionContext::FRAGMENT_LOCAL) {
+            return Status::OK();
+        }
+        StContainsState* state = reinterpret_cast<StContainsState*>(
+                context->get_function_state(scope));
+        delete state;
+        return Status::OK();
+    }
+};
+
+struct StGeometryFromText{
+    static constexpr auto NAME = "st_geometryfromtext";
+    static constexpr GeoShapeType shape_type = GEO_SHAPE_ANY;
+};
+
+struct StGeomFromText{
+    static constexpr auto NAME = "st_geomfromtext";
+    static constexpr GeoShapeType shape_type = GEO_SHAPE_ANY;
+};
+
+struct StLineFromText{
+    static constexpr auto NAME = "st_linefromtext";
+    static constexpr GeoShapeType shape_type = GEO_SHAPE_LINE_STRING;
+};
+
+struct StLineStringFromText{
+    static constexpr auto NAME = "st_linestringfromtext";
+    static constexpr GeoShapeType shape_type = GEO_SHAPE_LINE_STRING;
+};
+
+struct StPolygon{
+    static constexpr auto NAME = "st_polygon";
+    static constexpr GeoShapeType shape_type = GEO_SHAPE_POLYGON;
+};
+
+struct StPolyFromText{
+    static constexpr auto NAME = "st_polyfromtext";
+    static constexpr GeoShapeType shape_type = GEO_SHAPE_POLYGON;
+};
+
+struct StPolygonFromText{
+    static constexpr auto NAME = "st_polygonfromtext";
+    static constexpr GeoShapeType shape_type = GEO_SHAPE_POLYGON;
+};
+
+template<typename Impl>
+struct StGeoFromText {
+    static constexpr auto NEED_CONTEXT = true;
+    static constexpr auto NAME = Impl::NAME;
+    static const size_t NUM_ARGS = 1;
+    static Status execute(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                          size_t result) {
+        DCHECK_EQ(arguments.size(), 1);
+        auto return_type = remove_nullable(block.get_data_type(result));
+        auto geo = block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+
+        const auto size = geo->size();
+        auto null_type = std::reinterpret_pointer_cast<const DataTypeNullable>(return_type);
+        auto res = ColumnNullable::create(return_type->create_column(), ColumnUInt8::create());
+
+        StConstructState* state = (StConstructState*) context->get_function_state(FunctionContext::FRAGMENT_LOCAL);
+        if (state == nullptr) {
+            GeoParseStatus status;
+            std::string buf;
+            for (int row = 0; row < size; ++row) {
+                auto value = geo->get_data_at(row);
+                std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(value.data, value.size, &status));
+                if (shape == nullptr || status != GEO_PARSE_OK ||
+                    (Impl::shape_type != GEO_SHAPE_ANY && shape->type() != Impl::shape_type)) {
+                    res->insert_data(nullptr, 0);
+                    continue;
+                }
+                buf.clear();
+                shape->encode_to(&buf);
+                res->insert_data(buf.data(), buf.size());
+            }
+            block.replace_by_position(result, std::move(res));
+        } else {
+            if (state->is_null) {
+                res->insert_data(nullptr, 0);
+                block.replace_by_position(result, ColumnConst::create(std::move(res), size));
+            } else {
+                res->insert_data(state->encoded_buf.data(), state->encoded_buf.size());
+                block.replace_by_position(result, ColumnConst::create(std::move(res), size));
+            }
+        }
+        return Status::OK();
+    }
+
+    static Status prepare(FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+        if (scope != FunctionContext::FRAGMENT_LOCAL) {
+            return Status::OK();
+        }
+
+        if (!context->is_arg_constant(0)) {
+            return Status::OK();
+        }
+
+        auto state = new StConstructState();
+        auto str_value = reinterpret_cast<StringVal*>(context->get_constant_arg(0));
+        if (str_value->is_null) {
+            state->is_null = true;
+        } else {
+            GeoParseStatus status;
+            std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(const_cast<const char*>(
+                (char*)str_value->ptr), str_value->len, &status));
+            if (shape == nullptr ||
+                (Impl::shape_type != GEO_SHAPE_ANY && shape->type() != Impl::shape_type)) {
+                state->is_null = true;
+            } else {
+                shape->encode_to(&state->encoded_buf);
+            }
+        }
+
+        context->set_function_state(scope, state);
+        return Status::OK();
+    }
+
+    static Status close(FunctionContext* context, FunctionContext::FunctionStateScope scope) {
+        if (scope == FunctionContext::FRAGMENT_LOCAL) {
+            StConstructState* state = reinterpret_cast<StConstructState*>(
+                    context->get_function_state(scope));
+            delete state;
+        }
+        return Status::OK();
+    }
+};
+
 void register_geo_functions(SimpleFunctionFactory& factory) {
     factory.register_function<GeoFunction<StPoint>>();
     factory.register_function<GeoFunction<StAsText<StAsWktName>>>();
@@ -206,6 +515,15 @@ void register_geo_functions(SimpleFunctionFactory& factory) {
     factory.register_function<GeoFunction<StX, DataTypeFloat64>>();
     factory.register_function<GeoFunction<StY, DataTypeFloat64>>();
     factory.register_function<GeoFunction<StDistanceSphere, DataTypeFloat64>>();
+    factory.register_function<GeoFunction<StContains, DataTypeUInt8>>();
+    factory.register_function<GeoFunction<StCircle>>();
+    factory.register_function<GeoFunction<StGeoFromText<StGeometryFromText>>>();
+    factory.register_function<GeoFunction<StGeoFromText<StGeomFromText>>>();
+    factory.register_function<GeoFunction<StGeoFromText<StLineFromText>>>();
+    factory.register_function<GeoFunction<StGeoFromText<StLineStringFromText>>>();
+    factory.register_function<GeoFunction<StGeoFromText<StPolygon>>>();
+    factory.register_function<GeoFunction<StGeoFromText<StPolygonFromText>>>();
+    factory.register_function<GeoFunction<StGeoFromText<StPolyFromText>>>();
 }
 
 } // namespace doris::vectorized
diff --git a/be/src/vec/functions/functions_geo.h b/be/src/vec/functions/functions_geo.h
index 4533adb6bc..d18b808c3f 100644
--- a/be/src/vec/functions/functions_geo.h
+++ b/be/src/vec/functions/functions_geo.h
@@ -43,6 +43,27 @@ public:
     Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
                         size_t result, size_t input_rows_count) override {
         return Impl::execute(block, arguments, result);
+        if constexpr (Impl::NEED_CONTEXT) {
+            return Impl::execute(context, block, arguments, result);
+        } else {
+            return Impl::execute(block, arguments, result);
+        }
+    }
+
+    Status prepare(FunctionContext* context, FunctionContext::FunctionStateScope scope) override {
+        if constexpr (Impl::NEED_CONTEXT) {
+            return Impl::prepare(context, scope);
+        } else {
+            return Status::OK();
+        }
+    }
+
+    Status close(FunctionContext* context, FunctionContext::FunctionStateScope scope) override {
+        if constexpr (Impl::NEED_CONTEXT) {
+            return Impl::close(context, scope);
+        } else {
+            return Status::OK();
+        }
     }
 };
 
diff --git a/be/test/vec/function/function_geo_test.cpp b/be/test/vec/function/function_geo_test.cpp
index 15925f0b96..f514233459 100644
--- a/be/test/vec/function/function_geo_test.cpp
+++ b/be/test/vec/function/function_geo_test.cpp
@@ -142,6 +142,203 @@ TEST(function_geo_test, function_geo_st_distance_sphere) {
     }
 }
 
+TEST(function_geo_test, function_geo_st_contains) {
+    std::string func_name = "st_contains";
+    {
+        InputTypeSet input_types = {TypeIndex::String, TypeIndex::String};
+
+        std::string buf1;
+        std::string buf2;
+        std::string buf3;
+        GeoParseStatus status;
+
+        std::string shape1 = std::string("POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))");
+        std::unique_ptr<GeoShape> shape(
+                GeoShape::from_wkt(shape1.data(), shape1.size(), &status));
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        ASSERT_TRUE(shape != nullptr);
+        shape->encode_to(&buf1);
+
+        GeoPoint point1;
+        status = point1.from_coord(5, 5);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        point1.encode_to(&buf2);
+
+        GeoPoint point2;
+        status = point2.from_coord(50, 50);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        point2.encode_to(&buf3);
+
+        DataSet data_set = {
+                {{buf1, buf2}, (uint8_t) 1},
+                {{buf1, buf3}, (uint8_t) 0},
+                {{buf1, Null()}, Null()},
+                {{Null(), buf3}, Null()}};
+
+        check_function<DataTypeUInt8 , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_circle) {
+    std::string func_name = "st_circle";
+    {
+        InputTypeSet input_types = {TypeIndex::Float64, TypeIndex::Float64, TypeIndex::Float64};
+
+        GeoCircle circle;
+        std::string buf;
+        auto value = circle.init(111, 64, 10000);
+        ASSERT_TRUE(value == GEO_PARSE_OK);
+        circle.encode_to(&buf);
+        DataSet data_set = {
+                {{(double) 111, (double) 64, (double) 10000}, buf},
+                {{Null(), (double) 64, (double) 10000}, Null()},
+                {{(double) 111, Null(), (double) 10000}, Null()},
+                {{(double) 111, (double) 64, Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_geometryfromtext) {
+    std::string func_name = "st_geometryfromtext";
+    {
+        InputTypeSet input_types = {TypeIndex::String};
+
+        GeoParseStatus status;
+        std::string buf;
+        std::string input = "LINESTRING (1 1, 2 2)";
+        std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(input.data(), input.size(), &status));
+        ASSERT_TRUE(shape != nullptr);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        shape->encode_to(&buf);
+        DataSet data_set = {
+                {{std::string("LINESTRING (1 1, 2 2)")}, buf},
+                {{Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_geomfromtext) {
+    std::string func_name = "st_geomfromtext";
+    {
+        InputTypeSet input_types = {TypeIndex::String};
+
+        GeoParseStatus status;
+        std::string buf;
+        std::string input = "LINESTRING (1 1, 2 2)";
+        std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(input.data(), input.size(), &status));
+        ASSERT_TRUE(shape != nullptr);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        shape->encode_to(&buf);
+        DataSet data_set = {
+            {{std::string("LINESTRING (1 1, 2 2)")}, buf},
+            {{Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_linefromtext) {
+    std::string func_name = "st_linefromtext";
+    {
+        InputTypeSet input_types = {TypeIndex::String};
+
+        GeoParseStatus status;
+        std::string buf;
+        std::string input = "LINESTRING (1 1, 2 2)";
+        std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(input.data(), input.size(), &status));
+        ASSERT_TRUE(shape != nullptr);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        shape->encode_to(&buf);
+        DataSet data_set = {
+            {{std::string("LINESTRING (1 1, 2 2)")}, buf},
+            {{Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_linestringfromtext) {
+    std::string func_name = "st_linestringfromtext";
+    {
+        InputTypeSet input_types = {TypeIndex::String};
+
+        GeoParseStatus status;
+        std::string buf;
+        std::string input = "LINESTRING (1 1, 2 2)";
+        std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(input.data(), input.size(), &status));
+        ASSERT_TRUE(shape != nullptr);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        shape->encode_to(&buf);
+        DataSet data_set = {
+                {{std::string("LINESTRING (1 1, 2 2)")}, buf},
+                {{Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_polygon) {
+    std::string func_name = "st_polygon";
+    {
+        InputTypeSet input_types = {TypeIndex::String};
+
+        GeoParseStatus status;
+        std::string buf;
+        std::string input = "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))";
+        std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(input.data(), input.size(), &status));
+        ASSERT_TRUE(shape != nullptr);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        shape->encode_to(&buf);
+        DataSet data_set = {
+                {{std::string("POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))")}, buf},
+                {{Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_polygonfromtext) {
+    std::string func_name = "st_polygonfromtext";
+    {
+        InputTypeSet input_types = {TypeIndex::String};
+
+        GeoParseStatus status;
+        std::string buf;
+        std::string input = "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))";
+        std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(input.data(), input.size(), &status));
+        ASSERT_TRUE(shape != nullptr);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        shape->encode_to(&buf);
+        DataSet data_set = {
+                {{std::string("POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))")}, buf},
+                {{Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
+TEST(function_geo_test, function_geo_st_polyfromtext) {
+    std::string func_name = "st_polyfromtext";
+    {
+        InputTypeSet input_types = {TypeIndex::String};
+
+        GeoParseStatus status;
+        std::string buf;
+        std::string input = "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))";
+        std::unique_ptr<GeoShape> shape(GeoShape::from_wkt(input.data(), input.size(), &status));
+        ASSERT_TRUE(shape != nullptr);
+        ASSERT_TRUE(status == GEO_PARSE_OK);
+        shape->encode_to(&buf);
+        DataSet data_set = {
+                {{std::string("POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))")}, buf},
+                {{Null()}, Null()}};
+
+        check_function<DataTypeString , true>(func_name, input_types, data_set);
+    }
+}
+
 } // namespace doris::vectorized
 
 int main(int argc, char** argv) {
diff --git a/docs/en/administrator-guide/vectorized-execution-engine.md b/docs/en/administrator-guide/vectorized-execution-engine.md
index 37d9bcafa8..6f5a07c40b 100644
--- a/docs/en/administrator-guide/vectorized-execution-engine.md
+++ b/docs/en/administrator-guide/vectorized-execution-engine.md
@@ -116,8 +116,7 @@ In most scenarios, users only need to turn on the session variable by default to
 
 #### Type B
 
-1. The `geolocation function` is not supported, including all functions starting with `ST_` in the function. For details, please refer to the section on SQL functions in the official documentation.
-2. The `UDF` and `UDAF` of the original row storage execution engine are not supported.
-3. The maximum length of `string/text` type is 1MB instead of the default 2GB. That is, when the vectorization engine is turned on, it is impossible to query or import strings larger than 1MB. However, if you turn off the vectorization engine, you can still query and import normally.
-4. The export method of `select ... into outfile` is not supported.
-5. Extrenal broker appearance is not supported.
+1. The `UDF` and `UDAF` of the original row storage execution engine are not supported.
+2. The maximum length of `string/text` type is 1MB instead of the default 2GB. That is, when the vectorization engine is turned on, it is impossible to query or import strings larger than 1MB. However, if you turn off the vectorization engine, you can still query and import normally.
+3. The export method of `select ... into outfile` is not supported.
+4. Extrenal broker appearance is not supported.
diff --git a/docs/zh-CN/administrator-guide/vectorized-execution-engine.md b/docs/zh-CN/administrator-guide/vectorized-execution-engine.md
index f97be1f540..4bcd1fa999 100644
--- a/docs/zh-CN/administrator-guide/vectorized-execution-engine.md
+++ b/docs/zh-CN/administrator-guide/vectorized-execution-engine.md
@@ -113,8 +113,7 @@ set batch_size = 4096;
 4. bitmap/hll 类型在向量化执行引擎中:输入均为NULL,则输出的结果为NULL而不是0。
 
 #### b类
-1. 不支持`地理位置函数` ,包含了函数中所有以`ST_`开头的函数。具体请参考官方文档SQL函数的部分。
-2. 不支持原有行存执行引擎的`UDF`与`UDAF`。
-3. `string/text`类型最大长度支持为1MB,而不是默认的2GB。即当开启向量化引擎后,将无法查询或导入大于1MB的字符串。但如果关闭向量化引擎,则依然可以正常查询和导入。
-4. 不支持 `select ... into outfile` 的导出方式。 
-5. 不支持extrenal broker外表。
+1. 不支持原有行存执行引擎的`UDF`与`UDAF`。
+2. `string/text`类型最大长度支持为1MB,而不是默认的2GB。即当开启向量化引擎后,将无法查询或导入大于1MB的字符串。但如果关闭向量化引擎,则依然可以正常查询和导入。
+3. 不支持 `select ... into outfile` 的导出方式。 
+4. 不支持extrenal broker外表。
diff --git a/gensrc/script/doris_builtins_functions.py b/gensrc/script/doris_builtins_functions.py
index 2de4da6c14..9d1b602def 100755
--- a/gensrc/script/doris_builtins_functions.py
+++ b/gensrc/script/doris_builtins_functions.py
@@ -1326,7 +1326,8 @@ visible_functions = [
 
     # geo functions
     [['ST_Point'], 'VARCHAR', ['DOUBLE', 'DOUBLE'],
-        '_ZN5doris12GeoFunctions8st_pointEPN9doris_udf15FunctionContextERKNS1_9DoubleValES6_', '', '', 'vec', 'ALWAYS_NULLABLE'],
+        '_ZN5doris12GeoFunctions8st_pointEPN9doris_udf15FunctionContextERKNS1_9DoubleValES6_',
+        '', '', 'vec', 'ALWAYS_NULLABLE'],
     [['ST_X'], 'DOUBLE', ['VARCHAR'],
         '_ZN5doris12GeoFunctions4st_xEPN9doris_udf15FunctionContextERKNS1_9StringValE',
         '', '', 'vec', 'ALWAYS_NULLABLE'],
@@ -1354,46 +1355,46 @@ visible_functions = [
         '_ZN5doris12GeoFunctions11st_from_wktEPN9doris_udf15FunctionContextERKNS1_9StringValE',
         '_ZN5doris12GeoFunctions19st_from_wkt_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_from_wkt_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', 'ALWAYS_NULLABLE'],
+        'vec', 'ALWAYS_NULLABLE'],
     [['ST_GeometryFromText', 'ST_GeomFromText'], 'VARCHAR', ['STRING'],
         '_ZN5doris12GeoFunctions11st_from_wktEPN9doris_udf15FunctionContextERKNS1_9StringValE',
         '_ZN5doris12GeoFunctions19st_from_wkt_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_from_wkt_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', 'ALWAYS_NULLABLE'],
+        'vec', 'ALWAYS_NULLABLE'],
 
     [['ST_LineFromText', 'ST_LineStringFromText'], 'VARCHAR', ['VARCHAR'],
         '_ZN5doris12GeoFunctions7st_lineEPN9doris_udf15FunctionContextERKNS1_9StringValE',
         '_ZN5doris12GeoFunctions15st_line_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_from_wkt_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', 'ALWAYS_NULLABLE'],
+        'vec', 'ALWAYS_NULLABLE'],
     [['ST_LineFromText', 'ST_LineStringFromText'], 'VARCHAR', ['STRING'],
         '_ZN5doris12GeoFunctions7st_lineEPN9doris_udf15FunctionContextERKNS1_9StringValE',
         '_ZN5doris12GeoFunctions15st_line_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_from_wkt_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', 'ALWAYS_NULLABLE'],
+        'vec', 'ALWAYS_NULLABLE'],
 
     [['ST_Polygon', 'ST_PolyFromText', 'ST_PolygonFromText'], 'VARCHAR', ['VARCHAR'],
         '_ZN5doris12GeoFunctions10st_polygonEPN9doris_udf15FunctionContextERKNS1_9StringValE',
         '_ZN5doris12GeoFunctions18st_polygon_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_from_wkt_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', 'ALWAYS_NULLABLE'],
+        'vec', 'ALWAYS_NULLABLE'],
     [['ST_Polygon', 'ST_PolyFromText', 'ST_PolygonFromText'], 'VARCHAR', ['STRING'],
         '_ZN5doris12GeoFunctions10st_polygonEPN9doris_udf15FunctionContextERKNS1_9StringValE',
         '_ZN5doris12GeoFunctions18st_polygon_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_from_wkt_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', 'ALWAYS_NULLABLE'],
+        'vec', 'ALWAYS_NULLABLE'],
 
     [['ST_Circle'], 'VARCHAR', ['DOUBLE', 'DOUBLE', 'DOUBLE'],
         '_ZN5doris12GeoFunctions9st_circleEPN9doris_udf15FunctionContextERKNS1_9DoubleValES6_S6_',
         '_ZN5doris12GeoFunctions17st_circle_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_from_wkt_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', ''],
+        'vec', 'ALWAYS_NULLABLE'],
 
     [['ST_Contains'], 'BOOLEAN', ['VARCHAR', 'VARCHAR'],
         '_ZN5doris12GeoFunctions11st_containsEPN9doris_udf15FunctionContextERKNS1_9StringValES6_',
         '_ZN5doris12GeoFunctions19st_contains_prepareEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
         '_ZN5doris12GeoFunctions17st_contains_closeEPN9doris_udf15FunctionContextENS2_18FunctionStateScopeE',
-        '', 'ALWAYS_NULLABLE'],
+        'vec', 'ALWAYS_NULLABLE'],
     # grouping sets functions
     [['grouping_id'], 'BIGINT', ['BIGINT'],
         '_ZN5doris21GroupingSetsFunctions11grouping_idEPN9doris_udf15FunctionContextERKNS1_9BigIntValE',


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[incubator-doris] 09/09: [fix](error-code) replace invalid format specifier (#8940)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 4ba2fdb96eae746712a5311755c09e27a2c116b0
Author: Mingyu Chen <mo...@gmail.com>
AuthorDate: Sun Apr 10 20:37:32 2022 +0800

    [fix](error-code) replace invalid format specifier (#8940)
    
    change %lu and %ld to %d
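
The change is presumably because these messages are rendered with Java's String.format, and
java.util.Formatter rejects the C-style length-modified conversions %lu and %ld, while %d
handles both int and long arguments. A minimal sketch of that behavior (the class name and
thread id below are hypothetical and not taken from the patch; only the Formatter behavior
is being illustrated):

    public class FormatSpecifierDemo {
        public static void main(String[] args) {
            long threadId = 12345L;

            // %d accepts long arguments, so this formats normally.
            System.out.println(String.format("Unknown thread id: %d", threadId));

            // %lu is parsed as the unknown conversion 'l' and fails at runtime.
            try {
                System.out.println(String.format("Unknown thread id: %lu", threadId));
            } catch (java.util.UnknownFormatConversionException e) {
                System.out.println("Invalid format specifier: " + e.getMessage());
            }
        }
    }
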
---
 .../java/org/apache/doris/common/ErrorCode.java    | 90 +++++++++++-----------
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/common/ErrorCode.java b/fe/fe-core/src/main/java/org/apache/doris/common/ErrorCode.java
index bd9d58e985..d3ffad3760 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/common/ErrorCode.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/common/ErrorCode.java
@@ -112,7 +112,7 @@ public enum ErrorCode {
     ERR_BLOB_USED_AS_KEY(1073, new byte[]{'4', '2', '0', '0', '0'}, "BLOB column '%s' can't be used in key " +
             "specification with the used table type"),
     ERR_TOO_BIG_FIELDLENGTH(1074, new byte[]{'4', '2', '0', '0', '0'}, "Column length too big for column '%s' (max = " +
-            "%lu); use BLOB or TEXT instead"),
+            "%d); use BLOB or TEXT instead"),
     ERR_WRONG_AUTO_KEY(1075, new byte[]{'4', '2', '0', '0', '0'}, "Incorrect table definition; there can be only one " +
             "auto column and it must be defined as a key"),
     ERR_READY(1076, new byte[]{'H', 'Y', '0', '0', '0'}, "%s: ready for connections. Version: '%s' socket: '%s' port:" +
@@ -120,7 +120,7 @@ public enum ErrorCode {
     ERR_NORMAL_SHUTDOWN(1077, new byte[]{'H', 'Y', '0', '0', '0'}, "%s: Normal shutdown"),
     ERR_GOT_SIGNAL(1078, new byte[]{'H', 'Y', '0', '0', '0'}, "%s: Got signal %d. Aborting!"),
     ERR_SHUTDOWN_COMPLETE(1079, new byte[]{'H', 'Y', '0', '0', '0'}, "%s: Shutdown complete"),
-    ERR_FORCING_CLOSE(1080, new byte[]{'0', '8', 'S', '0', '1'}, "%s: Forcing close of thread %ld user: '%s'"),
+    ERR_FORCING_CLOSE(1080, new byte[]{'0', '8', 'S', '0', '1'}, "%s: Forcing close of thread %d user: '%s'"),
     ERR_IPSOCK_ERROR(1081, new byte[]{'0', '8', 'S', '0', '1'}, "Can't create IP socket"),
     ERR_NO_SUCH_INDEX(1082, new byte[]{'4', '2', 'S', '1', '2'}, "Table '%s' has no index like the one used in CREATE" +
             " INDEX; recreate the table"),
@@ -131,8 +131,8 @@ public enum ErrorCode {
     ERR_TEXTFILE_NOT_READABLE(1085, new byte[]{'H', 'Y', '0', '0', '0'}, "The file '%s' must be in the database " +
             "directory or be readable by all"),
     ERR_FILE_EXISTS_ERROR(1086, new byte[]{'H', 'Y', '0', '0', '0'}, "File '%s' already exists"),
-    ERR_LOAD_INF(1087, new byte[]{'H', 'Y', '0', '0', '0'}, "Records: %ld Deleted: %ld Skipped: %ld Warnings: %ld"),
-    ERR_ALTER_INF(1088, new byte[]{'H', 'Y', '0', '0', '0'}, "Records: %ld Duplicates: %ld"),
+    ERR_LOAD_INF(1087, new byte[]{'H', 'Y', '0', '0', '0'}, "Records: %d Deleted: %d Skipped: %d Warnings: %d"),
+    ERR_ALTER_INF(1088, new byte[]{'H', 'Y', '0', '0', '0'}, "Records: %d Duplicates: %d"),
     ERR_WRONG_SUB_KEY(1089, new byte[]{'H', 'Y', '0', '0', '0'}, "Incorrect prefix key; the used key part isn't a " +
             "string, the used length is longer than the key part, or the storage engine doesn't support unique prefix" +
             " keys"),
@@ -140,11 +140,11 @@ public enum ErrorCode {
             "TABLE; use DROP TABLE instead"),
     ERR_CANT_DROP_FIELD_OR_KEY(1091, new byte[]{'4', '2', '0', '0', '0'}, "Can't DROP '%s'; check that column/key " +
             "exists"),
-    ERR_INSERT_INF(1092, new byte[]{'H', 'Y', '0', '0', '0'}, "Records: %ld Duplicates: %ld Warnings: %ld"),
+    ERR_INSERT_INF(1092, new byte[]{'H', 'Y', '0', '0', '0'}, "Records: %d Duplicates: %d Warnings: %d"),
     ERR_UPDATE_TABLE_USED(1093, new byte[]{'H', 'Y', '0', '0', '0'}, "You can't specify target table '%s' for update " +
             "in FROM clause"),
-    ERR_NO_SUCH_THREAD(1094, new byte[]{'H', 'Y', '0', '0', '0'}, "Unknown thread id: %lu"),
-    ERR_KILL_DENIED_ERROR(1095, new byte[]{'H', 'Y', '0', '0', '0'}, "You are not owner of thread %lu"),
+    ERR_NO_SUCH_THREAD(1094, new byte[]{'H', 'Y', '0', '0', '0'}, "Unknown thread id: %d"),
+    ERR_KILL_DENIED_ERROR(1095, new byte[]{'H', 'Y', '0', '0', '0'}, "You are not owner of thread %d"),
     ERR_NO_TABLES_USED(1096, new byte[]{'H', 'Y', '0', '0', '0'}, "No tables used"),
     ERR_TOO_BIG_SET(1097, new byte[]{'H', 'Y', '0', '0', '0'}, "Too many strings for column %s and SET"),
     ERR_NO_UNIQUE_LOGFILE(1098, new byte[]{'H', 'Y', '0', '0', '0'}, "Can't generate a unique log-filename %s.(1-999)"),
@@ -174,8 +174,8 @@ public enum ErrorCode {
             "in a join"),
     ERR_TOO_MANY_FIELDS(1117, new byte[]{'H', 'Y', '0', '0', '0'}, "Too many columns"),
     ERR_TOO_BIG_ROWSIZE(1118, new byte[]{'4', '2', '0', '0', '0'}, "Row size too large. The maximum row size for the " +
-            "used table type, not counting BLOBs, is %ld. You have to change some columns to TEXT or BLOBs"),
-    ERR_STACK_OVERRUN(1119, new byte[]{'H', 'Y', '0', '0', '0'}, "Thread stack overrun: Used: %ld of a %ld stack. Use" +
+            "used table type, not counting BLOBs, is %d. You have to change some columns to TEXT or BLOBs"),
+    ERR_STACK_OVERRUN(1119, new byte[]{'H', 'Y', '0', '0', '0'}, "Thread stack overrun: Used: %d of a %d stack. Use" +
             " 'mysqld --thread_stack=#' to specify a bigger stack if needed"),
     ERR_WRONG_OUTER_JOIN(1120, new byte[]{'4', '2', '0', '0', '0'}, "Cross dependency found in OUTER JOIN; examine " +
             "your ON conditions"),
@@ -197,11 +197,11 @@ public enum ErrorCode {
     ERR_PASSWORD_NOT_ALLOWED(1132, new byte[]{'4', '2', '0', '0', '0'}, "You must have privileges to update tables in" +
             " the mysql database to be able to change passwords for others"),
     ERR_PASSWORD_NO_MATCH(1133, new byte[]{'4', '2', '0', '0', '0'}, "Can't find any matching row in the user table"),
-    ERR_UPDATE_INF(1134, new byte[]{'H', 'Y', '0', '0', '0'}, "Rows matched: %ld Changed: %ld Warnings: %ld"),
+    ERR_UPDATE_INF(1134, new byte[]{'H', 'Y', '0', '0', '0'}, "Rows matched: %d Changed: %d Warnings: %d"),
     ERR_CANT_CREATE_THREAD(1135, new byte[]{'H', 'Y', '0', '0', '0'}, "Can't create a new thread (Errno %d); if you " +
             "are not out of available memory, you can consult the manual for a possible OS-dependent bug"),
     ERR_WRONG_VALUE_COUNT_ON_ROW(1136, new byte[]{'2', '1', 'S', '0', '1'}, "Column count doesn't match value count " +
-            "at row %ld"),
+            "at row %d"),
     ERR_CANT_REOPEN_TABLE(1137, new byte[]{'H', 'Y', '0', '0', '0'}, "Can't reopen table: '%s'"),
     ERR_INVALID_USE_OF_NULL(1138, new byte[]{'2', '2', '0', '0', '4'}, "Invalid use of NULL value"),
     ERR_REGEXP_ERROR(1139, new byte[]{'4', '2', '0', '0', '0'}, "Got error '%s' from regexp"),
@@ -227,7 +227,7 @@ public enum ErrorCode {
     ERR_DELAYED_CANT_CHANGE_LOCK(1150, new byte[]{'H', 'Y', '0', '0', '0'}, "Delayed insert thread couldn't get " +
             "requested lock for table %s"),
     ERR_TOO_MANY_DELAYED_THREADS(1151, new byte[]{'H', 'Y', '0', '0', '0'}, "Too many delayed threads in use"),
-    ERR_ABORTING_CONNECTION(1152, new byte[]{'0', '8', 'S', '0', '1'}, "Aborted connection %ld to db: '%s' user: '%s'" +
+    ERR_ABORTING_CONNECTION(1152, new byte[]{'0', '8', 'S', '0', '1'}, "Aborted connection %d to db: '%s' user: '%s'" +
             " (%s)"),
     ERR_NET_PACKET_TOO_LARGE(1153, new byte[]{'0', '8', 'S', '0', '1'}, "Got a packet bigger than " +
             "'max_allowed_packet' bytes"),
@@ -274,7 +274,7 @@ public enum ErrorCode {
     ERR_ERROR_DURING_ROLLBACK(1181, new byte[]{'H', 'Y', '0', '0', '0'}, "Got error %d during ROLLBACK"),
     ERR_ERROR_DURING_FLUSH_LOGS(1182, new byte[]{'H', 'Y', '0', '0', '0'}, "Got error %d during FLUSH_LOGS"),
     ERR_ERROR_DURING_CHECKPOINT(1183, new byte[]{'H', 'Y', '0', '0', '0'}, "Got error %d during CHECKPOINT"),
-    ERR_NEW_ABORTING_CONNECTION(1184, new byte[]{'0', '8', 'S', '0', '1'}, "Aborted connection %ld to db: '%s' user: " +
+    ERR_NEW_ABORTING_CONNECTION(1184, new byte[]{'0', '8', 'S', '0', '1'}, "Aborted connection %d to db: '%s' user: " +
             "'%s' host: '%s' (%s)"),
     ERR_UNUSED_10(1185, new byte[]{}, "You should never see it"),
     ERR_FLUSH_MASTER_BINLOG_CLOSED(1186, new byte[]{'H', 'Y', '0', '0', '0'}, "Binlog closed, cannot RESET MASTER"),
@@ -344,7 +344,7 @@ public enum ErrorCode {
             "tables is disabled"),
     ERR_DUP_ARGUMENT(1225, new byte[]{'H', 'Y', '0', '0', '0'}, "Option '%s' used twice in statement"),
     ERR_USER_LIMIT_REACHED(1226, new byte[]{'4', '2', '0', '0', '0'}, "User '%s' has exceeded the '%s' resource " +
-            "(current value: %ld)"),
+            "(current value: %d)"),
     ERR_SPECIFIC_ACCESS_DENIED_ERROR(1227, new byte[]{'4', '2', '0', '0', '0'}, "Access denied; you need (at least " +
             "one of) the %s privilege(s) for this operation"),
     ERR_LOCAL_VARIABLE(1228, new byte[]{'H', 'Y', '0', '0', '0'}, "Variable '%s' is a SESSION variable and can't be " +
@@ -395,14 +395,14 @@ public enum ErrorCode {
             "(probably, length of uncompressed data was corrupted)"),
     ERR_ZLIB_Z_DATA_ERROR(1259, new byte[]{'H', 'Y', '0', '0', '0'}, "ZLIB: Input data corrupted"),
     ERR_CUT_VALUE_GROUP_CONCAT(1260, new byte[]{'H', 'Y', '0', '0', '0'}, "Row %u was cut by GROUP_CONCAT()"),
-    ERR_WARN_TOO_FEW_RECORDS(1261, new byte[]{'0', '1', '0', '0', '0'}, "Row %ld doesn't contain data for all columns"),
-    ERR_WARN_TOO_MANY_RECORDS(1262, new byte[]{'0', '1', '0', '0', '0'}, "Row %ld was truncated; it contained more " +
+    ERR_WARN_TOO_FEW_RECORDS(1261, new byte[]{'0', '1', '0', '0', '0'}, "Row %d doesn't contain data for all columns"),
+    ERR_WARN_TOO_MANY_RECORDS(1262, new byte[]{'0', '1', '0', '0', '0'}, "Row %d was truncated; it contained more " +
             "data than there were input columns"),
     ERR_WARN_NULL_TO_NOTNULL(1263, new byte[]{'2', '2', '0', '0', '4'}, "Column set to default value; NULL supplied " +
-            "to NOT NULL column '%s' at row %ld"),
+            "to NOT NULL column '%s' at row %d"),
     ERR_WARN_DATA_OUT_OF_RANGE(1264, new byte[]{'2', '2', '0', '0', '3'}, "Out of range value for column '%s' at row " +
-            "%ld"),
-    WARN_DATA_TRUNCATED(1265, new byte[]{'0', '1', '0', '0', '0'}, "Data truncated for column '%s' at row %ld"),
+            "%d"),
+    WARN_DATA_TRUNCATED(1265, new byte[]{'0', '1', '0', '0', '0'}, "Data truncated for column '%s' at row %d"),
     ERR_WARN_USING_OTHER_HANDLER(1266, new byte[]{'H', 'Y', '0', '0', '0'}, "Using storage engine %s for table '%s'"),
     ERR_CANT_AGGREGATE_2COLLATIONS(1267, new byte[]{'H', 'Y', '0', '0', '0'}, "Illegal mix of collations (%s,%s) and " +
             "(%s,%s) for operation '%s'"),
@@ -434,8 +434,8 @@ public enum ErrorCode {
             "options are ignored"),
     ERR_WRONG_NAME_FOR_INDEX(1280, new byte[]{'4', '2', '0', '0', '0'}, "Incorrect index name '%s'"),
     ERR_WRONG_NAME_FOR_CATALOG(1281, new byte[]{'4', '2', '0', '0', '0'}, "Incorrect catalog name '%s'"),
-    ERR_WARN_QC_RESIZE(1282, new byte[]{'H', 'Y', '0', '0', '0'}, "Query cache failed to set size %lu; new query " +
-            "cache size is %lu"),
+    ERR_WARN_QC_RESIZE(1282, new byte[]{'H', 'Y', '0', '0', '0'}, "Query cache failed to set size %d; new query " +
+            "cache size is %d"),
     ERR_BAD_FT_COLUMN(1283, new byte[]{'H', 'Y', '0', '0', '0'}, "Column '%s' cannot be part of FULLTEXT index"),
     ERR_UNKNOWN_KEY_CACHE(1284, new byte[]{'H', 'Y', '0', '0', '0'}, "Unknown key cache '%s'"),
     ERR_WARN_HOSTNAME_WONT_WORK(1285, new byte[]{'H', 'Y', '0', '0', '0'}, "MariaDB is started in --skip-name-resolve" +
@@ -461,10 +461,10 @@ public enum ErrorCode {
     ERR_GET_TEMPORARY_ERRMSG(1297, new byte[]{'H', 'Y', '0', '0', '0'}, "Got temporary error %d '%s' from %s"),
     ERR_UNKNOWN_TIME_ZONE(1298, new byte[]{'H', 'Y', '0', '0', '0'}, "Unknown or incorrect time zone: '%s'"),
     ERR_WARN_INVALID_TIMESTAMP(1299, new byte[]{'H', 'Y', '0', '0', '0'}, "Invalid TIMESTAMP value in column '%s' at " +
-            "row %ld"),
+            "row %d"),
     ERR_INVALID_CHARACTER_STRING(1300, new byte[]{'H', 'Y', '0', '0', '0'}, "Invalid %s character string: '%s'"),
     ERR_WARN_ALLOWED_PACKET_OVERFLOWED(1301, new byte[]{'H', 'Y', '0', '0', '0'}, "Result of %s() was larger than " +
-            "max_allowed_packet (%ld) - truncated"),
+            "max_allowed_packet (%d) - truncated"),
     ERR_CONFLICTING_DECLARATIONS(1302, new byte[]{'H', 'Y', '0', '0', '0'}, "Conflicting declarations: '%s%s' and " +
             "'%s%s'"),
     ERR_SP_NO_RECURSIVE_CREATE(1303, new byte[]{'2', 'F', '0', '0', '3'}, "Can't create a %s from within another " +
@@ -558,7 +558,7 @@ public enum ErrorCode {
     ERR_NO_DEFAULT_FOR_FIELD(1364, new byte[]{'H', 'Y', '0', '0', '0'}, "Field '%s' doesn't have a default value"),
     ERR_DIVISION_BY_ZER(1365, new byte[]{'2', '2', '0', '1', '2'}, "Division by 0"),
     ERR_TRUNCATED_WRONG_VALUE_FOR_FIELD(1366, new byte[]{'H', 'Y', '0', '0', '0'}, "Incorrect %s value: '%s' for " +
-            "column '%s' at row %ld"),
+            "column '%s' at row %d"),
     ERR_ILLEGAL_VALUE_FOR_TYPE(1367, new byte[]{'2', '2', '0', '0', '7'}, "Illegal %s '%s' value found during parsing"),
     ERR_VIEW_NONUPD_CHECK(1368, new byte[]{'H', 'Y', '0', '0', '0'}, "CHECK OPTION on non-updatable view '%s.%s'"),
     ERR_VIEW_CHECK_FAILED(1369, new byte[]{'H', 'Y', '0', '0', '0'}, "CHECK OPTION failed '%s.%s'"),
@@ -612,7 +612,7 @@ public enum ErrorCode {
             "privileges"),
     ERR_PROC_AUTO_REVOKE_FAIL(1405, new byte[]{'H', 'Y', '0', '0', '0'}, "Failed to revoke all privileges to dropped " +
             "routine"),
-    ERR_DATA_TOO_LONG(1406, new byte[]{'2', '2', '0', '0', '1'}, "Data too long for column '%s' at row %ld"),
+    ERR_DATA_TOO_LONG(1406, new byte[]{'2', '2', '0', '0', '1'}, "Data too long for column '%s' at row %d"),
     ERR_SP_BAD_SQLSTATE(1407, new byte[]{'4', '2', '0', '0', '0'}, "Bad SQLSTATE: '%s'"),
     ERR_STARTUP(1408, new byte[]{'H', 'Y', '0', '0', '0'}, "%s: ready for connections. Version: '%s' socket: '%s' " +
             "port: %d %s"),
@@ -643,7 +643,7 @@ public enum ErrorCode {
             "variable)"),
     ERR_EXEC_STMT_WITH_OPEN_CURSOR(1420, new byte[]{'H', 'Y', '0', '0', '0'}, "You can't execute a prepared statement" +
             " which has an open cursor associated with it. Reset the statement to re-execute it."),
-    ERR_STMT_HAS_NO_OPEN_CURSOR(1421, new byte[]{'H', 'Y', '0', '0', '0'}, "The statement (%lu) has no open cursor."),
+    ERR_STMT_HAS_NO_OPEN_CURSOR(1421, new byte[]{'H', 'Y', '0', '0', '0'}, "The statement (%d) has no open cursor."),
     ERR_COMMIT_NOT_ALLOWED_IN_SF_OR_TRG(1422, new byte[]{'H', 'Y', '0', '0', '0'}, "Explicit or implicit commit is " +
             "not allowed in stored function or trigger."),
     ERR_NO_DEFAULT_FOR_VIEW_FIELD(1423, new byte[]{'H', 'Y', '0', '0', '0'}, "Field of view '%s.%s' underlying table " +
@@ -651,9 +651,9 @@ public enum ErrorCode {
     ERR_SP_NO_RECURSION(1424, new byte[]{'H', 'Y', '0', '0', '0'}, "Recursive stored functions and triggers are not " +
             "allowed."),
     ERR_TOO_BIG_SCALE(1425, new byte[]{'4', '2', '0', '0', '0'}, "Too big scale %d specified for column '%s'. Maximum" +
-            " is %lu."),
+            " is %d."),
     ERR_TOO_BIG_PRECISION(1426, new byte[]{'4', '2', '0', '0', '0'}, "Too big precision %d specified for column '%s'." +
-            " Maximum is %lu."),
+            " Maximum is %d."),
     ERR_M_BIGGER_THAN_D(1427, new byte[]{'4', '2', '0', '0', '0'}, "For float(M,D, double(M,D or decimal(M,D, M must " +
             "be >= D (column '%s')."),
     ERR_WRONG_LOCK_OF_SYSTEM_TABLE(1428, new byte[]{'H', 'Y', '0', '0', '0'}, "You can't combine write-locking of " +
@@ -671,12 +671,12 @@ public enum ErrorCode {
     ERR_CANT_CREATE_FEDERATED_TABLE(1434, new byte[]{'H', 'Y', '0', '0', '0'}, "Can't create federated table. Foreign" +
             " data src error: %s"),
     ERR_TRG_IN_WRONG_SCHEMA(1435, new byte[]{'H', 'Y', '0', '0', '0'}, "Trigger in wrong schema"),
-    ERR_STACK_OVERRUN_NEED_MORE(1436, new byte[]{'H', 'Y', '0', '0', '0'}, "Thread stack overrun: %ld bytes used of a" +
-            " %ld byte stack, and %ld bytes needed. Use 'mysqld --thread_stack=#' to specify a bigger stack."),
+    ERR_STACK_OVERRUN_NEED_MORE(1436, new byte[]{'H', 'Y', '0', '0', '0'}, "Thread stack overrun: %d bytes used of a" +
+            " %d byte stack, and %d bytes needed. Use 'mysqld --thread_stack=#' to specify a bigger stack."),
     ERR_TOO_LONG_BODY(1437, new byte[]{'4', '2', '0', '0', '0'}, "Routine body for '%s' is too long"),
     ERR_WARN_CANT_DROP_DEFAULT_KEYCACHE(1438, new byte[]{'H', 'Y', '0', '0', '0'}, "Cannot drop default keycache"),
     ERR_TOO_BIG_DISPLAYWIDTH(1439, new byte[]{'4', '2', '0', '0', '0'}, "Display width out of range for column '%s' " +
-            "(max = %lu)"),
+            "(max = %d)"),
     ERR_XAER_DUPID(1440, new byte[]{'X', 'A', 'E', '0', '8'}, "XAER_DUPID: The XID already exists"),
     ERR_DATETIME_FUNCTION_OVERFLOW(1441, new byte[]{'2', '2', '0', '0', '8'}, "Datetime function: %s field overflow"),
     ERR_CANT_UPDATE_USED_TABLE_IN_SF_OR_TRG(1442, new byte[]{'H', 'Y', '0', '0', '0'}, "Can't update table '%s' in " +
@@ -720,7 +720,7 @@ public enum ErrorCode {
             "TABLE `%s`\" or dump/reload to fix it!"),
     ERR_SP_NO_AGGREGATE(1460, new byte[]{'4', '2', '0', '0', '0'}, "AGGREGATE is not supported for stored functions"),
     ERR_MAX_PREPARED_STMT_COUNT_REACHED(1461, new byte[]{'4', '2', '0', '0', '0'}, "Can't create more than " +
-            "max_prepared_stmt_count statements (current value: %lu)"),
+            "max_prepared_stmt_count statements (current value: %d)"),
     ERR_VIEW_RECURSIVE(1462, new byte[]{'H', 'Y', '0', '0', '0'}, "`%s`.`%s` contains view recursion"),
     ERR_NON_GROUPING_FIELD_USED(1463, new byte[]{'4', '2', '0', '0', '0'}, "Non-grouping field '%s' is used in %s " +
             "clause"),
@@ -1004,9 +1004,9 @@ public enum ErrorCode {
     ERR_EXCEPTIONS_WRITE_ERROR(1627, new byte[]{'H', 'Y', '0', '0', '0'}, "Write to exceptions table failed. Message:" +
             " %s"),
     ERR_TOO_LONG_TABLE_COMMENT(1628, new byte[]{'H', 'Y', '0', '0', '0'}, "Comment for table '%s' is too long (max = " +
-            "%lu)"),
+            "%d)"),
     ERR_TOO_LONG_FIELD_COMMENT(1629, new byte[]{'H', 'Y', '0', '0', '0'}, "Comment for field '%s' is too long (max = " +
-            "%lu)"),
+            "%d)"),
     ERR_FUNC_INEXISTENT_NAME_COLLISION(1630, new byte[]{'4', '2', '0', '0', '0'}, "FUNCTION %s does not exist. Check " +
             "the 'Function Name Parsing and Resolution' section in the Reference Manual"),
     ERR_DATABASE_NAME(1631, new byte[]{'H', 'Y', '0', '0', '0'}, "Database"),
@@ -1117,7 +1117,7 @@ public enum ErrorCode {
     ERR_SPATIAL_MUST_HAVE_GEOM_COL(1687, new byte[]{'4', '2', '0', '0', '0'}, "A SPATIAL index may only contain a " +
             "geometrical type column"),
     ERR_TOO_LONG_INDEX_COMMENT(1688, new byte[]{'H', 'Y', '0', '0', '0'}, "Comment for index '%s' is too long (max = " +
-            "%lu)"),
+            "%d)"),
     ERR_LOCK_ABORTED(1689, new byte[]{'H', 'Y', '0', '0', '0'}, "Wait on a lock was aborted due to a pending " +
             "exclusive lock"),
     ERR_DATA_OUT_OF_RANGE(1690, new byte[]{'2', '2', '0', '0', '3'}, "%s value is out of range in '%s'"),
@@ -1160,7 +1160,7 @@ public enum ErrorCode {
     WARN_OPTION_BELOW_LIMIT(1708, new byte[]{'H', 'Y', '0', '0', '0'}, "The value of '%s' should be no less than the " +
             "value of '%s'"),
     ERR_INDEX_COLUMN_TOO_LONG(1709, new byte[]{'H', 'Y', '0', '0', '0'}, "Index column size too large. The maximum " +
-            "column size is %lu bytes."),
+            "column size is %d bytes."),
     ERR_ERROR_IN_TRIGGER_BODY(1710, new byte[]{'H', 'Y', '0', '0', '0'}, "Trigger '%s' has an error in its body: '%s'"),
     ERR_ERROR_IN_UNKNOWN_TRIGGER_BODY(1711, new byte[]{'H', 'Y', '0', '0', '0'}, "Unknown trigger has an error in its" +
             " body: '%s'"),
@@ -1229,8 +1229,8 @@ public enum ErrorCode {
     ERR_TABLES_DIFFERENT_METADATA(1736, new byte[]{'H', 'Y', '0', '0', '0'}, "Tables have different definitions"),
     ERR_ROW_DOES_NOT_MATCH_PARTITION(1737, new byte[]{'H', 'Y', '0', '0', '0'}, "Found a row that does not match the " +
             "partition"),
-    ERR_BINLOG_CACHE_SIZE_GREATER_THAN_MAX(1738, new byte[]{'H', 'Y', '0', '0', '0'}, "Option binlog_cache_size (%lu)" +
-            " is greater than max_binlog_cache_size (%lu); setting binlog_cache_size equal to max_binlog_cache_size."),
+    ERR_BINLOG_CACHE_SIZE_GREATER_THAN_MAX(1738, new byte[]{'H', 'Y', '0', '0', '0'}, "Option binlog_cache_size (%d)" +
+            " is greater than max_binlog_cache_size (%d); setting binlog_cache_size equal to max_binlog_cache_size."),
     ERR_WARN_INDEX_NOT_APPLICABLE(1739, new byte[]{'H', 'Y', '0', '0', '0'}, "Cannot use %s access on index '%s' due " +
             "to type or collation conversion on field '%s'"),
     ERR_PARTITION_EXCHANGE_FOREIGN_KEY(1740, new byte[]{'H', 'Y', '0', '0', '0'}, "Table to exchange with partition " +
@@ -1242,7 +1242,7 @@ public enum ErrorCode {
     ERR_BINLOG_READ_EVENT_CHECKSUM_FAILURE(1744, new byte[]{'H', 'Y', '0', '0', '0'}, "Replication event checksum " +
             "verification failed while reading from a log file."),
     ERR_BINLOG_STMT_CACHE_SIZE_GREATER_THAN_MAX(1745, new byte[]{'H', 'Y', '0', '0', '0'}, "Option " +
-            "binlog_stmt_cache_size (%lu) is greater than max_binlog_stmt_cache_size (%lu); setting " +
+            "binlog_stmt_cache_size (%d) is greater than max_binlog_stmt_cache_size (%d); setting " +
             "binlog_stmt_cache_size " +
             "equal to max_binlog_stmt_cache_size."),
     ERR_CANT_UPDATE_TABLE_IN_CREATE_TABLE_SELECT(1746, new byte[]{'H', 'Y', '0', '0', '0'}, "Can't update table '%s' " +
@@ -1354,7 +1354,7 @@ public enum ErrorCode {
     ERR_CANT_EXECUTE_IN_READ_ONLY_TRANSACTION(1792, new byte[]{'2', '5', '0', '0', '6'}, "Cannot execute statement in" +
             " a READ ONLY transaction."),
     ERR_TOO_LONG_TABLE_PARTITION_COMMENT(1793, new byte[]{'H', 'Y', '0', '0', '0'}, "Comment for table partition '%s'" +
-            " is too long (max = %lu"),
+            " is too long (max = %d"),
     ERR_SLAVE_CONFIGURATION(1794, new byte[]{'H', 'Y', '0', '0', '0'}, "Slave is not configured or failed to " +
             "initialize properly. You must at least set --server-id to enable either a master or a slave. Additional " +
             "error " +
@@ -1387,17 +1387,17 @@ public enum ErrorCode {
             " table '%s'. Cannot discard the table."),
     ERR_TABLE_SCHEMA_MISMATCH(1808, new byte[]{'H', 'Y', '0', '0', '0'}, "Schema mismatch (%s"),
     ERR_TABLE_IN_SYSTEM_TABLESPACE(1809, new byte[]{'H', 'Y', '0', '0', '0'}, "Table '%s' in system tablespace"),
-    ERR_IO_READ_ERROR(1810, new byte[]{'H', 'Y', '0', '0', '0'}, "IO Read error: (%lu, %s) %s"),
-    ERR_IO_WRITE_ERROR(1811, new byte[]{'H', 'Y', '0', '0', '0'}, "IO Write error: (%lu, %s) %s"),
+    ERR_IO_READ_ERROR(1810, new byte[]{'H', 'Y', '0', '0', '0'}, "IO Read error: (%d, %s) %s"),
+    ERR_IO_WRITE_ERROR(1811, new byte[]{'H', 'Y', '0', '0', '0'}, "IO Write error: (%d, %s) %s"),
     ERR_TABLESPACE_MISSING(1812, new byte[]{'H', 'Y', '0', '0', '0'}, "Tablespace is missing for table '%s'"),
     ERR_TABLESPACE_EXISTS(1813, new byte[]{'H', 'Y', '0', '0', '0'}, "Tablespace for table '%s' exists. Please " +
             "DISCARD the tablespace before IMPORT."),
     ERR_TABLESPACE_DISCARDED(1814, new byte[]{'H', 'Y', '0', '0', '0'}, "Tablespace has been discarded for table '%s'"),
     ERR_INTERNAL_ERROR(1815, new byte[]{'H', 'Y', '0', '0', '0'}, "Internal error: %s"),
     ERR_INNODB_IMPORT_ERROR(1816, new byte[]{'H', 'Y', '0', '0', '0'}, "ALTER TABLE '%s' IMPORT TABLESPACE failed " +
-            "with error %lu : '%s'"),
+            "with error %d : '%s'"),
     ERR_INNODB_INDEX_CORRUPT(1817, new byte[]{'H', 'Y', '0', '0', '0'}, "Index corrupt: %s"),
-    ERR_INVALID_YEAR_COLUMN_LENGTH(1818, new byte[]{'H', 'Y', '0', '0', '0'}, "YEAR(%lu) column type is deprecated. " +
+    ERR_INVALID_YEAR_COLUMN_LENGTH(1818, new byte[]{'H', 'Y', '0', '0', '0'}, "YEAR(%d) column type is deprecated. " +
             "Creating YEAR(4) column instead."),
     ERR_NOT_VALID_PASSWORD(1819, new byte[]{'H', 'Y', '0', '0', '0'}, "Your password does not satisfy the current " +
             "policy requirements"),
@@ -1491,7 +1491,7 @@ public enum ErrorCode {
             "you must change it using a client that supports expired passwords."),
     ERR_ROW_IN_WRONG_PARTITION(1863, new byte[]{'H', 'Y', '0', '0', '0'}, "Found a row in wrong partition %s"),
     ERR_MTS_EVENT_BIGGER_PENDING_JOBS_SIZE_MAX(1864, new byte[]{'H', 'Y', '0', '0', '0'}, "Cannot schedule event %s, " +
-            "relay-log name %s, position %s to Worker thread because its size %lu exceeds %lu of " +
+            "relay-log name %s, position %s to Worker thread because its size %d exceeds %d of " +
             "slave_pending_jobs_size_max" +
             "."),
     ERR_INNODB_NO_FT_USES_PARSER(1865, new byte[]{'H', 'Y', '0', '0', '0'}, "Cannot CREATE FULLTEXT INDEX WITH PARSER" +
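
Note (not part of the patch above): these messages are Java format strings, and java.util.Formatter
has no C-style "%lu" conversion, so formatting such a string throws UnknownFormatConversionException
at runtime, while "%d" accepts both int and long arguments. A minimal, self-contained sketch of the
difference (the class name, message text, and sample value are illustrative only, not from the patch):

    import java.util.UnknownFormatConversionException;

    // Demonstrates why "%lu" must be replaced with "%d" in Java format strings.
    public class FormatSpecifierDemo {
        public static void main(String[] args) {
            long maxLen = 2048L;

            // "%d" handles both int and long arguments.
            System.out.println(String.format("Comment is too long (max = %d)", maxLen));

            // "%lu" is a C printf specifier; Java's Formatter rejects the 'l' conversion.
            try {
                System.out.println(String.format("Comment is too long (max = %lu)", maxLen));
            } catch (UnknownFormatConversionException e) {
                System.out.println("Invalid format specifier: " + e);
            }
        }
    }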




[incubator-doris] 08/09: [fix] fix the problem that BE will stack overflow at startup when compiled with tsan (#8904)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit a4697fd76cc75c01e7e7c426b5854b7e398f8b27
Author: Lightman <31...@users.noreply.github.com>
AuthorDate: Sat Apr 9 19:17:28 2022 +0800

    [fix] fix the problem that BE will stack overflow at startup when compiled with tsan (#8904)
    
    Currently, TSAN builds can only be compiled with Clang, not GCC.
    When compiling with -O0, a stack overflow occurs at startup (issue #8868).
    A function definition is reported as missing at compile time; the file provided in PR #8665 is required.
---
 be/CMakeLists.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt
index ebd9542875..14183cf0d7 100644
--- a/be/CMakeLists.txt
+++ b/be/CMakeLists.txt
@@ -425,7 +425,8 @@ SET(CXX_FLAGS_UBSAN "${CXX_GCC_FLAGS} -O0 -fno-wrapv -fsanitize=undefined")
 # Set the flags to the thread sanitizer, also known as "tsan"
 # Turn on sanitizer and debug symbols to get stack traces:
 # Use -Wno-builtin-declaration-mismatch to mute warnings like "new declaration ‘__tsan_atomic16 __tsan_atomic16_fetch_nand(..."
-SET(CXX_FLAGS_TSAN "${CXX_GCC_FLAGS} -O0 -fsanitize=thread -DTHREAD_SANITIZER -Wno-builtin-declaration-mismatch")
+# If compiled with -O0, BE will stack overflow at startup. https://github.com/apache/incubator-doris/issues/8868
+SET(CXX_FLAGS_TSAN "${CXX_GCC_FLAGS} -O1 -fsanitize=thread -DTHREAD_SANITIZER -Wno-missing-declarations")
 
 # Set compile flags based on the build type.
 if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG")




[incubator-doris] 06/09: [fix] check disk capacity before writing data (#8887)

Posted by mo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch dev-1.0.1
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git

commit 9c8d005abc2616e703f263fd6d013c647e42f58a
Author: Mingyu Chen <mo...@gmail.com>
AuthorDate: Fri Apr 8 11:29:49 2022 +0800

    [fix] check disk capacity before writing data (#8887)
    
    1. We forgot to check disk capacity when writing data.
    2. TODO: the user-specified disk capacity is not used yet. We need to find a way to use it.
    3. Avoid printing too many compaction logs when there is no suitable version for compaction.
---
 be/src/olap/compaction.cpp                       |  1 +
 be/src/olap/data_dir.cpp                         |  1 -
 be/src/olap/delta_writer.cpp                     |  1 +
 be/src/olap/push_handler.cpp                     |  2 ++
 be/src/olap/rowset/beta_rowset_writer.cpp        |  2 +-
 be/src/olap/rowset/rowset_writer_context.h       |  5 +++++
 be/src/olap/rowset/segment_v2/segment_writer.cpp | 11 +++++++++--
 be/src/olap/rowset/segment_v2/segment_writer.h   |  4 +++-
 be/src/olap/schema_change.cpp                    |  3 +++
 be/src/olap/tablet.cpp                           | 16 ++++++++++++----
 be/src/olap/tablet_manager.cpp                   |  1 +
 11 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/be/src/olap/compaction.cpp b/be/src/olap/compaction.cpp
index edb7559752..f0dc268570 100644
--- a/be/src/olap/compaction.cpp
+++ b/be/src/olap/compaction.cpp
@@ -154,6 +154,7 @@ OLAPStatus Compaction::construct_output_rowset_writer() {
     context.tablet_id = _tablet->tablet_id();
     context.partition_id = _tablet->partition_id();
     context.tablet_schema_hash = _tablet->schema_hash();
+    context.data_dir = _tablet->data_dir();
     context.rowset_type = StorageEngine::instance()->default_rowset_type();
     if (_tablet->tablet_meta()->preferred_rowset_type() == BETA_ROWSET) {
         context.rowset_type = BETA_ROWSET;
diff --git a/be/src/olap/data_dir.cpp b/be/src/olap/data_dir.cpp
index 33f472906d..96aec01909 100644
--- a/be/src/olap/data_dir.cpp
+++ b/be/src/olap/data_dir.cpp
@@ -755,7 +755,6 @@ bool DataDir::reach_capacity_limit(int64_t incoming_data_size) {
     double used_pct = (_disk_capacity_bytes - _available_bytes + incoming_data_size) /
                       (double)_disk_capacity_bytes;
     int64_t left_bytes = _available_bytes - incoming_data_size;
-
     if (used_pct >= config::storage_flood_stage_usage_percent / 100.0 &&
         left_bytes <= config::storage_flood_stage_left_capacity_bytes) {
         LOG(WARNING) << "reach capacity limit. used pct: " << used_pct
diff --git a/be/src/olap/delta_writer.cpp b/be/src/olap/delta_writer.cpp
index c0b17b2105..9738f8f55b 100644
--- a/be/src/olap/delta_writer.cpp
+++ b/be/src/olap/delta_writer.cpp
@@ -135,6 +135,7 @@ OLAPStatus DeltaWriter::init() {
     writer_context.load_id = _req.load_id;
     writer_context.segments_overlap = OVERLAPPING;
     writer_context.parent_mem_tracker = _mem_tracker;
+    writer_context.data_dir = _tablet->data_dir();
     RETURN_NOT_OK(RowsetFactory::create_rowset_writer(writer_context, &_rowset_writer));
 
     _tablet_schema = &(_tablet->tablet_schema());
diff --git a/be/src/olap/push_handler.cpp b/be/src/olap/push_handler.cpp
index dd57cc2dc4..bb3149216b 100644
--- a/be/src/olap/push_handler.cpp
+++ b/be/src/olap/push_handler.cpp
@@ -226,6 +226,7 @@ OLAPStatus PushHandler::_convert_v2(TabletSharedPtr cur_tablet, TabletSharedPtr
         context.tablet_id = cur_tablet->tablet_id();
         context.partition_id = _request.partition_id;
         context.tablet_schema_hash = cur_tablet->schema_hash();
+        context.data_dir = cur_tablet->data_dir();
         context.rowset_type = StorageEngine::instance()->default_rowset_type();
         if (cur_tablet->tablet_meta()->preferred_rowset_type() == BETA_ROWSET) {
             context.rowset_type = BETA_ROWSET;
@@ -412,6 +413,7 @@ OLAPStatus PushHandler::_convert(TabletSharedPtr cur_tablet, TabletSharedPtr new
         context.tablet_id = cur_tablet->tablet_id();
         context.partition_id = _request.partition_id;
         context.tablet_schema_hash = cur_tablet->schema_hash();
+        context.data_dir = cur_tablet->data_dir();
         context.rowset_type = StorageEngine::instance()->default_rowset_type();
         if (cur_tablet->tablet_meta()->preferred_rowset_type() == BETA_ROWSET) {
             context.rowset_type = BETA_ROWSET;
diff --git a/be/src/olap/rowset/beta_rowset_writer.cpp b/be/src/olap/rowset/beta_rowset_writer.cpp
index 4b68b39059..3cfeb27091 100644
--- a/be/src/olap/rowset/beta_rowset_writer.cpp
+++ b/be/src/olap/rowset/beta_rowset_writer.cpp
@@ -225,7 +225,7 @@ OLAPStatus BetaRowsetWriter::_create_segment_writer(std::unique_ptr<segment_v2::
     DCHECK(wblock != nullptr);
     segment_v2::SegmentWriterOptions writer_options;
     writer->reset(new segment_v2::SegmentWriter(wblock.get(), _num_segment,
-            _context.tablet_schema, writer_options, _context.parent_mem_tracker));
+            _context.tablet_schema, _context.data_dir, writer_options, _context.parent_mem_tracker));
     {
         std::lock_guard<SpinLock> l(_lock);
         _wblocks.push_back(std::move(wblock));
diff --git a/be/src/olap/rowset/rowset_writer_context.h b/be/src/olap/rowset/rowset_writer_context.h
index 8c314f5dba..51dab55a34 100644
--- a/be/src/olap/rowset/rowset_writer_context.h
+++ b/be/src/olap/rowset/rowset_writer_context.h
@@ -67,6 +67,11 @@ struct RowsetWriterContext {
     // the default is set to INT32_MAX to avoid overflow issue when casting from uint32_t to int.
     // test cases can change this value to control flush timing
     uint32_t max_rows_per_segment = INT32_MAX;
+    // Not owned. Points to the data dir of this rowset,
+    // used for checking disk capacity when writing data to disk.
+    // ATTN: not supported for RowsetConvertor
+    // (it is hard to refactor, and RowsetConvertor will be deprecated in the future).
+    DataDir* data_dir = nullptr;
 };
 
 } // namespace doris
diff --git a/be/src/olap/rowset/segment_v2/segment_writer.cpp b/be/src/olap/rowset/segment_v2/segment_writer.cpp
index adbfef9694..669c770655 100644
--- a/be/src/olap/rowset/segment_v2/segment_writer.cpp
+++ b/be/src/olap/rowset/segment_v2/segment_writer.cpp
@@ -19,6 +19,7 @@
 
 #include "common/logging.h" // LOG
 #include "env/env.h"        // Env
+#include "olap/data_dir.h"
 #include "olap/fs/block_manager.h"
 #include "olap/row.h"                             // ContiguousRow
 #include "olap/row_cursor.h"                      // RowCursor
@@ -37,8 +38,10 @@ const char* k_segment_magic = "D0R1";
 const uint32_t k_segment_magic_length = 4;
 
 SegmentWriter::SegmentWriter(fs::WritableBlock* wblock, uint32_t segment_id,
-                             const TabletSchema* tablet_schema, const SegmentWriterOptions& opts, std::shared_ptr<MemTracker> parent)
-        : _segment_id(segment_id), _tablet_schema(tablet_schema), _opts(opts), _wblock(wblock), _mem_tracker(MemTracker::CreateTracker(
+                             const TabletSchema* tablet_schema, DataDir* data_dir,
+                             const SegmentWriterOptions& opts, std::shared_ptr<MemTracker> parent)
+        : _segment_id(segment_id), _tablet_schema(tablet_schema), _data_dir(data_dir),
+          _opts(opts), _wblock(wblock), _mem_tracker(MemTracker::CreateTracker(
                 -1, "Segment-" + std::to_string(segment_id), parent, false)) {
     CHECK_NOTNULL(_wblock);
 }
@@ -134,6 +137,10 @@ uint64_t SegmentWriter::estimate_segment_size() {
 }
 
 Status SegmentWriter::finalize(uint64_t* segment_file_size, uint64_t* index_size) {
+    // check disk capacity
+    if (_data_dir != nullptr && _data_dir->reach_capacity_limit((int64_t) estimate_segment_size())) {
+        return Status::InternalError(fmt::format("disk {} exceed capacity limit.", _data_dir->path_hash()));
+    }
     for (auto& column_writer : _column_writers) {
         RETURN_IF_ERROR(column_writer->finish());
     }
diff --git a/be/src/olap/rowset/segment_v2/segment_writer.h b/be/src/olap/rowset/segment_v2/segment_writer.h
index d0600996ad..ebba30fe70 100644
--- a/be/src/olap/rowset/segment_v2/segment_writer.h
+++ b/be/src/olap/rowset/segment_v2/segment_writer.h
@@ -28,6 +28,7 @@
 
 namespace doris {
 
+class DataDir;
 class MemTracker;
 class RowBlock;
 class RowCursor;
@@ -53,7 +54,7 @@ struct SegmentWriterOptions {
 class SegmentWriter {
 public:
     explicit SegmentWriter(fs::WritableBlock* block, uint32_t segment_id,
-                           const TabletSchema* tablet_schema, const SegmentWriterOptions& opts, std::shared_ptr<MemTracker> parent = nullptr);
+                           const TabletSchema* tablet_schema, DataDir* data_dir, const SegmentWriterOptions& opts, std::shared_ptr<MemTracker> parent = nullptr);
     ~SegmentWriter();
 
     Status init(uint32_t write_mbytes_per_sec);
@@ -83,6 +84,7 @@ private:
 private:
     uint32_t _segment_id;
     const TabletSchema* _tablet_schema;
+    DataDir* _data_dir;
     SegmentWriterOptions _opts;
 
     // Not owned. owned by RowsetWriter
diff --git a/be/src/olap/schema_change.cpp b/be/src/olap/schema_change.cpp
index eb500a9bc8..0cd9ae7567 100644
--- a/be/src/olap/schema_change.cpp
+++ b/be/src/olap/schema_change.cpp
@@ -1338,6 +1338,7 @@ bool SchemaChangeWithSorting::_internal_sorting(const std::vector<RowBlock*>& ro
     context.rowset_type = new_rowset_type;
     context.path_desc = new_tablet->tablet_path_desc();
     context.tablet_schema = &(new_tablet->tablet_schema());
+    context.data_dir = new_tablet->data_dir();
     context.rowset_state = VISIBLE;
     context.version = version;
     context.segments_overlap = segments_overlap;
@@ -1734,6 +1735,7 @@ OLAPStatus SchemaChangeHandler::schema_version_convert(TabletSharedPtr base_tabl
     writer_context.tablet_id = new_tablet->tablet_id();
     writer_context.partition_id = (*base_rowset)->partition_id();
     writer_context.tablet_schema_hash = new_tablet->schema_hash();
+    writer_context.data_dir = new_tablet->data_dir();
     writer_context.rowset_type = (*base_rowset)->rowset_meta()->rowset_type();
     if (new_tablet->tablet_meta()->preferred_rowset_type() == BETA_ROWSET) {
         writer_context.rowset_type = BETA_ROWSET;
@@ -1878,6 +1880,7 @@ OLAPStatus SchemaChangeHandler::_convert_historical_rowsets(const SchemaChangePa
         writer_context.tablet_id = new_tablet->tablet_id();
         writer_context.partition_id = new_tablet->partition_id();
         writer_context.tablet_schema_hash = new_tablet->schema_hash();
+        writer_context.data_dir = new_tablet->data_dir();
         // linked schema change can't change rowset type, therefore we preserve rowset type in schema change now
         writer_context.rowset_type = rs_reader->rowset()->rowset_meta()->rowset_type();
         if (sc_params.new_tablet->tablet_meta()->preferred_rowset_type() == BETA_ROWSET) {
diff --git a/be/src/olap/tablet.cpp b/be/src/olap/tablet.cpp
index 6516b81617..90b70672d1 100644
--- a/be/src/olap/tablet.cpp
+++ b/be/src/olap/tablet.cpp
@@ -1333,11 +1333,15 @@ Status Tablet::prepare_compaction_and_calculate_permits(CompactionType compactio
         OLAPStatus res = _cumulative_compaction->prepare_compact();
         if (res != OLAP_SUCCESS) {
             set_last_cumu_compaction_failure_time(UnixMillis());
+            *permits = 0;
             if (res != OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSION) {
                 DorisMetrics::instance()->cumulative_compaction_request_failed->increment(1);
+                return Status::InternalError(fmt::format("prepare cumulative compaction with err: {}", res));
             }
-            *permits = 0;
-            return Status::InternalError(fmt::format("prepare compaction with err: {}", res));
+            // Return OK for OLAP_ERR_CUMULATIVE_NO_SUITABLE_VERSION so that we don't
+            // print too many useless logs.
+            // And since permits is set to 0, nothing will be done even though we return OK here.
+            return Status::OK();
         }
         compaction_rowsets = _cumulative_compaction->get_input_rowsets();
     } else {
@@ -1358,11 +1362,15 @@ Status Tablet::prepare_compaction_and_calculate_permits(CompactionType compactio
         OLAPStatus res = _base_compaction->prepare_compact();
         if (res != OLAP_SUCCESS) {
             set_last_base_compaction_failure_time(UnixMillis());
+            *permits = 0;
             if (res != OLAP_ERR_BE_NO_SUITABLE_VERSION) {
                 DorisMetrics::instance()->base_compaction_request_failed->increment(1);
+                return Status::InternalError(fmt::format("prepare base compaction with err: {}", res));
             }
-            *permits = 0;
-            return Status::InternalError(fmt::format("prepare compaction with err: {}", res));
+            // Return OK for OLAP_ERR_BE_NO_SUITABLE_VERSION so that we don't
+            // print too many useless logs.
+            // And since permits is set to 0, nothing will be done even though we return OK here.
+            return Status::OK();
         }
         compaction_rowsets = _base_compaction->get_input_rowsets();
     }
diff --git a/be/src/olap/tablet_manager.cpp b/be/src/olap/tablet_manager.cpp
index e997d74101..31c74c650f 100644
--- a/be/src/olap/tablet_manager.cpp
+++ b/be/src/olap/tablet_manager.cpp
@@ -1195,6 +1195,7 @@ OLAPStatus TabletManager::_create_initial_rowset_unlocked(const TCreateTabletReq
             context.tablet_id = tablet->tablet_id();
             context.partition_id = tablet->partition_id();
             context.tablet_schema_hash = tablet->schema_hash();
+            context.data_dir = tablet->data_dir();
             if (!request.__isset.storage_format ||
                 request.storage_format == TStorageFormat::DEFAULT) {
                 context.rowset_type = StorageEngine::instance()->default_rowset_type();

