Posted to commits@doris.apache.org by mo...@apache.org on 2022/12/24 09:09:53 UTC

[doris] 01/15: [enhancement] (streamload) allow table in url when do two-phase commit (#15246) (#15248)

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch branch-1.2-lts
in repository https://gitbox.apache.org/repos/asf/doris.git

commit b89c88bccb3d8860fa83d41ea235acfa7105991b
Author: zhengyu <fr...@gmail.com>
AuthorDate: Thu Dec 22 17:00:51 2022 +0800

    [enhancement] (streamload) allow table in url when do two-phase commit (#15246) (#15248)
    
    Make the two-phase commit endpoint work even if the user provides
    (unnecessary) table info in the URL, i.e.
    `curl -X PUT --location-trusted -u user:passwd -H "txn_id:18036" -H \
    "txn_operation:commit" http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc`
    still works.
    
    Signed-off-by: freemandealer <fr...@gmail.com>
---
 be/src/service/http_service.cpp                    |  2 ++
 .../import/import-way/stream-load-manual.md        | 38 ++++++++++++----------
 .../import/import-way/stream-load-manual.md        | 24 ++++++++------
 .../org/apache/doris/httpv2/rest/LoadAction.java   |  9 +++++
 4 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/be/src/service/http_service.cpp b/be/src/service/http_service.cpp
index 24045a10df..b62e54e6b1 100644
--- a/be/src/service/http_service.cpp
+++ b/be/src/service/http_service.cpp
@@ -66,6 +66,8 @@ Status HttpService::start() {
     StreamLoad2PCAction* streamload_2pc_action = _pool.add(new StreamLoad2PCAction(_env));
     _ev_http_server->register_handler(HttpMethod::PUT, "/api/{db}/_stream_load_2pc",
                                       streamload_2pc_action);
+    _ev_http_server->register_handler(HttpMethod::PUT, "/api/{db}/{table}/_stream_load_2pc",
+                                      streamload_2pc_action);
 
     // register download action
     std::vector<std::string> allow_paths;
diff --git a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
index e9917e160f..51103df3b9 100644
--- a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
@@ -5,7 +5,7 @@
 }
 ---
 
-<!-- 
+<!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
@@ -108,7 +108,7 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
 
 
 + column_separator
-  
+
     Used to specify the column separator in the load file. The default is `\t`. If it is an invisible character, you need to add `\x` as a prefix and hexadecimal to indicate the separator.
 
     For example, the separator `\x01` of the hive file needs to be specified as `-H "column_separator:\x01"`.
@@ -116,7 +116,7 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
     You can use a combination of multiple characters as the column separator.
 
 + line_delimiter
-  
+
    Used to specify the line delimiter in the load file. The default is `\n`.
 
    You can use a combination of multiple characters as the column separator.
@@ -150,13 +150,13 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
     The function transformation configuration of data to be imported includes the sequence change of columns and the expression transformation, in which the expression transformation method is consistent with the query statement.
 
     ```
-    Examples of column order transformation: There are three columns of original data (src_c1,src_c2,src_c3), and there are also three columns (dst_c1,dst_c2,dst_c3) in the doris table at present. 
+    Examples of column order transformation: There are three columns of original data (src_c1,src_c2,src_c3), and there are also three columns (dst_c1,dst_c2,dst_c3) in the doris table at present.
     when the first column src_c1 of the original file corresponds to the dst_c1 column of the target table, while the second column src_c2 of the original file corresponds to the dst_c2 column of the target table and the third column src_c3 of the original file corresponds to the dst_c3 column of the target table,which is written as follows:
     columns: dst_c1, dst_c2, dst_c3
-	
+
     when the first column src_c1 of the original file corresponds to the dst_c2 column of the target table, while the second column src_c2 of the original file corresponds to the dst_c3 column of the target table and the third column src_c3 of the original file corresponds to the dst_c1 column of the target table,which is written as follows:
     columns: dst_c2, dst_c3, dst_c1
-	
+
     Example of expression transformation: There are two columns in the original file and two columns in the target table (c1, c2). However, both columns in the original file need to be transformed by functions to correspond to the two columns in the target table.
     columns: tmp_c1, tmp_c2, c1 = year(tmp_c1), c2 = mouth(tmp_c2)
     Tmp_* is a placeholder, representing two original columns in the original file.
@@ -201,17 +201,21 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
       "CommitAndPublishTimeMs": 0
   }
   ```
-    2. Trigger the commit operation on the transaction
+    2. Trigger the commit operation on the transaction.
+    Note 1) requesting to fe and be both works
+    Note 2) `{table}` in url can be omit when commit
   ```shell
-  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18036" -H "txn_operation:commit"  http://fe_host:http_port/api/{db}/_stream_load_2pc
+  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18036" -H "txn_operation:commit"  http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc
   {
       "status": "Success",
       "msg": "transaction [18036] commit successfully."
   }
   ```
     3. Trigger an abort operation on a transaction
+    Note 1) requesting to fe and be both works
+    Note 2) `{table}` in url can be omit when abort
   ```shell
-  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18037" -H "txn_operation:abort"  http://fe_host:http_port/api/{db}/_stream_load_2pc
+  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18037" -H "txn_operation:abort"  http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc
   {
       "status": "Success",
       "msg": "transaction [18037] abort successfully."
@@ -261,7 +265,7 @@ The following main explanations are given for the Stream load import result para
 	"Label Already Exists": Label duplicate, need to be replaced Label.
 
 	"Fail": Import failed.
-	
+
 + ExistingJobStatus: The state of the load job corresponding to the existing Label.
 
     This field is displayed only when the status is "Label Already Exists". The user can know the status of the load job corresponding to Label through this state. "RUNNING" means that the job is still executing, and "FINISHED" means that the job is successful.
@@ -283,7 +287,7 @@ The following main explanations are given for the Stream load import result para
 + BeginTxnTimeMs: The time cost for RPC to Fe to begin a transaction, Unit milliseconds.
 
 + StreamLoadPutTimeMs: The time cost for RPC to Fe to get a stream load plan, Unit milliseconds.
-  
+
 + ReadDataTimeMs: Read data time, Unit milliseconds.
 
 + WriteDataTimeMs: Write data time, Unit milliseconds.
@@ -387,21 +391,21 @@ Cluster situation: The concurrency of Stream load is not affected by cluster siz
 		To sort out the possible methods mentioned above: Search FE Master's log with Label to see if there are two ``redirect load action to destination = ``redirect load action to destination cases in the same Label. If so, the request is submitted repeatedly by the Client side.
 
 		It is recommended that the user calculate the approximate import time based on the amount of data currently requested, and change the request overtime on the client side to a value greater than the import timeout time according to the import timeout time to avoid multiple submissions of the request by the client side.
-		
+
 	3. Connection reset abnormal
-	
+
 	  In the community version 0.14.0 and earlier versions, the connection reset exception occurred after Http V2 was enabled, because the built-in web container is tomcat, and Tomcat has pits in 307 (Temporary Redirect). There is a problem with the implementation of this protocol. All In the case of using Stream load to import a large amount of data, a connect reset exception will occur. This is because tomcat started data transmission before the 307 jump, which resulted in the lack of aut [...]
-	
+
 	  After the upgrade, also upgrade the http client version of your program to `4.5.13`,Introduce the following dependencies in your pom.xml file
-	
+
 	  ```xml
 	      <dependency>
 	        <groupId>org.apache.httpcomponents</groupId>
 	        <artifactId>httpclient</artifactId>
 	        <version>4.5.13</version>
-	      </dependency>  
+	      </dependency>
 	  ```
-	
+
 
 ## More Help
 
diff --git a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
index 816ab3b46b..0d4526ce20 100644
--- a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
@@ -5,7 +5,7 @@
 }
 ---
 
-<!-- 
+<!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements.  See the NOTICE file
 distributed with this work for additional information
@@ -78,7 +78,7 @@ Stream Load 通过 HTTP 协议提交和传输数据。这里通过 `curl` 命令
 ```shell
 curl --location-trusted -u user:passwd [-H ""...] -T data.file -XPUT http://fe_host:http_port/api/{db}/{table}/_stream_load
 
-# Header 中支持属性见下面的 ‘导入任务参数’ 说明 
+# Header 中支持属性见下面的 ‘导入任务参数’ 说明
 # 格式为: -H "key1:value1"
 ```
 
@@ -152,13 +152,13 @@ Stream Load 由于使用的是 HTTP 协议,所以所有导入任务有关的
 
   ```text
   列顺序变换例子:原始数据有三列(src_c1,src_c2,src_c3), 目前doris表也有三列(dst_c1,dst_c2,dst_c3)
-  
+
   如果原始表的src_c1列对应目标表dst_c1列,原始表的src_c2列对应目标表dst_c2列,原始表的src_c3列对应目标表dst_c3列,则写法如下:
   columns: dst_c1, dst_c2, dst_c3
-  
+
   如果原始表的src_c1列对应目标表dst_c2列,原始表的src_c2列对应目标表dst_c3列,原始表的src_c3列对应目标表dst_c1列,则写法如下:
   columns: dst_c2, dst_c3, dst_c1
-  
+
   表达式变换例子:原始文件有两列,目标表也有两列(c1,c2)但是原始文件的两列均需要经过函数变换才能对应目标表的两列,则写法如下:
   columns: tmp_c1, tmp_c2, c1 = year(tmp_c1), c2 = month(tmp_c2)
   其中 tmp_*是一个占位符,代表的是原始文件中的两个原始列。
@@ -186,10 +186,10 @@ Stream Load 由于使用的是 HTTP 协议,所以所有导入任务有关的
 
   默认的两阶段批量事务提交为关闭。
 
-  > **开启方式:** 在be.conf中配置`disable_stream_load_2pc=false` 并且 在 HEADER 中声明 `two_phase_commit=true` 。 
-  
+  > **开启方式:** 在be.conf中配置`disable_stream_load_2pc=false` 并且 在 HEADER 中声明 `two_phase_commit=true` 。
+
   示例:
-  
+
   1. 发起stream load预提交操作
   ```shell
   curl  --location-trusted -u user:passwd -H "two_phase_commit:true" -T test.txt http://fe_host:http_port/api/{db}/{table}/_stream_load
@@ -213,16 +213,20 @@ Stream Load 由于使用的是 HTTP 协议,所以所有导入任务有关的
   }
   ```
   2. 对事务触发commit操作
+  注意1) 请求发往fe或be均可
+  注意2) commit 的时候可以省略 url 中的 `{table}`
   ```shell
-  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18036" -H "txn_operation:commit"  http://fe_host:http_port/api/{db}/_stream_load_2pc
+  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18036" -H "txn_operation:commit"  http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc
   {
       "status": "Success",
       "msg": "transaction [18036] commit successfully."
   }
   ```
   3. 对事务触发abort操作
+  注意1) 请求发往fe或be均可
+  注意2) abort 的时候可以省略 url 中的 `{table}`
   ```shell
-  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18037" -H "txn_operation:abort"  http://fe_host:http_port/api/{db}/_stream_load_2pc
+  curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18037" -H "txn_operation:abort"  http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc
   {
       "status": "Success",
       "msg": "transaction [18037] abort successfully."
diff --git a/fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/LoadAction.java b/fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/LoadAction.java
index 0a9313e925..b5ca33b058 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/LoadAction.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/LoadAction.java
@@ -85,6 +85,15 @@ public class LoadAction extends RestBaseController {
         return executeStreamLoad2PC(request, db);
     }
 
+    @RequestMapping(path = "/api/{" + DB_KEY + "}/{" + TABLE_KEY + "}/_stream_load_2pc", method = RequestMethod.PUT)
+    public Object streamLoad2PC_table(HttpServletRequest request,
+                                      HttpServletResponse response,
+                                      @PathVariable(value = DB_KEY) String db,
+                                      @PathVariable(value = TABLE_KEY) String table) {
+        executeCheckPassword(request, response);
+        return executeStreamLoad2PC(request, db);
+    }
+
     // Same as Multi load, to be compatible with http v1's response body,
     // we return error by using RestBaseResult.
     private Object executeWithoutPassword(HttpServletRequest request,
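
With the handlers registered in this diff, both `/api/{db}/_stream_load_2pc` and `/api/{db}/{table}/_stream_load_2pc` accept a two-phase commit or abort request. The sketch below shows how a client script might build either form. `build_2pc_url` is a hypothetical helper, not part of Doris, and the host, port, database, and table values are placeholders chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Doris): build the two-phase commit URL.
# The table argument is optional, since both URL forms are accepted
# after this change.
build_2pc_url() {
    host_port="$1"   # e.g. fe_host:http_port (placeholder)
    db="$2"
    table="${3:-}"   # empty when the caller omits the table
    if [ -n "$table" ]; then
        printf 'http://%s/api/%s/%s/_stream_load_2pc\n' "$host_port" "$db" "$table"
    else
        printf 'http://%s/api/%s/_stream_load_2pc\n' "$host_port" "$db"
    fi
}

# Placeholder values, assumed for illustration only.
build_2pc_url "127.0.0.1:8030" "example_db" "example_tbl"
build_2pc_url "127.0.0.1:8030" "example_db"
```

Either result can be passed to the curl command from the commit message, for example `curl -X PUT --location-trusted -u user:passwd -H "txn_id:18036" -H "txn_operation:commit" "$(build_2pc_url fe_host:http_port db)"`.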

