Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/24 12:45:47 UTC

[GitHub] [incubator-doris] HappenLee opened a new pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

HappenLee opened a new pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438


   
   Related issue: #4376
   ## Proposed changes
   
   Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   - [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [] Documentation Update (if none of the other choices apply)
   - [] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
    - [x] I have created an issue (Fix #ISSUE) and have described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [x] I have added tests that prove my fix is effective or that my feature works
    - [x] If this change needs a document change, I have updated the document
   - [x] Any dependent changes have been merged
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] HappenLee commented on a change in pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

Posted by GitBox <gi...@apache.org>.
HappenLee commented on a change in pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438#discussion_r477155972



##########
File path: docs/zh-CN/extending-doris/odbc-of-doris.md
##########
@@ -0,0 +1,172 @@
+---
+{
+    "title": "ODBC of Doris",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# ODBC External Table Of Doris
+
+ODBC External Table Of Doris lets Doris access external tables through the standard database access interface (ODBC). External tables avoid the tedious work of importing data, give Doris the ability to access all kinds of databases, and let Doris use its own OLAP capabilities to analyze the data in those external tables:
+
+ 1. Multiple data sources can be connected to Doris
+ 2. Doris can join its own tables with tables from these data sources to perform more complex analysis
+
+This document mainly describes how this feature works and how to use it.
+
+## Terminology
+
+### Doris terms
+* FE: Frontend, the frontend node of Doris, responsible for metadata management and request access
+* BE: Backend, the backend node of Doris, responsible for query execution and data storage
+
+## Usage
+
+### Creating an ODBC external table in Doris
+
+```
+CREATE EXTERNAL TABLE `baseall_oracle` (
+  `k1` decimal(9, 3) NOT NULL COMMENT "",
+  `k2` char(10) NOT NULL COMMENT "",
+  `k3` datetime NOT NULL COMMENT "",
+  `k5` varchar(20) NOT NULL COMMENT "",
+  `k6` double NOT NULL COMMENT ""
+) ENGINE=ODBC
+COMMENT "ODBC"
+PROPERTIES (
+"host" = "192.168.0.1",
+"port" = "8086",
+"user" = "test",
+"password" = "test",
+"database" = "test",
+"table" = "baseall",
+"driver" = "Oracle 19 ODBC driver",
+"type" = "oracle"
+);
+```
+
+Parameter description:
+
+Parameter | Description
+---|---
+**host** | IP address of the external database
+**driver** | Driver name of the ODBC external table; it must match the Driver name in be/conf/odbcinst.ini.
+**type** | Type of the external database; currently oracle and mysql are supported
+**user** | User name for the external database
+**password** | Password for that user
+
+
+##### Installing and configuring the ODBC Driver
+Every mainstream database provides an ODBC access Driver. Users can install the corresponding ODBC Driver lib by following the method officially recommended by each database.
+
+
+After installation, find the path of that database's Driver lib and modify the be/conf/odbcinst.ini configuration accordingly:
+```
+[MySQL Driver]
+Description     = ODBC for MySQL
+Driver          = /usr/lib64/libmyodbc8w.so
+FileUsage       = 1 
+```
+* The name inside `[]` above is the Driver name. When creating an external table, the Driver name of the external table must match the one in this configuration file.
+* `Driver=` must be filled in according to the path where the Driver is actually installed on the BE. It is essentially the path of a dynamic library, and all prerequisite dependencies of that library must be satisfied.
+
+**Remember: all BE nodes must have the same Driver installed, at the same path, with the same be/conf/odbcinst.ini configuration.**
+
+
+### Query usage
+
+Once an ODBC external table has been created in Doris, it is used just like an ordinary Doris table, except that the Doris data models (rollup, pre-aggregation, materialized views, etc.) cannot be used.
+
+
+```
+select * from oracle_table where k1 > 1000 and k3 ='term' or k4 like '%doris'
+```
+
+
+
+## Type mapping
+
+Data types differ between databases. This section lists how the types in each database map to the data types in Doris.
+
+### MySQL types
+
+|  MySQL  | Doris  |             Replacement              |
+| :------: | :----: | :-------------------------------: |
+|  BOOLEAN  | BOOLEAN  |                         |
+|   CHAR   |  CHAR  |            Only UTF8 encoding is currently supported            |
+| VARCHAR | VARCHAR |       Only UTF8 encoding is currently supported       |
+|   DATE   |  DATE  |                                   |
+|  FLOAT   |  FLOAT  |                                   |
+|   TINYINT   | TINYINT |  |
+|   SMALLINT  | SMALLINT |  |
+|   INT  | INT |  |
+|   BIGINT  | BIGINT |  |
+|   DOUBLE  | DOUBLE |  |
+|   DATE  | DATE |  |
+|   DECIMAL  | DECIMAL |  |
+

Review comment:
       MySQL does not support HLL for now; I will add the DATETIME type to the doc





[GitHub] [incubator-doris] HappenLee commented on a change in pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

Posted by GitBox <gi...@apache.org>.
HappenLee commented on a change in pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438#discussion_r477156203



##########
File path: be/src/exec/odbc_scanner.cpp
##########
@@ -0,0 +1,248 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <boost/algorithm/string.hpp>
+#include <codecvt>
+#include <sqlext.h>
+
+#include "exec/odbc_scanner.h"
+#include "common/logging.h"
+#include "runtime/primitive_type.h"
+
+#define ODBC_DISPOSE(h, ht, x, op) { auto rc = x;\
+                                if (rc != SQL_SUCCESS && rc != SQL_SUCCESS_WITH_INFO) \
+                                { \
+                                    return error_status(op, handle_diagnostic_record(h, ht, rc)); \
+                                } \
+                                if (rc == SQL_ERROR) \
+                                { \
+                                    auto err_msg = std::string("Errro in") + std::string(op); \
+                                    return Status::InternalError(err_msg.c_str()); \
+                                }  \
+                            } \
+
+static constexpr uint32_t SMALL_COLUMN_SIZE_BUFFER = 100;
+// Now we only treat HLL, CHAR, VARCHAR as big column
+static constexpr uint32_t BIG_COLUMN_SIZE_BUFFER = 65535;
+
+namespace doris {
+
+ODBCScanner::ODBCScanner(const ODBCScannerParam& param)
+        : _connect_string(build_connect_string(param)),
+          _type(param.type),
+          _tuple_desc(param.tuple_desc),
+          _is_open(false),
+          _field_num(0),
+          _row_count(0),
+          _env(nullptr),
+          _dbc(nullptr),
+          _stmt(nullptr) {
+}
+
+ODBCScanner::~ODBCScanner() {
+    if (_stmt != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_STMT, _stmt);
+    }
+
+    if (_dbc != nullptr) {
+        SQLDisconnect(_dbc);
+        SQLFreeHandle(SQL_HANDLE_DBC, _dbc);
+    }
+
+    if (_env != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_ENV, _env);
+    }
+}
+
+Status ODBCScanner::open() {
+    if (_is_open) {
+        LOG(INFO) << "this scanner already opened";
+        return Status::OK();
+    }
+
+    // Allocate an environment
+    if (SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &_env) != SQL_SUCCESS) {
+        return Status::InternalError("alloc env failed");
+    }
+    // We want ODBC 3 support
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLSetEnvAttr(_env, SQL_ATTR_ODBC_VERSION, (void *) SQL_OV_ODBC3, 0), "set env attr");
+    // Allocate a connection handle
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLAllocHandle(SQL_HANDLE_DBC, _env, &_dbc), "alloc dbc");
+    // Connect to the Database
+    ODBC_DISPOSE(_dbc, SQL_HANDLE_DBC, SQLDriverConnect(_dbc, NULL, (SQLCHAR*)_connect_string.c_str(), SQL_NTS,
+                           NULL, 0, NULL, SQL_DRIVER_COMPLETE_REQUIRED), "driver connect");
+
+    LOG(INFO) << "connect success:";

Review comment:
       ok





[GitHub] [incubator-doris] HappenLee commented on a change in pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

Posted by GitBox <gi...@apache.org>.
HappenLee commented on a change in pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438#discussion_r477109250



##########
File path: docs/zh-CN/extending-doris/odbc-of-doris.md
##########
@@ -0,0 +1,172 @@
+---
+{
+    "title": "ODBC of Doris",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# ODBC External Table Of Doris
+
+ODBC External Table Of Doris lets Doris access external tables through the standard database access interface (ODBC). External tables avoid the tedious work of importing data, give Doris the ability to access all kinds of databases, and let Doris use its own OLAP capabilities to analyze the data in those external tables:
+
+ 1. Multiple data sources can be connected to Doris
+ 2. Doris can join its own tables with tables from these data sources to perform more complex analysis
+
+This document mainly describes how this feature works and how to use it.
+
+## Terminology
+
+### Doris terms
+* FE: Frontend, the frontend node of Doris, responsible for metadata management and request access
+* BE: Backend, the backend node of Doris, responsible for query execution and data storage
+
+## Usage
+
+### Creating an ODBC external table in Doris
+
+```
+CREATE EXTERNAL TABLE `baseall_oracle` (
+  `k1` decimal(9, 3) NOT NULL COMMENT "",
+  `k2` char(10) NOT NULL COMMENT "",
+  `k3` datetime NOT NULL COMMENT "",
+  `k5` varchar(20) NOT NULL COMMENT "",
+  `k6` double NOT NULL COMMENT ""
+) ENGINE=ODBC
+COMMENT "ODBC"
+PROPERTIES (
+"host" = "192.168.0.1",
+"port" = "8086",
+"user" = "test",
+"password" = "test",
+"database" = "test",
+"table" = "baseall",
+"driver" = "Oracle 19 ODBC driver",
+"type" = "oracle"
+);
+```
+
+Parameter description:
+
+Parameter | Description
+---|---
+**host** | IP address of the external database
+**driver** | Driver name of the ODBC external table; it must match the Driver name in be/conf/odbcinst.ini.
+**type** | Type of the external database; currently oracle and mysql are supported
+**user** | User name for the external database
+**password** | Password for that user
+
+
+##### Installing and configuring the ODBC Driver
+Every mainstream database provides an ODBC access Driver. Users can install the corresponding ODBC Driver lib by following the method officially recommended by each database.
+
+
+After installation, find the path of that database's Driver lib and modify the be/conf/odbcinst.ini configuration accordingly:
+```
+[MySQL Driver]
+Description     = ODBC for MySQL
+Driver          = /usr/lib64/libmyodbc8w.so
+FileUsage       = 1 
+```
+* The name inside `[]` above is the Driver name. When creating an external table, the Driver name of the external table must match the one in this configuration file.
+* `Driver=` must be filled in according to the path where the Driver is actually installed on the BE. It is essentially the path of a dynamic library, and all prerequisite dependencies of that library must be satisfied.
+
+**Remember: all BE nodes must have the same Driver installed, at the same path, with the same be/conf/odbcinst.ini configuration.**
+
+
+### Query usage
+
+Once an ODBC external table has been created in Doris, it is used just like an ordinary Doris table, except that the Doris data models (rollup, pre-aggregation, materialized views, etc.) cannot be used.
+
+
+```
+select * from oracle_table where k1 > 1000 and k3 ='term' or k4 like '%doris'
+```
+
+
+
+## Type mapping
+
+Data types differ between databases. This section lists how the types in each database map to the data types in Doris.
+
+### MySQL types
+
+|  MySQL  | Doris  |             Replacement              |
+| :------: | :----: | :-------------------------------: |
+|  BOOLEAN  | BOOLEAN  |                         |
+|   CHAR   |  CHAR  |            Only UTF8 encoding is currently supported            |
+| VARCHAR | VARCHAR |       Only UTF8 encoding is currently supported       |
+|   DATE   |  DATE  |                                   |
+|  FLOAT   |  FLOAT  |                                   |
+|   TINYINT   | TINYINT |  |
+|   SMALLINT  | SMALLINT |  |
+|   INT  | INT |  |
+|   BIGINT  | BIGINT |  |
+|   DOUBLE  | DOUBLE |  |
+|   DATE  | DATE |  |
+|   DECIMAL  | DECIMAL |  |
+
+### Oracle types
+
+|  Oracle  | Doris  |             Replacement              |
+| :------: | :----: | :-------------------------------: |
+|  Not supported | BOOLEAN  |          In Oracle, number(1) can be used instead of boolean               |
+|   CHAR   |  CHAR  |                       |
+| VARCHAR | VARCHAR |              |
+|   DATE   |  DATE  |                                   |
+|  FLOAT   |  FLOAT  |                                   |
+|  N/A   | TINYINT | In Oracle, NUMBER can be used instead |
+|   SMALLINT  | SMALLINT |  |
+|   INT  | INT |  |
+|   N/A  | BIGINT |  In Oracle, NUMBER can be used instead |
+|   N/A  | DOUBLE | In Oracle, NUMBER can be used instead |
+|   DATE  | DATE |  |
+|   NUMBER  | DECIMAL |  |
+
+## Q&A
+
+1. What is the relationship with the existing MySQL external tables?
+
+    Once ODBC external tables are available, the original way of accessing MySQL external tables will be gradually deprecated. If you have not used MySQL external tables before, it is recommended that newly connected MySQL tables use the ODBC-based MySQL external table directly.
+
+2. Besides MySQL and Oracle, can more databases be supported?
+
+    Currently Doris has only been adapted to MySQL and Oracle, and adaptation to other databases is being planned. In principle, any database that supports ODBC access can be accessed through an ODBC external table. If you need to access other external tables, you are welcome to modify the code and contribute it to Doris.
+
+3. When is it appropriate to access data through an external table?
+
+    Generally, an external table is suitable when the amount of data in it is small, fewer than about one million rows. Because an external table cannot benefit from Doris's storage engine and incurs extra network overhead, decide between accessing data through an external table and importing it into Doris based on the actual latency requirements of your queries.
+
+4. Garbled characters appear when accessing Oracle
+
+   Try adding the following parameter to the BE start script: `export NLS_LANG=AMERICAN_AMERICA.AL32UTF8`, then restart all BEs

Review comment:
       It will not have any real impact, but in principle this environment variable is not needed unless an Oracle database is being used





[GitHub] [incubator-doris] morningman commented on a change in pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438#discussion_r492513438



##########
File path: be/src/exec/odbc_scan_node.h
##########
@@ -0,0 +1,96 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef  DORIS_BE_SRC_QUERY_EXEC_ODBC_SCAN_NODE_H
+#define  DORIS_BE_SRC_QUERY_EXEC_ODBC_SCAN_NODE_H
+
+#include <memory>
+
+#include "runtime/descriptors.h"
+#include "exec/scan_node.h"
+#include "exec/odbc_scanner.h"
+
+namespace doris {
+
+class TextConverter;
+class Tuple;
+class TupleDescriptor;
+class RuntimeState;
+class MemPool;
+class Status;
+
+class OdbcScanNode : public ScanNode {
+public:
+    OdbcScanNode(ObjectPool* pool, const TPlanNode& tnode, const DescriptorTbl& descs);
+    ~OdbcScanNode();
+
+    // initialize _mysql_scanner, and create _text_converter.
+    virtual Status prepare(RuntimeState* state);
+
+    // Start MySQL scan using _mysql_scanner.
+    virtual Status open(RuntimeState* state);
+
+    // Fill the next row batch by calling next() on the _mysql_scanner,
+    // converting text data in MySQL cells to binary data.
+    virtual Status get_next(RuntimeState* state, RowBatch* row_batch, bool* eos);
+
+    // Close the _mysql_scanner, and report errors.

Review comment:
       Modify the comment
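
    As a reference for the requested change, here is a sketch of how these comments could read once the copy-paste from the MySQL scan node is fixed; it assumes the member is named `_odbc_scanner`, as it is in odbc_scan_node.cpp, and is not the PR's final wording:

```cpp
// Initialize _odbc_scanner, and create _text_converter.
virtual Status prepare(RuntimeState* state);

// Start the ODBC scan using _odbc_scanner.
virtual Status open(RuntimeState* state);

// Fill the next row batch by calling get_next_row() on _odbc_scanner,
// converting the text data returned by ODBC into binary data.
virtual Status get_next(RuntimeState* state, RowBatch* row_batch, bool* eos);

// Close _odbc_scanner, and report errors.
virtual Status close(RuntimeState* state);
```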

##########
File path: be/src/exec/odbc_scanner.cpp
##########
@@ -0,0 +1,257 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <boost/algorithm/string.hpp>
+#include <codecvt>
+#include <sqlext.h>
+
+#include "exec/odbc_scanner.h"
+#include "common/logging.h"
+#include "runtime/primitive_type.h"
+
+#define ODBC_DISPOSE(h, ht, x, op) { auto rc = x;\
+                                if (rc != SQL_SUCCESS && rc != SQL_SUCCESS_WITH_INFO) \
+                                { \
+                                    return error_status(op, handle_diagnostic_record(h, ht, rc)); \
+                                } \
+                                if (rc == SQL_ERROR) \
+                                { \
+                                    auto err_msg = std::string("Errro in") + std::string(op); \
+                                    return Status::InternalError(err_msg.c_str()); \
+                                }  \
+                            } \
+
+static constexpr uint32_t SMALL_COLUMN_SIZE_BUFFER = 100;
+// Now we only treat HLL, CHAR, VARCHAR as big column
+static constexpr uint32_t BIG_COLUMN_SIZE_BUFFER = 65535;
+
+static std::u16string utf8_to_wstring (const std::string& str)

Review comment:
       ```suggestion
   static std::u16string utf8_to_wstring(const std::string& str)
   ```
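
    For context, a hedged sketch of how a helper like `utf8_to_wstring` is commonly written with the `<codecvt>` facilities the file already includes; this is an assumption about the intent, not necessarily the PR's exact implementation (`std::wstring_convert` is deprecated since C++17 but still available):

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Convert a UTF-8 encoded std::string into UTF-16, e.g. before handing text
// to wide-character ODBC entry points.
static std::u16string utf8_to_wstring(const std::string& str) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    return converter.from_bytes(str);
}
```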

##########
File path: docs/zh-CN/extending-doris/odbc-of-doris.md
##########
@@ -0,0 +1,217 @@
+---
+{
+    "title": "ODBC of Doris",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# ODBC External Table Of Doris
+
+ODBC External Table Of Doris lets Doris access external tables through the standard database access interface (ODBC). External tables avoid the tedious work of importing data, give Doris the ability to access all kinds of databases, and let Doris use its own OLAP capabilities to analyze the data in those external tables:
+
+ 1. Multiple data sources can be connected to Doris
+ 2. Doris can join its own tables with tables from these data sources to perform more complex analysis
+
+This document mainly describes how this feature works and how to use it.
+
+## Terminology
+
+### Doris terms
+* FE: Frontend, the frontend node of Doris, responsible for metadata management and request access
+* BE: Backend, the backend node of Doris, responsible for query execution and data storage
+
+## Usage
+
+### Creating an ODBC external table in Doris
+
+#### 1. Creating an ODBC external table without a Resource
+
+```
+CREATE EXTERNAL TABLE `baseall_oracle` (
+  `k1` decimal(9, 3) NOT NULL COMMENT "",
+  `k2` char(10) NOT NULL COMMENT "",
+  `k3` datetime NOT NULL COMMENT "",
+  `k5` varchar(20) NOT NULL COMMENT "",
+  `k6` double NOT NULL COMMENT ""
+) ENGINE=ODBC
+COMMENT "ODBC"
+PROPERTIES (
+"host" = "192.168.0.1",
+"port" = "8086",
+"user" = "test",
+"password" = "test",
+"database" = "test",
+"table" = "baseall",
+"driver" = "Oracle 19 ODBC driver",
+"odbc_type" = "oracle"
+);
+```
+
+#### 2. Creating an ODBC external table through ODBC_Resource (recommended)
+```
+create external resource "oracle_odbc"
+    properties 

Review comment:
       Indent





[GitHub] [incubator-doris] wutiangan commented on a change in pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

Posted by GitBox <gi...@apache.org>.
wutiangan commented on a change in pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438#discussion_r475597949



##########
File path: be/src/exec/odbc_scan_node.cpp
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "odbc_scan_node.h"
+
+#include <sstream>
+
+#include "exec/text_converter.hpp"
+#include "gen_cpp/PlanNodes_types.h"
+#include "runtime/runtime_state.h"
+#include "runtime/row_batch.h"
+#include "runtime/string_value.h"
+#include "runtime/tuple_row.h"
+#include "util/runtime_profile.h"
+
+namespace doris {
+
+OdbcScanNode::OdbcScanNode(ObjectPool* pool, const TPlanNode& tnode,
+                             const DescriptorTbl& descs)
+        : ScanNode(pool, tnode, descs),
+          _is_init(false),
+          _table_name(tnode.odbc_scan_node.table_name),
+          _tuple_id(tnode.odbc_scan_node.tuple_id),
+          _columns(tnode.odbc_scan_node.columns),
+          _filters(tnode.odbc_scan_node.filters),
+          _tuple_desc(nullptr) {
+}
+
+OdbcScanNode::~OdbcScanNode() {
+}
+
+Status OdbcScanNode::prepare(RuntimeState* state) {
+    VLOG(1) << "OdbcScanNode::Prepare";
+
+    if (_is_init) {
+        return Status::OK();
+    }
+
+    if (NULL == state) {
+        return Status::InternalError("input pointer is NULL.");
+    }
+
+    RETURN_IF_ERROR(ScanNode::prepare(state));
+    // get tuple desc
+    _tuple_desc = state->desc_tbl().get_tuple_descriptor(_tuple_id);
+
+    if (NULL == _tuple_desc) {
+        return Status::InternalError("Failed to get tuple descriptor.");
+    }
+
+    _slot_num = _tuple_desc->slots().size();
+    // get odbc table info
+    const ODBCTableDescriptor* odbc_table =
+            static_cast<const ODBCTableDescriptor*>(_tuple_desc->table_desc());
+
+    if (NULL == odbc_table) {
+        return Status::InternalError("odbc table pointer is NULL.");
+    }
+
+    _odbc_param.host = odbc_table->host();
+    _odbc_param.port = odbc_table->port();
+    _odbc_param.user = odbc_table->user();
+    _odbc_param.passwd = odbc_table->passwd();
+    _odbc_param.db = odbc_table->db();
+    _odbc_param.drivier = odbc_table->driver();
+    _odbc_param.type = odbc_table->type();
+    _odbc_param.tuple_desc = _tuple_desc;
+
+    _odbc_scanner.reset(new (std::nothrow)ODBCScanner(_odbc_param));
+
+    if (_odbc_scanner.get() == nullptr) {
+        return Status::InternalError("new a odbc scanner failed.");
+    }
+
+    _tuple_pool.reset(new(std::nothrow) MemPool(mem_tracker().get()));
+
+    if (_tuple_pool.get() == NULL) {
+        return Status::InternalError("new a mem pool failed.");
+    }
+
+    _text_converter.reset(new(std::nothrow) TextConverter('\\'));
+
+    if (_text_converter.get() == NULL) {
+        return Status::InternalError("new a text convertor failed.");
+    }
+
+    _is_init = true;
+
+    return Status::OK();
+}
+
+Status OdbcScanNode::open(RuntimeState* state) {
+    RETURN_IF_ERROR(ExecNode::open(state));
+    VLOG(1) << "OdbcScanNode::Open";
+
+    if (NULL == state) {
+        return Status::InternalError("input pointer is NULL.");
+    }
+
+    if (!_is_init) {
+        return Status::InternalError("used before initialize.");
+    }
+
+    RETURN_IF_ERROR(exec_debug_action(TExecNodePhase::OPEN));
+    RETURN_IF_CANCELLED(state);
+    SCOPED_TIMER(_runtime_profile->total_time_counter());
+    RETURN_IF_ERROR(_odbc_scanner->open());
+    RETURN_IF_ERROR(_odbc_scanner->query(_table_name, _columns, _filters));
+    // check materialize slot num
+
+    return Status::OK();
+}
+
+Status OdbcScanNode::write_text_slot(char* value, int value_length,
+                                      SlotDescriptor* slot, RuntimeState* state) {
+    if (!_text_converter->write_slot(slot, _tuple, value, value_length,
+                                     true, false, _tuple_pool.get())) {
+        std::stringstream ss;
+        ss << "fail to convert odbc value '" << value << "' TO " << slot->type();
+        return Status::InternalError(ss.str());
+    }
+
+    return Status::OK();
+}
+
+Status OdbcScanNode::get_next(RuntimeState* state, RowBatch* row_batch, bool* eos) {
+    VLOG(1) << "OdbcScanNode::GetNext";
+
+    if (NULL == state || NULL == row_batch || NULL == eos) {
+        return Status::InternalError("input is NULL pointer");
+    }
+
+    if (!_is_init) {
+        return Status::InternalError("used before initialize.");
+    }
+
+    RETURN_IF_ERROR(exec_debug_action(TExecNodePhase::GETNEXT));
+    RETURN_IF_CANCELLED(state);
+    SCOPED_TIMER(_runtime_profile->total_time_counter());
+    SCOPED_TIMER(materialize_tuple_timer());
+
+    if (reached_limit()) {
+        *eos = true;
+        return Status::OK();
+    }
+
+    // create new tuple buffer for row_batch
+    int tuple_buffer_size = row_batch->capacity() * _tuple_desc->byte_size();
+    void* tuple_buffer = _tuple_pool->allocate(tuple_buffer_size);
+
+    if (NULL == tuple_buffer) {
+        return Status::InternalError("Allocate memory failed.");
+    }
+
+    _tuple = reinterpret_cast<Tuple*>(tuple_buffer);
+    // Indicates whether there are more rows to process. Set in _hbase_scanner.next().
+    bool odbc_eos = false;
+
+    while (true) {
+        RETURN_IF_CANCELLED(state);
+
+        if (reached_limit() || row_batch->is_full()) {
+            // hang on to last allocated chunk in pool, we'll keep writing into it in the
+            // next get_next() call
+            row_batch->tuple_data_pool()->acquire_data(_tuple_pool.get(), !reached_limit());
+            *eos = reached_limit();
+            return Status::OK();
+        }
+
+        RETURN_IF_ERROR(_odbc_scanner->get_next_row(&odbc_eos));
+
+        if (odbc_eos) {
+            row_batch->tuple_data_pool()->acquire_data(_tuple_pool.get(), false);
+            *eos = true;
+            return Status::OK();
+        }
+
+        int row_idx = row_batch->add_row();
+        TupleRow* row = row_batch->get_row(row_idx);
+        // scan node is the first tuple of tuple row
+        row->set_tuple(0, _tuple);
+        memset(_tuple, 0, _tuple_desc->num_null_bytes());
+        int j = 0;
+
+        for (int i = 0; i < _slot_num; ++i) {
+            auto slot_desc = _tuple_desc->slots()[i];
+            // because the fe planner filter the non_materialize column
+            if (!slot_desc->is_materialized()) {
+                continue;
+            }
+
+            const auto& column_data = _odbc_scanner->get_column_data(j);
+            if (column_data.strlen_or_ind == SQL_NULL_DATA) {
+                if (slot_desc->is_nullable()) {
+                    _tuple->set_null(slot_desc->null_indicator_offset());
+                } else {
+                    std::stringstream ss;
+                    ss << "nonnull column contains NULL. table=" << _table_name
+                       << ", column=" << slot_desc->col_name();
+                    return Status::InternalError(ss.str());
+                }
+            } else if (column_data.strlen_or_ind > column_data.buffer_length) {
+                std::stringstream ss;
+                ss << "nonnull column contains NULL. table=" << _table_name
+                   << ", column=" << slot_desc->col_name();
+                return Status::InternalError(ss.str());
+            } else {
+                    RETURN_IF_ERROR(
+                            write_text_slot(static_cast<char*>(column_data.target_value_ptr), column_data.strlen_or_ind, slot_desc, state));
+            }
+            j++;
+        }
+
+        // Before we fix the problem utf8 encode sql query in SQLexecDirect
+        // we need to check some filter can not encode in asii code, like chinese

Review comment:
       asii --> ascii
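
    The check that comment refers to could look roughly like the sketch below; `is_ascii` is a hypothetical helper used only for illustration, not code from the PR. The idea is that filters containing non-ASCII text (for example Chinese) are not pushed into the ODBC query until the UTF-8 encoding issue in SQLExecDirect is fixed.

```cpp
#include <algorithm>
#include <string>

// Returns true only if every byte of 's' is a 7-bit ASCII character.
// Filters that fail this check are kept in Doris and evaluated there instead
// of being pushed down to the ODBC data source.
static bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```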

##########
File path: docs/zh-CN/extending-doris/odbc-of-doris.md
##########
@@ -0,0 +1,172 @@
+---
+{
+    "title": "ODBC of Doris",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# ODBC External Table Of Doris
+
+ODBC External Table Of Doris lets Doris access external tables through the standard database access interface (ODBC). External tables avoid the tedious work of importing data, give Doris the ability to access all kinds of databases, and let Doris use its own OLAP capabilities to analyze the data in those external tables:
+
+ 1. Multiple data sources can be connected to Doris
+ 2. Doris can join its own tables with tables from these data sources to perform more complex analysis
+
+This document mainly describes how this feature works and how to use it.
+
+## Terminology
+
+### Doris terms
+* FE: Frontend, the frontend node of Doris, responsible for metadata management and request access
+* BE: Backend, the backend node of Doris, responsible for query execution and data storage
+
+## Usage
+
+### Creating an ODBC external table in Doris
+
+```
+CREATE EXTERNAL TABLE `baseall_oracle` (
+  `k1` decimal(9, 3) NOT NULL COMMENT "",
+  `k2` char(10) NOT NULL COMMENT "",
+  `k3` datetime NOT NULL COMMENT "",
+  `k5` varchar(20) NOT NULL COMMENT "",
+  `k6` double NOT NULL COMMENT ""
+) ENGINE=ODBC
+COMMENT "ODBC"
+PROPERTIES (
+"host" = "192.168.0.1",
+"port" = "8086",
+"user" = "test",
+"password" = "test",
+"database" = "test",
+"table" = "baseall",
+"driver" = "Oracle 19 ODBC driver",
+"type" = "oracle"
+);
+```
+
+Parameter description:
+
+Parameter | Description
+---|---
+**host** | IP address of the external database
+**driver** | Driver name of the ODBC external table; it must match the Driver name in be/conf/odbcinst.ini.
+**type** | Type of the external database; currently oracle and mysql are supported
+**user** | User name for the external database
+**password** | Password for that user
+
+
+##### Installing and configuring the ODBC Driver
+Every mainstream database provides an ODBC access Driver. Users can install the corresponding ODBC Driver lib by following the method officially recommended by each database.
+
+
+After installation, find the path of that database's Driver lib and modify the be/conf/odbcinst.ini configuration accordingly:
+```
+[MySQL Driver]
+Description     = ODBC for MySQL
+Driver          = /usr/lib64/libmyodbc8w.so
+FileUsage       = 1 
+```
+* The name inside `[]` above is the Driver name. When creating an external table, the Driver name of the external table must match the one in this configuration file.
+* `Driver=` must be filled in according to the path where the Driver is actually installed on the BE. It is essentially the path of a dynamic library, and all prerequisite dependencies of that library must be satisfied.
+
+**Remember: all BE nodes must have the same Driver installed, at the same path, with the same be/conf/odbcinst.ini configuration.**
+
+
+### Query usage
+
+Once an ODBC external table has been created in Doris, it is used just like an ordinary Doris table, except that the Doris data models (rollup, pre-aggregation, materialized views, etc.) cannot be used.
+
+
+```
+select * from oracle_table where k1 > 1000 and k3 ='term' or k4 like '%doris'
+```
+
+
+
+## Type mapping
+
+Data types differ between databases. This section lists how the types in each database map to the data types in Doris.
+
+### MySQL types
+
+|  MySQL  | Doris  |             Replacement              |
+| :------: | :----: | :-------------------------------: |
+|  BOOLEAN  | BOOLEAN  |                         |
+|   CHAR   |  CHAR  |            Only UTF8 encoding is currently supported            |
+| VARCHAR | VARCHAR |       Only UTF8 encoding is currently supported       |
+|   DATE   |  DATE  |                                   |
+|  FLOAT   |  FLOAT  |                                   |
+|   TINYINT   | TINYINT |  |
+|   SMALLINT  | SMALLINT |  |
+|   INT  | INT |  |
+|   BIGINT  | BIGINT |  |
+|   DOUBLE  | DOUBLE |  |
+|   DATE  | DATE |  |
+|   DECIMAL  | DECIMAL |  |
+
+### Oracle types
+
+|  Oracle  | Doris  |             Replacement              |
+| :------: | :----: | :-------------------------------: |
+|  Not supported | BOOLEAN  |          In Oracle, number(1) can be used instead of boolean               |
+|   CHAR   |  CHAR  |                       |
+| VARCHAR | VARCHAR |              |
+|   DATE   |  DATE  |                                   |
+|  FLOAT   |  FLOAT  |                                   |
+|  N/A   | TINYINT | In Oracle, NUMBER can be used instead |
+|   SMALLINT  | SMALLINT |  |
+|   INT  | INT |  |
+|   N/A  | BIGINT |  In Oracle, NUMBER can be used instead |
+|   N/A  | DOUBLE | In Oracle, NUMBER can be used instead |
+|   DATE  | DATE |  |
+|   NUMBER  | DECIMAL |  |
+
+## Q&A
+
+1. What is the relationship with the existing MySQL external tables?
+
+    Once ODBC external tables are available, the original way of accessing MySQL external tables will be gradually deprecated. If you have not used MySQL external tables before, it is recommended that newly connected MySQL tables use the ODBC-based MySQL external table directly.
+
+2. Besides MySQL and Oracle, can more databases be supported?
+
+    Currently Doris has only been adapted to MySQL and Oracle, and adaptation to other databases is being planned. In principle, any database that supports ODBC access can be accessed through an ODBC external table. If you need to access other external tables, you are welcome to modify the code and contribute it to Doris.
+
+3. When is it appropriate to access data through an external table?
+
+    Generally, an external table is suitable when the amount of data in it is small, fewer than about one million rows. Because an external table cannot benefit from Doris's storage engine and incurs extra network overhead, decide between accessing data through an external table and importing it into Doris based on the actual latency requirements of your queries.
+
+4. Garbled characters appear when accessing Oracle
+
+   Try adding the following parameter to the BE start script: `export NLS_LANG=AMERICAN_AMERICA.AL32UTF8`, then restart all BEs

Review comment:
       What impact would it have if this parameter were put directly into the BE start script?

##########
File path: be/src/exec/odbc_scanner.cpp
##########
@@ -0,0 +1,248 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <boost/algorithm/string.hpp>
+#include <codecvt>
+#include <sqlext.h>
+
+#include "exec/odbc_scanner.h"
+#include "common/logging.h"
+#include "runtime/primitive_type.h"
+
+#define ODBC_DISPOSE(h, ht, x, op) { auto rc = x;\
+                                if (rc != SQL_SUCCESS && rc != SQL_SUCCESS_WITH_INFO) \
+                                { \
+                                    return error_status(op, handle_diagnostic_record(h, ht, rc)); \
+                                } \
+                                if (rc == SQL_ERROR) \
+                                { \
+                                    auto err_msg = std::string("Errro in") + std::string(op); \
+                                    return Status::InternalError(err_msg.c_str()); \
+                                }  \
+                            } \
+
+static constexpr uint32_t SMALL_COLUMN_SIZE_BUFFER = 100;
+// Now we only treat HLL, CHAR, VARCHAR as big column
+static constexpr uint32_t BIG_COLUMN_SIZE_BUFFER = 65535;
+
+namespace doris {
+
+ODBCScanner::ODBCScanner(const ODBCScannerParam& param)
+        : _connect_string(build_connect_string(param)),
+          _type(param.type),
+          _tuple_desc(param.tuple_desc),
+          _is_open(false),
+          _field_num(0),
+          _row_count(0),
+          _env(nullptr),
+          _dbc(nullptr),
+          _stmt(nullptr) {
+}
+
+ODBCScanner::~ODBCScanner() {
+    if (_stmt != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_STMT, _stmt);
+    }
+
+    if (_dbc != nullptr) {
+        SQLDisconnect(_dbc);
+        SQLFreeHandle(SQL_HANDLE_DBC, _dbc);
+    }
+
+    if (_env != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_ENV, _env);
+    }
+}
+
+Status ODBCScanner::open() {
+    if (_is_open) {
+        LOG(INFO) << "this scanner already opened";
+        return Status::OK();
+    }
+
+    // Allocate an environment
+    if (SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &_env) != SQL_SUCCESS) {
+        return Status::InternalError("alloc env failed");
+    }
+    // We want ODBC 3 support
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLSetEnvAttr(_env, SQL_ATTR_ODBC_VERSION, (void *) SQL_OV_ODBC3, 0), "set env attr");
+    // Allocate a connection handle
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLAllocHandle(SQL_HANDLE_DBC, _env, &_dbc), "alloc dbc");
+    // Connect to the Database
+    ODBC_DISPOSE(_dbc, SQL_HANDLE_DBC, SQLDriverConnect(_dbc, NULL, (SQLCHAR*)_connect_string.c_str(), SQL_NTS,
+                           NULL, 0, NULL, SQL_DRIVER_COMPLETE_REQUIRED), "driver connect");
+
+    LOG(INFO) << "connect success:";

Review comment:
       add more detailed information to the log
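
    A possible sketch of a more informative message, for illustration only: it assumes the scanner keeps (or is handed) copies of the driver and database names from ODBCScannerParam, and it deliberately does not print the connect string, since that string embeds the password.

```cpp
// Hypothetical replacement for the bare "connect success:" log line;
// 'driver' and 'db' stand for the corresponding ODBCScannerParam fields.
LOG(INFO) << "ODBC connect success. driver=" << driver << ", db=" << db;
```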

##########
File path: be/src/exec/odbc_scanner.cpp
##########
@@ -0,0 +1,248 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <boost/algorithm/string.hpp>
+#include <codecvt>
+#include <sqlext.h>
+
+#include "exec/odbc_scanner.h"
+#include "common/logging.h"
+#include "runtime/primitive_type.h"
+
+#define ODBC_DISPOSE(h, ht, x, op) { auto rc = x;\
+                                if (rc != SQL_SUCCESS && rc != SQL_SUCCESS_WITH_INFO) \
+                                { \
+                                    return error_status(op, handle_diagnostic_record(h, ht, rc)); \
+                                } \
+                                if (rc == SQL_ERROR) \
+                                { \
+                                    auto err_msg = std::string("Errro in") + std::string(op); \
+                                    return Status::InternalError(err_msg.c_str()); \
+                                }  \
+                            } \
+
+static constexpr uint32_t SMALL_COLUMN_SIZE_BUFFER = 100;
+// Now we only treat HLL, CHAR, VARCHAR as big column
+static constexpr uint32_t BIG_COLUMN_SIZE_BUFFER = 65535;
+
+namespace doris {
+
+ODBCScanner::ODBCScanner(const ODBCScannerParam& param)
+        : _connect_string(build_connect_string(param)),
+          _type(param.type),
+          _tuple_desc(param.tuple_desc),
+          _is_open(false),
+          _field_num(0),
+          _row_count(0),
+          _env(nullptr),
+          _dbc(nullptr),
+          _stmt(nullptr) {
+}
+
+ODBCScanner::~ODBCScanner() {
+    if (_stmt != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_STMT, _stmt);
+    }
+
+    if (_dbc != nullptr) {
+        SQLDisconnect(_dbc);
+        SQLFreeHandle(SQL_HANDLE_DBC, _dbc);
+    }
+
+    if (_env != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_ENV, _env);
+    }
+}
+
+Status ODBCScanner::open() {
+    if (_is_open) {
+        LOG(INFO) << "this scanner already opened";
+        return Status::OK();
+    }
+
+    // Allocate an environment
+    if (SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &_env) != SQL_SUCCESS) {
+        return Status::InternalError("alloc env failed");
+    }
+    // We want ODBC 3 support
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLSetEnvAttr(_env, SQL_ATTR_ODBC_VERSION, (void *) SQL_OV_ODBC3, 0), "set env attr");
+    // Allocate a connection handle
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLAllocHandle(SQL_HANDLE_DBC, _env, &_dbc), "alloc dbc");
+    // Connect to the Database
+    ODBC_DISPOSE(_dbc, SQL_HANDLE_DBC, SQLDriverConnect(_dbc, NULL, (SQLCHAR*)_connect_string.c_str(), SQL_NTS,
+                           NULL, 0, NULL, SQL_DRIVER_COMPLETE_REQUIRED), "driver connect");
+
+    LOG(INFO) << "connect success:";
+
+    _is_open = true;
+    return Status::OK();
+}
+
+Status ODBCScanner::query(const std::string& query) {
+    if (!_is_open) {
+        return Status::InternalError( "Query before open.");
+    }
+
+    // Allocate a statement handle
+    ODBC_DISPOSE(_dbc, SQL_HANDLE_DBC, SQLAllocHandle(SQL_HANDLE_STMT, _dbc, &_stmt), "alloc statement");
+
+    ODBC_DISPOSE(_stmt, SQL_HANDLE_STMT, SQLExecDirect(_stmt, (SQLCHAR*)(query.c_str()), SQL_NTS), "exec direct");
+    // How many columns are there */
+    ODBC_DISPOSE(_stmt, SQL_HANDLE_STMT, SQLNumResultCols(_stmt, &_field_num), "count num colomn");
+
+    LOG(INFO) << "execute success:" << query <<  " column count:" << _field_num;
+
+    // check materialize num equeal _field_num

Review comment:
       equeal  -> equal

##########
File path: be/src/exec/odbc_scan_node.cpp
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "odbc_scan_node.h"
+
+#include <sstream>
+
+#include "exec/text_converter.hpp"
+#include "gen_cpp/PlanNodes_types.h"
+#include "runtime/runtime_state.h"
+#include "runtime/row_batch.h"
+#include "runtime/string_value.h"
+#include "runtime/tuple_row.h"
+#include "util/runtime_profile.h"
+
+namespace doris {
+
+OdbcScanNode::OdbcScanNode(ObjectPool* pool, const TPlanNode& tnode,
+                             const DescriptorTbl& descs)
+        : ScanNode(pool, tnode, descs),
+          _is_init(false),
+          _table_name(tnode.odbc_scan_node.table_name),
+          _tuple_id(tnode.odbc_scan_node.tuple_id),
+          _columns(tnode.odbc_scan_node.columns),
+          _filters(tnode.odbc_scan_node.filters),
+          _tuple_desc(nullptr) {
+}
+
+OdbcScanNode::~OdbcScanNode() {
+}
+
+Status OdbcScanNode::prepare(RuntimeState* state) {
+    VLOG(1) << "OdbcScanNode::Prepare";
+
+    if (_is_init) {
+        return Status::OK();
+    }
+
+    if (NULL == state) {
+        return Status::InternalError("input pointer is NULL.");
+    }
+
+    RETURN_IF_ERROR(ScanNode::prepare(state));
+    // get tuple desc
+    _tuple_desc = state->desc_tbl().get_tuple_descriptor(_tuple_id);
+
+    if (NULL == _tuple_desc) {
+        return Status::InternalError("Failed to get tuple descriptor.");
+    }
+
+    _slot_num = _tuple_desc->slots().size();
+    // get odbc table info
+    const ODBCTableDescriptor* odbc_table =
+            static_cast<const ODBCTableDescriptor*>(_tuple_desc->table_desc());
+
+    if (NULL == odbc_table) {
+        return Status::InternalError("odbc table pointer is NULL.");
+    }
+
+    _odbc_param.host = odbc_table->host();
+    _odbc_param.port = odbc_table->port();
+    _odbc_param.user = odbc_table->user();
+    _odbc_param.passwd = odbc_table->passwd();
+    _odbc_param.db = odbc_table->db();
+    _odbc_param.drivier = odbc_table->driver();
+    _odbc_param.type = odbc_table->type();
+    _odbc_param.tuple_desc = _tuple_desc;
+
+    _odbc_scanner.reset(new (std::nothrow)ODBCScanner(_odbc_param));
+
+    if (_odbc_scanner.get() == nullptr) {
+        return Status::InternalError("new a odbc scanner failed.");
+    }
+
+    _tuple_pool.reset(new(std::nothrow) MemPool(mem_tracker().get()));
+
+    if (_tuple_pool.get() == NULL) {
+        return Status::InternalError("new a mem pool failed.");
+    }
+
+    _text_converter.reset(new(std::nothrow) TextConverter('\\'));
+
+    if (_text_converter.get() == NULL) {
+        return Status::InternalError("new a text convertor failed.");
+    }
+
+    _is_init = true;
+
+    return Status::OK();
+}
+
+Status OdbcScanNode::open(RuntimeState* state) {
+    RETURN_IF_ERROR(ExecNode::open(state));
+    VLOG(1) << "OdbcScanNode::Open";
+
+    if (NULL == state) {
+        return Status::InternalError("input pointer is NULL.");
+    }
+
+    if (!_is_init) {
+        return Status::InternalError("used before initialize.");
+    }
+
+    RETURN_IF_ERROR(exec_debug_action(TExecNodePhase::OPEN));
+    RETURN_IF_CANCELLED(state);
+    SCOPED_TIMER(_runtime_profile->total_time_counter());
+    RETURN_IF_ERROR(_odbc_scanner->open());
+    RETURN_IF_ERROR(_odbc_scanner->query(_table_name, _columns, _filters));
+    // check materialize slot num
+
+    return Status::OK();
+}
+
+Status OdbcScanNode::write_text_slot(char* value, int value_length,
+                                      SlotDescriptor* slot, RuntimeState* state) {
+    if (!_text_converter->write_slot(slot, _tuple, value, value_length,
+                                     true, false, _tuple_pool.get())) {
+        std::stringstream ss;
+        ss << "fail to convert odbc value '" << value << "' TO " << slot->type();
+        return Status::InternalError(ss.str());
+    }
+
+    return Status::OK();
+}
+
+Status OdbcScanNode::get_next(RuntimeState* state, RowBatch* row_batch, bool* eos) {
+    VLOG(1) << "OdbcScanNode::GetNext";
+
+    if (NULL == state || NULL == row_batch || NULL == eos) {
+        return Status::InternalError("input is NULL pointer");
+    }
+
+    if (!_is_init) {
+        return Status::InternalError("used before initialize.");
+    }
+
+    RETURN_IF_ERROR(exec_debug_action(TExecNodePhase::GETNEXT));
+    RETURN_IF_CANCELLED(state);
+    SCOPED_TIMER(_runtime_profile->total_time_counter());
+    SCOPED_TIMER(materialize_tuple_timer());
+
+    if (reached_limit()) {
+        *eos = true;
+        return Status::OK();
+    }
+
+    // create new tuple buffer for row_batch
+    int tuple_buffer_size = row_batch->capacity() * _tuple_desc->byte_size();
+    void* tuple_buffer = _tuple_pool->allocate(tuple_buffer_size);
+
+    if (NULL == tuple_buffer) {
+        return Status::InternalError("Allocate memory failed.");
+    }
+
+    _tuple = reinterpret_cast<Tuple*>(tuple_buffer);
+    // Indicates whether there are more rows to process. Set in _hbase_scanner.next().

Review comment:
       _hbase_scanner?
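
    Presumably the comment was copied over from the MySQL scan node; a corrected version might simply read:

```cpp
// Indicates whether there are more rows to process. Set in _odbc_scanner->get_next_row().
bool odbc_eos = false;
```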

##########
File path: be/src/exec/odbc_scanner.cpp
##########
@@ -0,0 +1,248 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <boost/algorithm/string.hpp>
+#include <codecvt>
+#include <sqlext.h>
+
+#include "exec/odbc_scanner.h"
+#include "common/logging.h"
+#include "runtime/primitive_type.h"
+
+#define ODBC_DISPOSE(h, ht, x, op) { auto rc = x;\
+                                if (rc != SQL_SUCCESS && rc != SQL_SUCCESS_WITH_INFO) \
+                                { \
+                                    return error_status(op, handle_diagnostic_record(h, ht, rc)); \
+                                } \
+                                if (rc == SQL_ERROR) \
+                                { \
+                                    auto err_msg = std::string("Errro in") + std::string(op); \
+                                    return Status::InternalError(err_msg.c_str()); \
+                                }  \
+                            } \
+
+static constexpr uint32_t SMALL_COLUMN_SIZE_BUFFER = 100;
+// Now we only treat HLL, CHAR, VARCHAR as big column
+static constexpr uint32_t BIG_COLUMN_SIZE_BUFFER = 65535;
+
+namespace doris {
+
+ODBCScanner::ODBCScanner(const ODBCScannerParam& param)
+        : _connect_string(build_connect_string(param)),
+          _type(param.type),
+          _tuple_desc(param.tuple_desc),
+          _is_open(false),
+          _field_num(0),
+          _row_count(0),
+          _env(nullptr),
+          _dbc(nullptr),
+          _stmt(nullptr) {
+}
+
+ODBCScanner::~ODBCScanner() {
+    if (_stmt != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_STMT, _stmt);
+    }
+
+    if (_dbc != nullptr) {
+        SQLDisconnect(_dbc);
+        SQLFreeHandle(SQL_HANDLE_DBC, _dbc);
+    }
+
+    if (_env != nullptr) {
+        SQLFreeHandle(SQL_HANDLE_ENV, _env);
+    }
+}
+
+Status ODBCScanner::open() {
+    if (_is_open) {
+        LOG(INFO) << "this scanner already opened";
+        return Status::OK();
+    }
+
+    // Allocate an environment
+    if (SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &_env) != SQL_SUCCESS) {
+        return Status::InternalError("alloc env failed");
+    }
+    // We want ODBC 3 support
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLSetEnvAttr(_env, SQL_ATTR_ODBC_VERSION, (void *) SQL_OV_ODBC3, 0), "set env attr");
+    // Allocate a connection handle
+    ODBC_DISPOSE(_env, SQL_HANDLE_ENV, SQLAllocHandle(SQL_HANDLE_DBC, _env, &_dbc), "alloc dbc");
+    // Connect to the Database
+    ODBC_DISPOSE(_dbc, SQL_HANDLE_DBC, SQLDriverConnect(_dbc, NULL, (SQLCHAR*)_connect_string.c_str(), SQL_NTS,
+                           NULL, 0, NULL, SQL_DRIVER_COMPLETE_REQUIRED), "driver connect");
+
+    LOG(INFO) << "connect success:";
+
+    _is_open = true;
+    return Status::OK();
+}
+
+Status ODBCScanner::query(const std::string& query) {
+    if (!_is_open) {
+        return Status::InternalError( "Query before open.");
+    }
+
+    // Allocate a statement handle
+    ODBC_DISPOSE(_dbc, SQL_HANDLE_DBC, SQLAllocHandle(SQL_HANDLE_STMT, _dbc, &_stmt), "alloc statement");
+
+    ODBC_DISPOSE(_stmt, SQL_HANDLE_STMT, SQLExecDirect(_stmt, (SQLCHAR*)(query.c_str()), SQL_NTS), "exec direct");
+    // How many columns are there */
+    ODBC_DISPOSE(_stmt, SQL_HANDLE_STMT, SQLNumResultCols(_stmt, &_field_num), "count num colomn");
+
+    LOG(INFO) << "execute success:" << query <<  " column count:" << _field_num;
+
+    // check materialize num equeal _field_num
+    int materialize_num = 0;
+    for (int i = 0; i < _tuple_desc->slots().size(); ++i) {
+        if (_tuple_desc->slots()[i]->is_materialized()) {
+            materialize_num++;
+        }
+    }
+    if (_field_num != materialize_num) {
+        return Status::InternalError("input and output not equal.");
+    }
+
+    // allocate memory for the binding
+    for (int i = 0 ; i < _field_num ; i++ ) {
+        DataBinding* column_data = new DataBinding;
+        column_data->target_type = SQL_C_CHAR;
+        auto type = _tuple_desc->slots()[i]->type().type;
+        column_data->buffer_length = (type == TYPE_HLL || type == TYPE_CHAR || type == TYPE_VARCHAR) ? BIG_COLUMN_SIZE_BUFFER :
+                SMALL_COLUMN_SIZE_BUFFER;
+        column_data->target_value_ptr = malloc(sizeof(char) * column_data->buffer_length);
+        _columns_data.push_back(column_data);
+    }
+
+    // setup the binding
+    for (int i = 0 ; i < _field_num ; i++ ) {
+        ODBC_DISPOSE(_stmt, SQL_HANDLE_STMT, SQLBindCol(_stmt, (SQLUSMALLINT)i + 1, _columns_data[i].target_type,
+                              _columns_data[i].target_value_ptr, _columns_data[i].buffer_length, &(_columns_data[i].strlen_or_ind)), "bind col");
+    }
+
+    return Status::OK();
+}
+
+Status ODBCScanner::query(const std::string& table, const std::vector<std::string>& fields,
+                       const std::vector<std::string>& filters) {
+    if (!_is_open) {
+        return Status::InternalError("Query before open.");
+    }
+
+    _sql_str = "SELECT ";
+
+    for (int i = 0; i < fields.size(); ++i) {
+        if (0 != i) {
+            _sql_str += ",";
+        }
+
+        _sql_str += fields[i];
+    }
+
+    _sql_str += " FROM " + table;
+
+    if (!filters.empty()) {
+        _sql_str += " WHERE ";
+
+        for (int i = 0; i < filters.size(); ++i) {
+            if (0 != i) {
+                _sql_str += " AND";
+            }
+
+            _sql_str += " (" + filters[i] + ") ";
+        }
+    }
+
+    return query(_sql_str);
+}
+
+Status ODBCScanner::get_next_row(bool* eos) {
+    if (!_is_open) {
+        return Status::InternalError("GetNextRow before open.");
+    }
+
+    auto ret = SQLFetch(_stmt);
+    if (ret == SQL_SUCCESS || ret == SQL_SUCCESS_WITH_INFO) {
+        return Status::OK();
+    } else if (ret != SQL_NO_DATA_FOUND) {
+        return error_status("result fetch", handle_diagnostic_record(_stmt, SQL_HANDLE_STMT, ret));
+    }
+
+    *eos = true;
+    return Status::OK();
+}
+
+Status ODBCScanner::error_status(const std::string& prefix, const std::string& error_msg) {
+    std::stringstream msg;
+    msg << prefix << " Err: " << error_msg;
+    LOG(WARNING) << msg.str();
+    return Status::InternalError(msg.str());
+}
+
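+// Convert the diagnostic record(s) attached to the given ODBC handle into a readable error message string (used by error_status above).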
+std::string ODBCScanner::handle_diagnostic_record(SQLHANDLE      hHandle,

Review comment:
       add some comments

##########
File path: docs/zh-CN/extending-doris/odbc-of-doris.md
##########
@@ -0,0 +1,172 @@
+---
+{
+    "title": "ODBC of Doris",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# ODBC External Table Of Doris
+
+ODBC External Table Of Doris lets Doris access external tables through the standard database access interface (ODBC). External tables avoid the tedious data import work, give Doris the ability to access all kinds of databases, and use Doris's own OLAP capability to analyze the data in those external tables:
+ 
+ 1. Various data sources can be connected to Doris
+ 2. Doris can run joint queries against tables in various data sources for more complex analysis
+
+This document mainly introduces the implementation principle and usage of this feature.
+
+## Terminology
+
+### Doris related
+* FE: Frontend, the frontend node of Doris, responsible for metadata management and request access
+* BE: Backend, the backend node of Doris, responsible for query execution and data storage
+
+## Usage
+
+### Creating an ODBC external table in Doris
+
+```
+CREATE EXTERNAL TABLE `baseall_oracle` (
+  `k1` decimal(9, 3) NOT NULL COMMENT "",
+  `k2` char(10) NOT NULL COMMENT "",
+  `k3` datetime NOT NULL COMMENT "",
+  `k5` varchar(20) NOT NULL COMMENT "",
+  `k6` double NOT NULL COMMENT ""
+) ENGINE=ODBC
+COMMENT "ODBC"
+PROPERTIES (
+"host" = "192.168.0.1",
+"port" = "8086",
+"user" = "test",
+"password" = "test",
+"database" = "test",
+"table" = "baseall",
+"driver" = "Oracle 19 ODBC driver",
+"type" = "oracle"
+);
+```
+
+Parameter description:
+
+Parameter | Description
+---|---
+**host** | IP address of the external database
+**driver** | Driver name of the ODBC external table; it must match the driver name in be/conf/odbcinst.ini.
+**type** | Type of the external database; currently oracle and mysql are supported
+**user** | Username for the external database
+**password** | Password of the corresponding user
+
+
+##### Installing and configuring the ODBC driver
+All mainstream databases provide an ODBC driver. Users can install the corresponding ODBC driver library following the method officially recommended by each database vendor.
+
+
+After installation, find the path of the driver library for the corresponding database and modify the configuration in be/conf/odbcinst.ini:
+```
+[MySQL Driver]
+Description     = ODBC for MySQL
+Driver          = /usr/lib64/libmyodbc8w.so
+FileUsage       = 1 
+```
+* The name inside `[]` in the configuration above is the driver name; when creating an external table, the external table's driver name must be kept consistent with the one in this configuration file.
+* `Driver=` must be filled in according to the path where the driver is actually installed on the BE. It is essentially the path of a dynamic library, and all of that library's prerequisite dependencies must be satisfied (this can be checked with `ldd` on the library path).
+
+**Remember: all BE nodes must have the same driver installed, at the same installation path, and with the same be/conf/odbcinst.ini configuration.**
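+
+For example, an entry matching the `driver` name used in the CREATE TABLE example above might look like the following; the `Driver` path here is only illustrative and depends on where the Oracle ODBC driver library is actually installed on the BE hosts:
+```
+[Oracle 19 ODBC driver]
+Description     = ODBC for Oracle 19
+Driver          = /usr/lib/oracle/19.3/client64/lib/libsqora.so.19.1
+FileUsage       = 1
+```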
+
+
+### Query usage
+
+Once the ODBC external table has been created in Doris, it can be queried like an ordinary Doris table, except that the Doris data models (rollup, pre-aggregation, materialized views, etc.) cannot be used.
+
+
+```
+select * from oracle_table where k1 > 1000 and k3 ='term' or k4 like '%doris'
+```
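+
+Joins between an ODBC external table and a local Doris table are written the same way. A hypothetical example, assuming a local Doris table `doris_tbl` with columns `id` and `amount` exists alongside the `baseall_oracle` table created above:
+
+```
+select o.k2, sum(d.amount) as total
+from doris_tbl d join baseall_oracle o on d.id = o.k1
+where o.k6 > 100
+group by o.k2;
+```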
+
+
+
+## Type mapping
+
+Data types differ from database to database. The mapping between the types of each database and the data types in Doris is listed here.
+
+### MySQL
+
+|  MySQL  | Doris  |             Notes              |
+| :------: | :----: | :-------------------------------: |
+|  BOOLEAN  | BOOLEAN  |                         |
+|   CHAR   |  CHAR  |            Only UTF8 encoding is currently supported            |
+| VARCHAR | VARCHAR |       Only UTF8 encoding is currently supported       |
+|   DATE   |  DATE  |                                   |
+|  FLOAT   |  FLOAT  |                                   |
+|   TINYINT   | TINYINT |  |
+|   SMALLINT  | SMALLINT |  |
+|   INT  | INT |  |
+|   BIGINT  | BIGINT |  |
+|   DOUBLE  | DOUBLE |  |
+|   DATE  | DATE |  |
+|   DECIMAL  | DECIMAL |  |
+
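+As an illustration of this mapping, a MySQL table with columns `(id INT, name VARCHAR(64), created DATE)` could be exposed in Doris as an ODBC external table such as the following (host, port, credentials, database and table names are placeholders; the driver name matches the odbcinst.ini example above):
+
+```
+CREATE EXTERNAL TABLE `t_mysql` (
+  `id` int NOT NULL COMMENT "",
+  `name` varchar(64) NOT NULL COMMENT "",
+  `created` date NOT NULL COMMENT ""
+) ENGINE=ODBC
+COMMENT "ODBC"
+PROPERTIES (
+"host" = "192.168.0.2",
+"port" = "3306",
+"user" = "test",
+"password" = "test",
+"database" = "test",
+"table" = "t",
+"driver" = "MySQL Driver",
+"type" = "mysql"
+);
+```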

Review comment:
       hll data type?
   datetime data type?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman merged pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org




[GitHub] [incubator-doris] morningman commented on a change in pull request #4438: [ODBC SCAN NODE] 4/4 Add ODBC_SCAN_NODE and Odbc_Scanner in BE and add ODBC_SCAN_NODE docs

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #4438:
URL: https://github.com/apache/incubator-doris/pull/4438#discussion_r492513438



##########
File path: be/src/exec/odbc_scan_node.h
##########
@@ -0,0 +1,96 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef  DORIS_BE_SRC_QUERY_EXEC_ODBC_SCAN_NODE_H
+#define  DORIS_BE_SRC_QUERY_EXEC_ODBC_SCAN_NODE_H
+
+#include <memory>
+
+#include "runtime/descriptors.h"
+#include "exec/scan_node.h"
+#include "exec/odbc_scanner.h"
+
+namespace doris {
+
+class TextConverter;
+class Tuple;
+class TupleDescriptor;
+class RuntimeState;
+class MemPool;
+class Status;
+
+class OdbcScanNode : public ScanNode {
+public:
+    OdbcScanNode(ObjectPool* pool, const TPlanNode& tnode, const DescriptorTbl& descs);
+    ~OdbcScanNode();
+
+    // initialize _mysql_scanner, and create _text_converter.
+    virtual Status prepare(RuntimeState* state);
+
+    // Start MySQL scan using _mysql_scanner.
+    virtual Status open(RuntimeState* state);
+
+    // Fill the next row batch by calling next() on the _mysql_scanner,
+    // converting text data in MySQL cells to binary data.
+    virtual Status get_next(RuntimeState* state, RowBatch* row_batch, bool* eos);
+
+    // Close the _mysql_scanner, and report errors.

Review comment:
       Modify the comment

##########
File path: be/src/exec/odbc_scanner.cpp
##########
@@ -0,0 +1,257 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <boost/algorithm/string.hpp>
+#include <codecvt>
+#include <sqlext.h>
+
+#include "exec/odbc_scanner.h"
+#include "common/logging.h"
+#include "runtime/primitive_type.h"
+
+#define ODBC_DISPOSE(h, ht, x, op) { auto rc = x;\
+                                if (rc != SQL_SUCCESS && rc != SQL_SUCCESS_WITH_INFO) \
+                                { \
+                                    return error_status(op, handle_diagnostic_record(h, ht, rc)); \
+                                } \
+                                if (rc == SQL_ERROR) \
+                                { \
+                                    auto err_msg = std::string("Error in ") + std::string(op); \
+                                    return Status::InternalError(err_msg.c_str()); \
+                                }  \
+                            } \
+
+static constexpr uint32_t SMALL_COLUMN_SIZE_BUFFER = 100;
+// Now we only treat HLL, CHAR, VARCHAR as big column
+static constexpr uint32_t BIG_COLUMN_SIZE_BUFFER = 65535;
+
+static std::u16string utf8_to_wstring (const std::string& str)

Review comment:
       ```suggestion
   static std::u16string utf8_to_wstring(const std::string& str)
   ```

##########
File path: docs/zh-CN/extending-doris/odbc-of-doris.md
##########
@@ -0,0 +1,217 @@
+---
+{
+    "title": "ODBC of Doris",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# ODBC External Table Of Doris
+
+ODBC External Table Of Doris lets Doris access external tables through the standard database access interface (ODBC). External tables avoid the tedious data import work, give Doris the ability to access all kinds of databases, and use Doris's own OLAP capability to analyze the data in those external tables:
+
+ 1. Various data sources can be connected to Doris
+ 2. Doris can run joint queries against tables in various data sources for more complex analysis
+
+This document mainly introduces the implementation principle and usage of this feature.
+
+## Terminology
+
+### Doris related
+* FE: Frontend, the frontend node of Doris, responsible for metadata management and request access
+* BE: Backend, the backend node of Doris, responsible for query execution and data storage
+
+## Usage
+
+### Creating an ODBC external table in Doris
+
+#### 1. Creating an ODBC external table without a Resource
+
+```
+CREATE EXTERNAL TABLE `baseall_oracle` (
+  `k1` decimal(9, 3) NOT NULL COMMENT "",
+  `k2` char(10) NOT NULL COMMENT "",
+  `k3` datetime NOT NULL COMMENT "",
+  `k5` varchar(20) NOT NULL COMMENT "",
+  `k6` double NOT NULL COMMENT ""
+) ENGINE=ODBC
+COMMENT "ODBC"
+PROPERTIES (
+"host" = "192.168.0.1",
+"port" = "8086",
+"user" = "test",
+"password" = "test",
+"database" = "test",
+"table" = "baseall",
+"driver" = "Oracle 19 ODBC driver",
+"odbc_type" = "oracle"
+);
+```
+
+#### 2. Creating an ODBC external table through ODBC_Resource (recommended)
+```
+create external resource "oracle_odbc"
+    properties 

Review comment:
       Indent




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org