Posted to dev@linkis.apache.org by leo jie <le...@gmail.com> on 2022/04/27 12:59:25 UTC
JDBC engine multi-data source switching function implementation

1. JDBC engine multi-data source switching function implementation

Background: By specifying a data source identifier when executing JDBC
code, the JDBC engine can load a different data source configuration,
which makes multi-data-source switching possible.

Question 1: Where is the loading logic for the data source configuration
during the startup of the JDBC engine?

The createEngine method of the DefaultEngineCreateService class contains
this logic:
val (resourceTicketId, resource) = requestResource(engineCreateRequest,
labelFilter.choseEngineLabel(labelList), emNode, timeout)
The requestResource method assembles the engine startup parameters that
the user configured in the management console and puts them into
properties.
Taking the JDBC engine as an example, could logic like the following be
inserted here? If the user specifies a data source name when submitting
a script, look up the engine configuration parameters for that data
source name and use them to replace the values in properties.
The JDBC Executor then reads the engine configuration and connects to the
corresponding JDBC service.

Answer:
The JDBC parameters are read on the EC (EngineConn) side, as shown in this
screenshot:

[image: image.png]

Concurrent engines such as JDBC and Presto use the runtime parameters plus
parameters pulled in real time. If this is later changed to use data
sources, I recommend making the change here as well. That would make it
possible to support multiple JDBC connections.
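
To make the idea in the answer above concrete, here is a minimal Scala
sketch of resolving connection settings on the EC side from pulled
configuration, runtime parameters, and a named data source. It is not
Linkis code: JdbcParamResolver, the "datasource" key, and the "jdbc.url"
style property names are hypothetical, and the precedence (runtime
parameters override pulled ones, data source settings override both) is
an assumption that this thread does not confirm.

```scala
object JdbcParamResolver {
  // Placeholder key name; the real reserved key is only visible in the screenshot.
  val DataSourceKey = "datasource"

  /** Pulled configuration is the base; runtime parameters override it (assumed precedence). */
  def effectiveParams(
      pulled: Map[String, String],
      runtime: Map[String, String]): Map[String, String] =
    pulled ++ runtime

  /**
   * If the runtime parameters name a known data source, overlay that data
   * source's settings on top of the effective parameters. Otherwise fall
   * back to the engine-level settings unchanged.
   */
  def resolve(
      pulled: Map[String, String],
      runtime: Map[String, String],
      dataSources: Map[String, Map[String, String]]): Map[String, String] = {
    val base = effectiveParams(pulled, runtime)
    base.get(DataSourceKey).flatMap(dataSources.get) match {
      case Some(dsParams) => base ++ dsParams
      case None => base
    }
  }
}
```

Because the lookup happens per request, a single concurrent engine could
serve scripts that target different JDBC services. An unknown data source
name silently falls back here; a real implementation would probably want
to report an error instead.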

Question 2: When a script is executed, where is the best place to add the
data source switch marker?
The user calls /entrance/execute or /entrance/submit with a script such as:

%data_source_1
select * from table1

%data_source_2
select * from table2

A data source interceptor could be added, similar to
VarSubstitutionInterceptor. Once it has parsed the data source identifier
data_source_1 or data_source_2, where should that identifier be stored:
in the labels of the jobRequest, or in its params? Once stored, it can be
passed downstream and be seen by the various Requests.

Answer:
It is better to put it in the runtime parameters, because it is a runtime
data source parameter, and a datasource key was already reserved for this.

[image: image.png]
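
As a rough illustration of the interceptor step discussed above, the
following Scala sketch splits a script on %name marker lines and records
each segment's data source in a runtime parameter map. It is standalone
and not wired into the Linkis interceptor chain: DataSourceMarkerParser,
Segment, and withDataSource are hypothetical names, and "datasource"
stands in for the reserved key, whose exact name appears only in the
screenshot.

```scala
object DataSourceMarkerParser {
  // Placeholder for the reserved runtime key mentioned in the answer.
  val DataSourceKey = "datasource"

  /** One block of code plus the data source marker that preceded it, if any. */
  final case class Segment(dataSource: Option[String], code: String)

  // A marker is a line consisting only of '%' followed by an identifier.
  private val Marker = """%(\w+)""".r

  /** Split a script into segments, starting a new one at each marker line. */
  def split(script: String): List[Segment] = {
    val start = (List.empty[Segment], Option.empty[String], List.empty[String])
    val (done, current, buffer) = script.split("\\r?\\n").foldLeft(start) {
      case ((segs, ds, buf), line) =>
        line.trim match {
          case Marker(name) => (flush(segs, ds, buf), Some(name), Nil)
          case _ => (segs, ds, line :: buf)
        }
    }
    flush(done, current, buffer).reverse
  }

  // Emit the buffered lines as a segment unless they are blank.
  private def flush(
      segs: List[Segment],
      ds: Option[String],
      buf: List[String]): List[Segment] = {
    val code = buf.reverse.mkString("\n").trim
    if (code.isEmpty) segs else Segment(ds, code) :: segs
  }

  /** Copy the segment's data source name into the runtime parameters. */
  def withDataSource(runtime: Map[String, Any], segment: Segment): Map[String, Any] =
    segment.dataSource.fold(runtime)(name => runtime + (DataSourceKey -> name))
}
```

Applied to the example script in Question 2, split would yield two
segments, one per marker, and withDataSource would attach data_source_1
or data_source_2 to the runtime parameters submitted with each segment.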