Posted to dev@linkis.apache.org by Chen Xia <ca...@apache.org> on 2023/03/12 13:24:02 UTC

[DISCUSS] Error Code Module Refactoring

Current issues:

- Historical error code values were defined somewhat arbitrarily: their
readability needs to be improved, some values are duplicated, a
specification needs to be agreed on, and the existing codes need to be
cleaned up
- When the RPC call chain is long, the root exception is lost, so
locating a problem means checking each layer one by one. Without
introducing tracing, we need to consider how to conveniently locate
problems from logs and pinpoint exceptions across multi-level calls
- The WDS ecosystem components have fairly complex call relationships
among themselves. As Linkis is the foundation on which these components
are built, we need to consider how to provide a more general-purpose
error code module
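
To make the first issue concrete, here is a minimal Java sketch of a shared error code registry. It assumes a hypothetical code format of an upper-case component prefix plus five digits (for example `LINKIS-10001`); the format, the `ErrorCode` and `ErrorCodeRegistry` names, and the sample codes are illustrative assumptions, not an agreed Linkis specification. Registering every code in one place lets malformed or duplicated values fail fast at startup instead of surfacing later as confusing errors.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class ErrorCodeRegistryDemo {

    // Hypothetical error code value object: the code string plus a readable description.
    record ErrorCode(String code, String description) {}

    // Hypothetical central registry that enforces one code format and rejects duplicates.
    static final class ErrorCodeRegistry {
        // Assumed format: upper-case component prefix, a dash, five digits (e.g. "LINKIS-10001").
        private static final Pattern FORMAT = Pattern.compile("[A-Z]+-\\d{5}");
        private final Map<String, ErrorCode> codes = new LinkedHashMap<>();

        ErrorCode register(String code, String description) {
            if (!FORMAT.matcher(code).matches()) {
                throw new IllegalArgumentException("Malformed error code: " + code);
            }
            ErrorCode errorCode = new ErrorCode(code, description);
            // putIfAbsent returns the existing entry if this code value is already taken.
            ErrorCode previous = codes.putIfAbsent(code, errorCode);
            if (previous != null) {
                throw new IllegalStateException("Duplicate error code " + code
                        + " (already used for: " + previous.description() + ")");
            }
            return errorCode;
        }

        int size() {
            return codes.size();
        }
    }

    public static void main(String[] args) {
        ErrorCodeRegistry registry = new ErrorCodeRegistry();
        registry.register("LINKIS-10001", "Failed to connect to the metadata database");
        registry.register("LINKIS-10002", "Engine connection request timed out");
        try {
            // Reusing an existing value fails fast instead of silently shadowing the first meaning.
            registry.register("LINKIS-10001", "Some unrelated failure");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("Registered codes: " + registry.size());
    }
}
```

Here the second registration of `LINKIS-10001` is rejected with an `IllegalStateException`, so the program prints the duplicate message followed by `Registered codes: 2`.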


### Description

Goals
- Define a set of error code specifications that covers the usage
scenarios of WDS ecosystem components, and provide a general-purpose,
reusable error code module that other components can use as a jar
package.
- From the exception information returned by an interface, it should
be possible to clearly identify the service where the root-cause
exception originated.
- Exception information should carry a service label identifying the
root component and service that threw the exception, such as
DSS-ProjectServer(ip:host), Linkis-SparkEC(ip:host), Hadoop-HDFS and
Spark-Job(app_1111-job1), as well as an error type label, such as user
initialization error, user input validation error, user permission
error, DSS service exception, Linkis component exception, underlying
compute/storage exception, or exception caused by an in-progress
change.
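
The labelling and root-cause goals could fit together roughly as in the following Java sketch. It assumes a hypothetical `LabeledException` that carries an error code, a service label and an error type label (the enum values mirror the categories listed above), plus a hypothetical `RpcErrorPayload` that a callee returns across the RPC boundary, so the caller still sees the root service in case the Java cause chain is not preserved across the call. None of these names are existing Linkis APIs.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class LabeledExceptionDemo {

    // Hypothetical error type labels mirroring the categories listed in the proposal.
    enum ErrorType {
        USER_INITIALIZATION, USER_INPUT_VALIDATION, USER_PERMISSION, DSS_SERVICE,
        LINKIS_COMPONENT, UNDERLYING_COMPUTE_STORAGE, CHANGE_IN_PROGRESS
    }

    // Hypothetical service label rendered as "Component-Service(instance)", e.g. "Linkis-SparkEC(ip:host)".
    record ServiceLabel(String component, String service, String instance) {
        @Override
        public String toString() {
            String name = component + "-" + service;
            return instance == null ? name : name + "(" + instance + ")";
        }
    }

    // Hypothetical exception that carries an error code together with both labels.
    static class LabeledException extends RuntimeException {
        final String errorCode;
        final ServiceLabel serviceLabel;
        final ErrorType errorType;
        final String detail;

        LabeledException(String errorCode, ServiceLabel serviceLabel, ErrorType errorType,
                         String detail, Throwable cause) {
            super("[" + serviceLabel + "][" + errorType + "][" + errorCode + "] " + detail, cause);
            this.errorCode = errorCode;
            this.serviceLabel = serviceLabel;
            this.errorType = errorType;
            this.detail = detail;
        }
    }

    // Walks the cause chain and returns the deepest LabeledException (the root component),
    // stopping if the chain ever loops back on itself.
    static LabeledException rootLabeled(Throwable t) {
        LabeledException root = null;
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof LabeledException labeled) {
                root = labeled;
            }
        }
        return root;
    }

    // Hypothetical payload a callee returns over RPC so the caller still sees the root cause.
    record RpcErrorPayload(String rootErrorCode, String rootService, ErrorType rootType, String rootDetail) {
        static RpcErrorPayload from(Throwable t) {
            LabeledException root = rootLabeled(t);
            if (root == null) {
                // No labeled exception in the chain: fall back to the raw message.
                return new RpcErrorPayload("UNKNOWN", "unknown", null, String.valueOf(t.getMessage()));
            }
            return new RpcErrorPayload(root.errorCode, root.serviceLabel.toString(),
                    root.errorType, root.detail);
        }
    }

    public static void main(String[] args) {
        // The first Linkis code that catches the HDFS failure labels it as Hadoop-HDFS.
        LabeledException hdfs = new LabeledException("HDFS-00001",
                new ServiceLabel("Hadoop", "HDFS", null), ErrorType.UNDERLYING_COMPUTE_STORAGE,
                "write to HDFS failed", null);
        // The Spark engine wraps it with its own label; intermediate layers may wrap it again.
        LabeledException spark = new LabeledException("LINKIS-20001",
                new ServiceLabel("Linkis", "SparkEC", "ip:host"), ErrorType.LINKIS_COMPONENT,
                "job execution failed", hdfs);

        // Only the payload crosses the RPC boundary, but it still names the root service.
        RpcErrorPayload payload = RpcErrorPayload.from(spark);
        System.out.println("Root service: " + payload.rootService());
        System.out.println("Root type: " + payload.rootType());
    }
}
```

Running `main` prints `Root service: Hadoop-HDFS` and `Root type: UNDERLYING_COMPUTE_STORAGE`, even though the caller only holds the outer Linkis-SparkEC exception. Choosing the deepest labeled exception, rather than the outermost one, is what lets the interface report the root component directly instead of requiring a layer-by-layer search.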




Best Regards!
Chen Xia.
