Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/02/01 12:05:07 UTC

[GitHub] [incubator-dolphinscheduler] zixi0825 edited a comment on issue #4283: [Feature] Data Quality Design

zixi0825 edited a comment on issue #4283:
URL: https://github.com/apache/incubator-dolphinscheduler/issues/4283#issuecomment-770778660


   ### Development Planning:
   
   Version 1.0
   
   - Data quality type task development, including front end and back end (development completed)
   
       - Rule input items are generated automatically when a rule is selected in the front-end interface (development completed)
   
       - Provide a variety of detection methods (development completed)
   
       - Provide multiple failure strategies (development completed)
       
   - An executor that uses Spark as its computing engine; its main responsibility is to run the data quality detection SQL (development completed)
   
   - Multiple built-in detection rules, including single-table null value detection, cross-table accuracy detection, cross-table value comparison, single-table custom SQL detection, etc. (development completed)
   
   - Quality inspection results view, including front end and back end (development completed)
   
       - The workflow that a task belongs to can be viewed (development completed)
   
   - Rule management, view only (development completed)
   
       - Rule definitions can be viewed (development completed)
    
   - Only JDBC and Hive data sources are supported (development completed)
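
To make the single-table null value detection rule concrete, here is a minimal sketch of the idea: a SQL statement counts NULL rows, the count is compared with a threshold, and a failure strategy is returned when the check fails. It is only an illustration under stated assumptions. It uses Python's built-in `sqlite3` in place of the Spark executor, and the table `users`, the column `email`, the helper names, and the strategy labels `"alert"` and `"block"` are all hypothetical, not taken from the DolphinScheduler code.

```python
import sqlite3

# All names below (users, email, "alert", "block") are illustrative
# assumptions, not identifiers from the DolphinScheduler code base.
NULL_CHECK_SQL = "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"


def demo_connection():
    """Create an in-memory table with two NULL emails out of four rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
    )
    return conn


def count_nulls(conn, table, column):
    """Run the null value detection SQL and return the NULL row count."""
    sql = NULL_CHECK_SQL.format(table=table, column=column)
    return conn.execute(sql).fetchone()[0]


def run_check(conn, table, column, threshold, failure_strategy):
    """Pass when the NULL count is within the threshold; otherwise
    return the configured failure strategy ("alert" or "block")."""
    nulls = count_nulls(conn, table, column)
    return "pass" if nulls <= threshold else failure_strategy


if __name__ == "__main__":
    conn = demo_connection()
    print(count_nulls(conn, "users", "email"))
    print(run_check(conn, "users", "email", 0, "block"))
```

The comparison `nulls <= threshold` stands in for one possible detection method; the actual task may offer other comparisons and strategies.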
   
   
   
   Version 2.0 (Time to be determined)
   
   - Improve the front-end user experience: optimize the input items and introduce metadata management for multiple data sources so that tables and columns can be selected, etc. (to be developed)
   
   - Provide custom rule templates that support single-table rule customization (to be developed)
   
   - Add rule modification and deletion (to be developed)
   
   - Support exporting abnormal data (to be developed)
   
   - Support detection on additional data sources, such as files, ES, etc. (to be developed)
   
   - Support running data quality inspection tasks independently (to be developed)
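
As a rough illustration of what a single-table custom rule template could look like, the sketch below stores a rule as a SQL string with placeholders, lists the placeholders so that a front end could generate one input item for each, and renders the final SQL. This is an assumption about one possible design, not the planned implementation: the template text, the placeholder names `table` and `filter`, and both helper functions are hypothetical.

```python
from string import Template

# Hypothetical rule template; the placeholder names are assumptions.
RULE_TEMPLATE = Template(
    "SELECT COUNT(*) AS failed_rows FROM ${table} WHERE ${filter}"
)


def input_items(template):
    """Return the placeholder names in order of first appearance, so a
    UI could render one input field per item."""
    names = []
    for match in template.pattern.finditer(template.template):
        name = match.group("named") or match.group("braced")
        if name and name not in names:
            names.append(name)
    return names


def render_rule(template, **params):
    """Fill in the template; raises KeyError if an input item is missing."""
    return template.substitute(**params)


if __name__ == "__main__":
    print(input_items(RULE_TEMPLATE))
    print(render_rule(RULE_TEMPLATE, table="orders", filter="amount < 0"))
```

Values are substituted verbatim in this sketch, so a real implementation would need to validate or escape user input before running the generated SQL.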
   
   ---------------------------------------------------------------------------------------------------------------------------------------
   
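   ### Development Plan:
   Version 1.0 (local development completed; PR not yet submitted)
   - Data quality type task development, including front end and back end (development completed)
      - Rule input items are generated automatically when a rule is selected in the front-end interface (development completed)
      - Provide a variety of detection methods (development completed)
      - Provide multiple failure strategies (development completed)
     
   ![dqs_1](https://user-images.githubusercontent.com/10829956/106438867-02458980-64b2-11eb-900a-e2be48a46f07.gif)
   
   - An executor that uses Spark as its computing engine; its main responsibility is to run the data quality detection SQL (development completed)
   - Multiple built-in detection rules, including single-table null value detection, cross-table accuracy detection, cross-table value comparison, single-table custom SQL detection, etc. (development completed)
   - Quality inspection results view, including front end and back end (development completed)
     - The workflow that a task belongs to can be viewed (development completed)
     
   ![dqs_2](https://user-images.githubusercontent.com/10829956/106441978-b1379480-64b5-11eb-8ea2-495a1b0bd873.gif)
   
   - Rule management, view only (development completed)
     - Rule definitions can be viewed (development completed)
     
   ![dqs_3](https://user-images.githubusercontent.com/10829956/106441997-b72d7580-64b5-11eb-9fd9-750db0bd69a9.gif)
   
   - Only JDBC and Hive data sources are supported (development completed)
   
   Version 2.0 (time to be determined)
   - Improve the front-end user experience, including optimizing the input items, introducing metadata management for multiple data sources, selecting tables and columns, etc. (to be developed)
   - Provide custom rule templates that support single-table rule customization (to be developed)
   - Add rule modification and deletion (to be developed)
   - Support exporting abnormal data (to be developed)
   - Support detection on additional data sources, such as files, ES, etc. (to be developed)
   - Support running data quality inspection tasks independently (to be developed)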

