Posted to notifications@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2019/11/21 08:11:03 UTC

[GitHub] [incubator-dolphinscheduler] chongchongzi opened a new issue #1306: Data quality inspection component(数据质量检测组件)

chongchongzi opened a new issue #1306: Data quality inspection component(数据质量检测组件)
URL: https://github.com/apache/incubator-dolphinscheduler/issues/1306
 
 
   Demand:
   
   Data is a cornerstone of business decision-making, and high-quality data is essential for making sound decisions.
   
   At present, most data problems are discovered only after the data has reached production, when operations and business staff notice them and report them to the engineering team for troubleshooting. As a result, problems are found late, checking the data is time-consuming and laborious, and labor costs are high.
   
   We therefore have a strong need for a tool that monitors data problems and raises alerts.
   
   The data quality inspection component is proposed to address these scenarios: it detects data problems promptly and raises an alert, so that data monitoring is automated.
   
   Implementation plan:
   
   1. Workflow: upstream dependency component -> calculation task component -> data quality inspection component -> calculation task component -> data quality inspection component ...
   
   2. Functions of the data quality inspection component: run a different SQL query for each data source; choose whether to interrupt the workflow when the data is empty; choose whether to interrupt when the data does not meet expectations; choose whether to interrupt when a value exceeds the threshold for comparison with historical data; send an email alert; and write the result of every check to the database.
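
To make the proposed functions concrete, here is a minimal Python sketch of one inspection step. It is only an illustration under stated assumptions, not DolphinScheduler code: the names `QualityRule`, `run_check`, `send_alert`, and the `quality_result` table are hypothetical, an in-memory SQLite database stands in for the real data source, and a list stands in for the email alert. It also assumes each rule's SQL returns a single number (for example a row count), and that `NULL` or `0` counts as "empty".

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Tuple

ALERTS: List[str] = []  # stand-in for an email outbox (assumption)


@dataclass
class QualityRule:
    """Hypothetical rule definition; field names are illustrative only."""
    name: str
    sql: str                                  # must return one numeric value
    interrupt_on_empty: bool = True           # stop the workflow if data is empty
    expected_min: Optional[float] = None      # lowest acceptable value, if any
    interrupt_on_unexpected: bool = True      # stop if below expected_min
    max_change_ratio: Optional[float] = None  # allowed relative change vs. last pass
    interrupt_on_threshold: bool = False      # stop if the change is too large


def send_alert(rule: QualityRule, status: str, value) -> None:
    # A real component would send an email here; this sketch only records it.
    ALERTS.append(f"[data quality] {rule.name}: {status} (value={value})")


def run_check(conn: sqlite3.Connection, rule: QualityRule) -> Tuple[str, bool]:
    """Run one rule, store the result, and return (status, should_interrupt)."""
    row = conn.execute(rule.sql).fetchone()
    value = row[0] if row else None
    status, interrupt = "PASS", False

    if value is None or value == 0:
        status, interrupt = "EMPTY", rule.interrupt_on_empty
    elif rule.expected_min is not None and value < rule.expected_min:
        status, interrupt = "UNEXPECTED", rule.interrupt_on_unexpected
    elif rule.max_change_ratio is not None:
        # Compare against the most recent passing result for the same rule.
        prev = conn.execute(
            "SELECT value FROM quality_result "
            "WHERE rule_name = ? AND status = 'PASS' ORDER BY id DESC LIMIT 1",
            (rule.name,),
        ).fetchone()
        if prev and abs(value - prev[0]) / abs(prev[0]) > rule.max_change_ratio:
            status, interrupt = "THRESHOLD_EXCEEDED", rule.interrupt_on_threshold

    # Every check result is written to the database, whatever the outcome.
    conn.execute(
        "INSERT INTO quality_result (rule_name, value, status) VALUES (?, ?, ?)",
        (rule.name, value, status),
    )
    if status != "PASS":
        send_alert(rule, status, value)
    return status, interrupt


# Demo with made-up data: an "orders" table checked after each load.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE quality_result ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, rule_name TEXT, value REAL, status TEXT)"
)
rule = QualityRule("orders_row_count", "SELECT COUNT(*) FROM orders",
                   max_change_ratio=0.5)

results = [run_check(conn, rule)]  # table is still empty
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(1.0,)] * 10)
results.append(run_check(conn, rule))  # 10 rows, no history to compare with
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(1.0,)] * 10)
results.append(run_check(conn, rule))  # 20 rows against 10 in the last pass
print(results)
```

In a workflow, the task that follows the inspection component would run only when the returned `should_interrupt` flag is false; separating the status from the interrupt decision lets each rule alert without necessarily stopping the downstream calculation tasks.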
