Posted to dev@griffin.apache.org by "Zhao Li (Jira)" <ji...@apache.org> on 2019/09/11 12:19:00 UTC
[jira] [Created] (GRIFFIN-289) new feature for griffin COMPLETENESS dq type
Zhao Li created GRIFFIN-289:
-------------------------------
Summary: new feature for griffin COMPLETENESS dq type
Key: GRIFFIN-289
URL: https://issues.apache.org/jira/browse/GRIFFIN-289
Project: Griffin
Issue Type: New Feature
Components: completeness-batch
Affects Versions: 0.3.1-incubating
Reporter: Zhao Li
Hello
We currently use the Griffin measure module to check batch data quality. For the COMPLETENESS dq type, Griffin counts the incomplete records in a table, but it only checks whether a column is null.
However, a null check alone is not enough to decide whether a column value is invalid. In our case, analysts may consider other values invalid even though they are not null. For example, for a column named "company", a record is invalid if company is in ("a", "b", "c").
We need two ways for users to filter incomplete records. One is "enumeration": users list all the values they consider invalid for a column. The other is "regular expression": users write a regular expression that matches the invalid values for a column.
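To make the two filter styles concrete, here is a minimal sketch in plain Python. It is not Griffin code and does not reflect Griffin's configuration format. The names is_incomplete and count_incomplete, the sample records, and the choice to match the regular expression against the whole value are all assumptions made for illustration, and column values are assumed to be strings or None.

```python
import re

def is_incomplete(value, invalid_values=frozenset(), invalid_pattern=None):
    """Return True if a column value should count as incomplete.

    Hypothetical helper: a value is incomplete when it is None (the
    existing null check), when it appears in the user-supplied
    enumeration, or when the user-supplied regex matches the whole value.
    """
    if value is None:
        return True
    if value in invalid_values:
        return True
    if invalid_pattern is not None and invalid_pattern.fullmatch(value):
        return True
    return False

def count_incomplete(records, column, invalid_values=frozenset(), invalid_pattern=None):
    """Count records whose value in `column` is incomplete."""
    return sum(
        1
        for record in records
        if is_incomplete(record.get(column), invalid_values, invalid_pattern)
    )

# Made-up sample data for the "company" column from the example above.
records = [
    {"company": "a"},
    {"company": None},
    {"company": "acme"},
    {"company": "N/A"},
    {"company": ""},
]

# "Enumeration": the user lists every value they consider invalid.
enum_count = count_incomplete(records, "company", invalid_values={"a", "b", "c"})

# "Regular expression": the user writes a pattern for invalid values,
# here blank strings and the literal "N/A".
regex_count = count_incomplete(
    records, "company", invalid_pattern=re.compile(r"\s*|N/A")
)

print(enum_count, regex_count)
```

In this sketch the null check always applies, and the enumeration and the regular expression only add further conditions on top of it.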
Could Griffin update the COMPLETENESS dq type to support these "enumeration" and "regular expression" ways of filtering incomplete records?
Regards
Zhao
--
This message was sent by Atlassian Jira
(v8.3.2#803003)