Posted to dev@griffin.apache.org by "Lionel Liu (JIRA)" <ji...@apache.org> on 2018/01/22 05:55:00 UTC
[jira] [Closed] (GRIFFIN-90) [Measure] Add uniqueness measurement as a new feature.
[ https://issues.apache.org/jira/browse/GRIFFIN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lionel Liu closed GRIFFIN-90.
-----------------------------
Resolution: Done
Fix Version/s: 0.2.0-incubating
Uniqueness and distinctness features are now supported.
Uniqueness: the count of unique records, i.e. records that occur exactly once.
Distinctness: the count of distinct records, which indicates the amount of non-duplicated information in the data.
There are some differences between them, especially in streaming mode.
For example, in batch mode, suppose we have the following data:
1, 1, 2, 2, 3, 4, 5
Uniqueness metric is { total: 7, unique: 3 }
Distinctness metric is { total: 7, dist: 5 }
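The batch numbers above can be reproduced with a few lines of plain Python. This is only an illustrative sketch, not Griffin's implementation (which runs on Spark); the function names `uniqueness` and `distinctness` are hypothetical and chosen here for readability.

```python
from collections import Counter


def uniqueness(records):
    """Count records whose value occurs exactly once in the batch."""
    counts = Counter(records)
    unique = sum(1 for c in counts.values() if c == 1)
    return {"total": len(records), "unique": unique}


def distinctness(records):
    """Count the distinct values present in the batch."""
    return {"total": len(records), "dist": len(set(records))}


batch = [1, 1, 2, 2, 3, 4, 5]
print(uniqueness(batch))    # {'total': 7, 'unique': 3}
print(distinctness(batch))  # {'total': 7, 'dist': 5}
```

Here 3, 4 and 5 each occur once, so unique is 3, while the distinct values are 1 through 5, so dist is 5.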
As a streaming example, suppose the following data arrives each minute:
1 min: 1, 1, 2, 2, 3, 4
2 min: 2, 2, 3, 3, 5, 5
Uniqueness metrics for the two minutes are:
1 min: { total: 6, unique: 2 }
2 min: { total: 6, unique: 0 }
Distinctness metrics for the two minutes are:
1 min: { total: 6, dist: 4 }
2 min: { total: 6, dist: 1 }
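The streaming figures are consistent with one reading in which each minute's batch is compared against values already seen in earlier minutes: in the second minute, 2 and 3 were seen before, so only 5 counts as newly distinct, and no value occurs exactly once. The sketch below encodes that reading in plain Python. It is an assumption made for illustration, not a description of Griffin's actual streaming logic; in particular, excluding previously seen values from the unique count is a guess that the example data cannot confirm or refute. The name `streaming_metrics` is hypothetical.

```python
from collections import Counter


def streaming_metrics(batches):
    """Compute per-batch metrics, comparing each batch with earlier ones.

    Assumption: a value seen in any earlier batch is neither unique
    nor newly distinct in the current batch.
    """
    seen = set()  # values observed in all earlier batches
    results = []
    for batch in batches:
        counts = Counter(batch)
        # Unique: occurs exactly once now and was never seen before.
        unique = sum(1 for v, c in counts.items() if c == 1 and v not in seen)
        # Distinct: values in this batch that were never seen before.
        dist = sum(1 for v in counts if v not in seen)
        results.append({"total": len(batch), "unique": unique, "dist": dist})
        seen.update(counts)  # adds this batch's values to the history
    return results


minutes = [
    [1, 1, 2, 2, 3, 4],
    [2, 2, 3, 3, 5, 5],
]
for i, metric in enumerate(streaming_metrics(minutes), start=1):
    print(f"{i} min: {metric}")
# 1 min: {'total': 6, 'unique': 2, 'dist': 4}
# 2 min: {'total': 6, 'unique': 0, 'dist': 1}
```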
> [Measure] Add uniqueness measurement as a new feature.
> ------------------------------------------------------
>
> Key: GRIFFIN-90
> URL: https://issues.apache.org/jira/browse/GRIFFIN-90
> Project: Griffin (Incubating)
> Issue Type: Improvement
> Reporter: Lionel Liu
> Assignee: Lionel Liu
> Priority: Major
> Fix For: 0.2.0-incubating
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> DoD: uniqueness measurement enabled, with test cases for batch and streaming.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)