Posted to dev@griffin.apache.org by "Lionel Liu (JIRA)" <ji...@apache.org> on 2018/01/22 05:55:00 UTC

[jira] [Closed] (GRIFFIN-90) [Measure] Add uniqueness measurement as a new feature.

     [ https://issues.apache.org/jira/browse/GRIFFIN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lionel Liu closed GRIFFIN-90.
-----------------------------
       Resolution: Done
    Fix Version/s: 0.2.0-incubating

Uniqueness and distinctness features are now supported.

 

Uniqueness: the count of unique records, i.e. the records that occur exactly once.

Distinctness: the count of distinct records, which indicates how much non-duplicated information the data contains.

There are some differences between them, especially in streaming mode.

 

For example, in batch mode, suppose we have the following data:

1, 1, 2, 2, 3, 4, 5

Uniqueness metric is { total: 7, unique: 3 }, since only 3, 4 and 5 occur exactly once.

Distinctness metric is { total: 7, dist: 5 }, since there are five distinct values: 1, 2, 3, 4 and 5.
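
The batch numbers above can be reproduced with a few lines of Python. This is only an illustrative sketch, not Griffin's implementation; the function name batch_metrics and the output keys are assumptions that mirror the metric names used in this message.

```python
from collections import Counter


def batch_metrics(records):
    """Compute total, unique and distinct counts for one batch (hypothetical helper)."""
    counts = Counter(records)
    return {
        "total": len(records),
        # unique: values that occur exactly once in the batch
        "unique": sum(1 for c in counts.values() if c == 1),
        # dist: number of different values in the batch
        "dist": len(counts),
    }


print(batch_metrics([1, 1, 2, 2, 3, 4, 5]))
```

For the sample data this gives total 7, unique 3 and dist 5, matching the two metrics above.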

 

As a streaming example, suppose the following data arrives each minute:

1 min: 1, 1, 2, 2, 3, 4 

2 min: 2, 2, 3, 3, 5, 5

Uniqueness metrics for the 2 minutes are:

1 min: { total: 6, unique: 2 }

2 min: { total: 6, unique: 0 }

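Distinctness metrics for the 2 minutes are:

1 min: { total: 6, dist: 4 }

2 min: { total: 6, dist: 1 }

In minute 2 no value occurs exactly once, so unique is 0; only 5 did not already appear in minute 1, so dist is 1.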
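
The streaming numbers can be reproduced with the following Python sketch. It is not Griffin's implementation: the function name streaming_metrics is hypothetical, and the sketch assumes that each minute is compared against all values seen in earlier minutes. For distinctness this is what the example implies (dist is 1 in minute 2). For uniqueness it is only an assumption, since the sample data gives the same result whether or not earlier minutes are taken into account.

```python
from collections import Counter


def streaming_metrics(windows):
    """Compute per-window metrics against previously seen values (hypothetical helper)."""
    seen = set()  # values observed in earlier windows
    results = []
    for records in windows:
        counts = Counter(records)
        # unique: occurs exactly once in this window and (assumption) was not seen before
        unique = sum(1 for v, c in counts.items() if c == 1 and v not in seen)
        # dist: distinct values in this window that were not seen before
        dist = sum(1 for v in counts if v not in seen)
        results.append({"total": len(records), "unique": unique, "dist": dist})
        seen.update(counts)
    return results


for minute, m in enumerate(streaming_metrics([[1, 1, 2, 2, 3, 4], [2, 2, 3, 3, 5, 5]]), start=1):
    print(f"{minute} min: {m}")
```

For the two sample minutes this yields unique 2 and dist 4 for minute 1, and unique 0 and dist 1 for minute 2.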

> [Measure] Add uniqueness measurement as a new feature.
> ------------------------------------------------------
>
>                 Key: GRIFFIN-90
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-90
>             Project: Griffin (Incubating)
>          Issue Type: Improvement
>            Reporter: Lionel Liu
>            Assignee: Lionel Liu
>            Priority: Major
>             Fix For: 0.2.0-incubating
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> DoD: uniqueness measurement enabled, with test cases for batch and streaming.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)