Posted to dev@datafu.apache.org by "Eyal Allweil (JIRA)" <ji...@apache.org> on 2016/03/24 16:29:25 UTC
[jira] [Created] (DATAFU-117) New UDF - CountDistinctUpTo
Eyal Allweil created DATAFU-117:
-----------------------------------
Summary: New UDF - CountDistinctUpTo
Key: DATAFU-117
URL: https://issues.apache.org/jira/browse/DATAFU-117
Project: DataFu
Issue Type: New Feature
Reporter: Eyal Allweil
A UDF that counts distinct tuples within a bag, but only up to a preset limit. If the bag contains more distinct tuples than the limit, the UDF returns the limit.
This UDF can run reasonably well even on large bags, provided the chosen limit is small enough, even though the count is done in memory.
We use this UDF at PayPal for filtering, when we don't need the actual tuples afterward.
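
The counting idea described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the proposed DataFu implementation: the class name CountDistinctUpToSketch and the method countDistinctUpTo are hypothetical, and the Pig types (DataBag, Tuple) are replaced with a generic Iterable so the example stays self-contained. It shows why a small limit keeps memory use low: the in-memory set never holds more than `limit` elements, and iteration stops as soon as the limit is reached.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and types are assumptions, not DataFu's API.
final class CountDistinctUpToSketch {

    // Counts distinct items, but stops and returns `limit` once that many
    // distinct items have been seen.
    static <T> int countDistinctUpTo(Iterable<T> items, int limit) {
        if (limit <= 0) {
            return 0;
        }
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            // add() returns true only when the item was not already present.
            if (seen.add(item) && seen.size() >= limit) {
                return limit; // early exit: the set holds exactly `limit` items here
            }
        }
        return seen.size(); // fewer distinct items than the limit
    }

    public static void main(String[] args) {
        List<String> bag = Arrays.asList("a", "b", "a", "c", "d", "b");
        System.out.println(countDistinctUpTo(bag, 3));  // prints 3 (limit reached)
        System.out.println(countDistinctUpTo(bag, 10)); // prints 4 (all distinct items counted)
    }
}
```

For filtering, a caller would typically compare the result against the limit, for example keeping only groups where the count equals the limit, without ever materializing the full set of distinct tuples.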
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)