Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/03/25 11:24:57 UTC

[jira] Commented: (PIG-732) Utility UDFs

    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689069#action_12689069 ] 

Olga Natkovich commented on PIG-732:
------------------------------------

Ankur,

Thanks for contributing UDFs to PiggyBank!

A couple of questions/comments on your patch:

(1) Pig already supports a limit operator. Would that serve your needs for TopN, or do you actually need to project bags of limited size in a foreach?
(2) Filtering UDFs are meant to be used as predicates in filter operators and as such should return Boolean values. I think your TopN should be in the evaluation/util group.
(3) Each included file needs to have the Apache license header. You can just copy it from one of the other files.
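
For reference, the behaviour at stake in (1) and (2), keeping the N tuples of a bag that rank highest on one long field, can be sketched outside Pig roughly as follows. This is not the code from the patch: the class and method names are hypothetical, plain Java lists stand in for Pig bags and tuples, and "top" is assumed to mean "largest value first".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for the TopN logic; not the PiggyBank implementation.
final class TopNSketch {

    // Returns the n rows with the largest value at index `field`, largest first.
    // Each row plays the role of a tuple; the outer list plays the role of a bag.
    static List<List<Long>> topN(int n, int field, List<List<Long>> bag) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<List<Long>> byField = Comparator.comparingLong(row -> row.get(field));

        // Min-heap holding at most n rows: the smallest retained row is at the head.
        PriorityQueue<List<Long>> heap = new PriorityQueue<>(byField);
        for (List<Long> row : bag) {
            heap.offer(row);
            if (heap.size() > n) {
                heap.poll(); // evict the current smallest so only n rows remain
            }
        }

        List<List<Long>> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }
}
```

The method returns a collection rather than a Boolean, which is why it reads as an evaluation function and not as a filter predicate. The input does not need to be sorted, since the heap keeps only the n best rows seen so far.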




> Utility UDFs 
> -------------
>
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts the number of tuples (N) to retain in the output, the field number (type long) to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag containing the top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines (Yahoo, Google, AOL, Live) and extracts and normalizes the search query present in it.
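
The kind of extraction SearchQuery describes might look like the sketch below. It is only an illustration under stated assumptions, not the patch's code: the class name is hypothetical, the query parameter is assumed to be "p" for Yahoo hosts and "q" for the other engines, and "normalizes" is assumed to mean lower-casing, trimming, and collapsing whitespace.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical stand-in for the SearchQuery logic; not the PiggyBank implementation.
final class SearchQuerySketch {

    // Returns the normalized search query from the URL, or null if none is found.
    static String extract(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return null; // not a syntactically valid URI
        }
        String host = uri.getHost();
        String rawQuery = uri.getRawQuery();
        if (host == null || rawQuery == null) {
            return null;
        }
        // Assumption: Yahoo carries the query in "p", the other engines in "q".
        String param = host.toLowerCase(Locale.ROOT).contains("yahoo.") ? "p" : "q";
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                return normalize(decode(pair.substring(eq + 1)));
            }
        }
        return null;
    }

    // Decodes percent-escapes and '+'; falls back to the raw value on bad input.
    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return value;
        }
    }

    // Assumed normalization: lower-case, trim, collapse runs of whitespace.
    private static String normalize(String query) {
        return query.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }
}
```

Choosing the parameter name by host keeps the engine-specific knowledge in one place, so supporting another engine would only mean extending that mapping.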

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.