Posted to dev@pig.apache.org by "Ankur (JIRA)" <ji...@apache.org> on 2009/03/25 10:50:53 UTC

[jira] Created: (PIG-732) Utility UDFs

Utility UDFs 
-------------

                 Key: PIG-732
                 URL: https://issues.apache.org/jira/browse/PIG-732
             Project: Pig
          Issue Type: New Feature
            Reporter: Ankur
            Priority: Minor
         Attachments: udf.v1.patch

Two utility UDFs and their respective test cases.

1. TopN - Accepts the number of tuples (N) to retain in the output, the field number (type long) to use for comparison, and a sorted or unsorted bag of tuples. It outputs a bag containing the top N tuples.

2. SearchQuery - Accepts an encoded URL from any of four search engines (Yahoo, Google, AOL, Live), then extracts and normalizes the search query it contains.
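
As a rough illustration of what "extracts and normalizes the search query" could mean, here is a minimal plain-Java sketch. It is not the code from the patch. The class name, the parameter names it looks for ("q", "p", "query") and the normalization steps (trim, lowercase, collapse whitespace) are all assumptions made for the example, and it does not check which search engine the URL belongs to.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SearchQuerySketch {
    // Assumed query-parameter names; these are not taken from the patch.
    private static final List<String> QUERY_PARAMS = Arrays.asList("q", "p", "query");

    /** Returns the normalized search query, or null if none can be extracted. */
    static String extractQuery(String url) {
        String raw;
        try {
            raw = URI.create(url).getRawQuery();
        } catch (IllegalArgumentException e) {
            return null; // not a parseable URL
        }
        if (raw == null) {
            return null; // URL has no query string
        }
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || !QUERY_PARAMS.contains(pair.substring(0, eq))) {
                continue;
            }
            try {
                // URLDecoder turns '+' into a space and resolves %XX escapes.
                String value = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                // Assumed normalization: trim, lowercase, collapse whitespace runs.
                String normalized = value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
                return normalized.isEmpty() ? null : normalized;
            } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                return null; // malformed escape sequence
            }
        }
        return null;
    }
}
```

With these assumptions, a URL such as http://search.yahoo.com/search?p=Apache+Pig&fr=x would yield "apache pig". Returning null for anything unparseable keeps the function usable on noisy logs.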


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Attachment: udf.v3.patch



[jira] Updated: (PIG-732) Utility UDFs

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-732:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed. Thanks, Ankur, for contributing.



[jira] Commented: (PIG-732) Utility UDFs

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689153#action_12689153 ] 

Olga Natkovich commented on PIG-732:
------------------------------------

Ankur,

A couple of additional comments:

(1) Top N

- You assume that you are getting the data in as bytearrays (for n and fieldNum). It would be better to assume the actual types (int) and let Pig do the conversion for you, because then your function will be able to work with data of different types. You do that by adding a getArgToFuncMapping function. You can see examples in other functions in the repository, and an explanation of its usage in the UDF manual. This also applies to your second UDF.
- In the exec function, you check for 2 elements in the tuple but you are accessing
- It looks like if you insert too many elements, you will be throwing away the head of the queue. Is that what you want?
- You are not specifying the tuple structure in your schema definition. This could be an issue for some of your queries.






[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Attachment: udf.v5.patch

Minor issue in test case, causing test failure. Fixed in latest upload - udf.v5.patch. Also changed TopN to Top. Should be good to go now.



[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Attachment: udf.v4.patch

Incorporated the changes in SearchQuery.java as per suggestions.



[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Attachment:     (was: udf.v3.patch)



[jira] Commented: (PIG-732) Utility UDFs

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689069#action_12689069 ] 

Olga Natkovich commented on PIG-732:
------------------------------------

Ankur,

Thanks for contributing UDFs to PiggyBank!

A couple of questions/comments on your patch:

(1) Pig already supports the LIMIT operator. Would that serve your needs for TopN, or do you actually need to project bags of limited size in FOREACH?
(2) Filtering UDFs are meant to be used as predicates in filter operators and as such should return Boolean values. I think your TopN should be in the evaluation/util group.
(3) Each file included needs to have the Apache license header. You can just copy it from one of the other files.






[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Attachment: udf.v1.patch

Since the UDFs are quite small, I combined them in a single patch instead of opening a separate JIRA for each UDF. However, if people believe that having a separate JIRA for each will help, I can split this up into two.



[jira] Updated: (PIG-732) Utility UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-732:
---------------------------

    Fix Version/s: 0.3.0



[jira] Commented: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696404#action_12696404 ] 

Ankur commented on PIG-732:
---------------------------

Hi Olga, can you please take a look and suggest what's wrong?



[jira] Commented: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704014#action_12704014 ] 

Ankur commented on PIG-732:
---------------------------

If there aren't any other issues, can we go ahead and commit these?



[jira] Assigned: (PIG-732) Utility UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-732:
------------------------------

    Assignee: Ankur



[jira] Commented: (PIG-732) Utility UDFs

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704426#action_12704426 ] 

Olga Natkovich commented on PIG-732:
------------------------------------

Ankur, thanks for your contribution and sorry for the delay.

I integrated your patch and ran the tests. I see one failure in the unit tests:

 [junit] Test org.apache.pig.piggybank.test.evaluation.util.TestTopN FAILED

The log file contains the following:

Testsuite: org.apache.pig.piggybank.test.evaluation.util.TestTopN
Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.17 sec

Testcase: testTopN took 0.152 sec
    Caused an ERROR
General Exception executing function: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
java.lang.RuntimeException: General Exception executing function: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
    at org.apache.pig.piggybank.evaluation.util.TopN.exec(TopN.java:112)
    at org.apache.pig.piggybank.test.evaluation.util.TestTopN.testTopN(Unknown Source)
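
The trace only shows that a java.lang.Long was cast to java.lang.Integer somewhere inside TopN.exec. The plain-Java sketch below reproduces that general failure mode and shows one defensive way to read a numeric argument. It is not the TopN code. The class and method names are hypothetical, and reading through Number is just one possible approach.

```java
public class NumericArgSketch {
    /** Reads a numeric argument as an int regardless of its boxed type. */
    static int asInt(Object arg) {
        if (!(arg instanceof Number)) {
            throw new IllegalArgumentException("expected a numeric argument, got: " + arg);
        }
        // Number.intValue() works for Integer, Long, Double, and so on.
        return ((Number) arg).intValue();
    }

    /** Returns true if a direct cast to Integer throws ClassCastException. */
    static boolean directCastFails(Object arg) {
        try {
            Integer ignored = (Integer) arg; // fails when arg is a boxed Long
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Here directCastFails(Long.valueOf(3)) reports the failure, while asInt(Long.valueOf(3)) returns 3. Whether the right fix belongs in the UDF or in the test that builds the input tuple depends on which types the function declares it accepts.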




[jira] Commented: (PIG-732) Utility UDFs

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696602#action_12696602 ] 

Olga Natkovich commented on PIG-732:
------------------------------------

It looks like there are issues with the patch process, and we are looking into it.

Santhosh, could you take a look at how the output schema for the UDF is constructed to make sure it is correct? Thanks.

> Utility UDFs 
> -------------
>
>                 Key: PIG-732
>                 URL: https://issues.apache.org/jira/browse/PIG-732
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ankur
>            Priority: Minor
>         Attachments: udf.v1.patch, udf.v2.patch, udf.v3.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts number of tuples (N) to retain in output, field number (type long) to use for comparison, and an sorted/unsorted bag of tuples. It outputs a bag containing top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines (Yahoo, Google, AOL, Live) and extracts and normalizes the search query present in it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689104#action_12689104 ] 

Ankur commented on PIG-732:
---------------------------

Olga,
        Thanks for a quick review. 
> (1) Pig already support limit operator ....
I have a relation where I need to group by field-1 and retain the top-N occurrences of field-2. So I group by (field-1, field-2), generate counts, and flatten to tuples of the form (field-1, field-2, <count>). Then I group on field-1 again and retain just the top-N tuples. So I actually need to project bags of limited size. I don't think this can be done using LIMIT, as it is not allowed inside FOREACH.

> (2) Filtering UDFs are meant to be used as ....
Moved TopN and SearchQuery UDFs to  piggyBank/evaluation/util. Also moved the test cases to the appropriate location.

> (3) Each file included needs to have Apache license header ....
Done.
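
To make the use case in point (1) above concrete, here is a small plain-Java sketch of the same two-stage computation: count (field-1, field-2) pairs, then keep only the N highest counts within each field-1 group. It only illustrates the data flow. It is not Pig Latin and not the UDF from the patch, and the class name, method name, and use of string fields are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupedTopNSketch {
    /**
     * rows holds (field1, field2) pairs. For every field1 value, returns the
     * n most frequent field2 values with their counts, highest count first.
     */
    static Map<String, List<Map.Entry<String, Long>>> topPerGroup(List<String[]> rows, int n) {
        // Stage 1: group by (field1, field2) and count occurrences.
        Map<String, Map<String, Long>> counts = new TreeMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(row[0], k -> new TreeMap<>()).merge(row[1], 1L, Long::sum);
        }
        // Stage 2: group by field1 again and keep a bag of limited size.
        Map<String, List<Map.Entry<String, Long>>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> group : counts.entrySet()) {
            List<Map.Entry<String, Long>> entries = new ArrayList<>(group.getValue().entrySet());
            entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
            int keep = Math.max(0, Math.min(n, entries.size()));
            result.put(group.getKey(), new ArrayList<>(entries.subList(0, keep)));
        }
        return result;
    }
}
```

The second stage is the part that needs a per-group limit, which is what a top-N function applied to each group's bag provides.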





[jira] Commented: (PIG-732) Utility UDFs

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696669#action_12696669 ] 

Santhosh Srinivasan commented on PIG-732:
-----------------------------------------

Review comments for the outputSchema method in the UDFs

Index: contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/util/SearchQuery.java
==========================================================================

SearchQuery is returning String. The outputSchema method should return a Schema with a single column of type CHARARRAY. You could use one of the following two approaches:

1. If you wish to call the column query then use the following.
{code}
Schema s = new Schema();
s.add(new Schema.FieldSchema("query", DataType.CHARARRAY));
return s;
{code}

2. If you wish to use a generated name then use the following:
{code}
Schema s = new Schema();
s.add(new Schema.FieldSchema(getSchemaName(this.getClass().getName().toLowerCase(), input), DataType.CHARARRAY));
return s;
{code}

The relevant portion of the patch is shown below.

{code}
+  @Override
+  public Schema outputSchema(Schema input) {
+    try {
+      Schema s = new Schema();
+      s.add(new Schema.FieldSchema("query", DataType.CHARARRAY));
+      return new Schema(new Schema.FieldSchema(getSchemaName(this.getClass()
+          .getName().toLowerCase(), input), s, DataType.CHARARRAY));
+    } catch (Exception e) {
+      return null;
+    }
+  }
+}
{code}



[jira] Commented: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694036#action_12694036 ] 

Ankur commented on PIG-732:
---------------------------

Not sure why the core tests fail. The automated test output doesn't seem to suggest anything patch-specific, or maybe I am unable to spot it in the console output.



[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Attachment: udf.v3.patch

> You assume that you are getting ....
1. Fixed. Implemented getArgToFuncMapping() as suggested.

> In the exec function, you check for 2 elements ....
2. Fixed. Changed to check for 3 arguments.

> Looks like if you inserted .....
Yep, that is the desired behavior. The need is to keep the top-N tuples and throw away the rest. The head of the queue in this case would be the minimum element.

> You are not specifying tuple structure....
The output tuple structure is really based upon the tuple structure in the input bag (field(2)). I have changed this slightly as per my understanding. If this can be better written, please suggest.
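
The queue behavior described above (keep at most N elements, with the minimum at the head so it is the one discarded) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code from the patch: tuples are modeled as long[] arrays, the class and method names are hypothetical, and the final sort is added only to make the output order deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    /**
     * Returns the n tuples with the largest value in column fieldNum,
     * largest first. The input bag may be sorted or unsorted.
     */
    static List<long[]> topN(int n, int fieldNum, List<long[]> bag) {
        List<long[]> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on the comparison field: the head is the smallest retained tuple.
        PriorityQueue<long[]> queue =
            new PriorityQueue<>(n, Comparator.comparingLong((long[] t) -> t[fieldNum]));
        for (long[] tuple : bag) {
            queue.add(tuple);
            if (queue.size() > n) {
                queue.poll(); // discard the current minimum, keeping the top n
            }
        }
        result.addAll(queue);
        // Sorting is not required for a bag; it is done here for readable output.
        result.sort(Comparator.comparingLong((long[] t) -> t[fieldNum]).reversed());
        return result;
    }
}
```

Because the queue never holds more than N + 1 tuples, memory use stays bounded by N regardless of the size of the input bag.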



[jira] Commented: (PIG-732) Utility UDFs

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693737#action_12693737 ] 

Hadoop QA commented on PIG-732:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12404118/udf.v3.patch
  against trunk revision 759376.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/17/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/17/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/17/console

This message is automatically generated.



[jira] Updated: (PIG-732) Utility UDFs

Posted by "Ankur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur updated PIG-732:
----------------------

    Attachment: udf.v2.patch
