Posted to dev@pig.apache.org by "Andrew Hitchcock (JIRA)" <ji...@apache.org> on 2010/08/25 03:54:18 UTC

[jira] Created: (PIG-1565) additional piggybank datetime and string UDFs

additional piggybank datetime and string UDFs
---------------------------------------------

                 Key: PIG-1565
                 URL: https://issues.apache.org/jira/browse/PIG-1565
             Project: Pig
          Issue Type: Improvement
            Reporter: Andrew Hitchcock


Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903515#action_12903515 ] 

Alan Gates commented on PIG-1565:
---------------------------------

Unit tests run fine.

When I run contrib tests, one of the tests in this patch fails:

{code}
Testsuite: org.apache.pig.piggybank.test.evaluation.string.TestString
Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.751 sec
------------- Standard Error -----------------
10/08/27 11:16:31 WARN string.SUBSTRING: invalid number of arguments to SUBSTRING
------------- ---------------- ---------------

Testcase: testSimple took 0.683 sec
    FAILED
expected:<lo> but was:<null>
junit.framework.AssertionFailedError: expected:<lo> but was:<null>
    at org.apache.pig.piggybank.test.evaluation.string.TestString.go(Unknown Source)
    at org.apache.pig.piggybank.test.evaluation.string.TestString.testSimple(Unknown Source)

Testcase: testFormatTypes took 0.048 sec
{code}

> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
>                 Key: PIG-1565
>                 URL: https://issues.apache.org/jira/browse/PIG-1565
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Andrew Hitchcock
>            Assignee: Andrew Hitchcock
>             Fix For: 0.8.0
>
>         Attachments: PIG-1565-1.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.



[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903123#action_12903123 ] 

Alan Gates commented on PIG-1565:
---------------------------------

Comments
# ErrorCatchingBase swallows any non-ExecExceptions.  It should print their messages out as warnings.  Warnings are collated and the count reported at the end of the job.  Details are only printed if the user asks for them.  That way the user will still be informed that something unexpected happened and can investigate further if he wants to.
# On the duplication, it looks to me like INDEX_OF and LAST_INDEX_OF are supersets of the functions already in Pig.  You could submit a patch for those two functions (which are now builtins) to extend them to take the optional third argument.  SPLIT_ON_REGEX looks like a subset of the existing SPLIT function that is built into Pig, so other than keeping it as an alias for Amazon users who are used to calling SPLIT_ON_REGEX, I'm not clear what the value is.
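
One way to read the first comment above, as a minimal sketch and not the code in the patch: catch whatever the UDF body throws, always count it, and print the detail only when asked. The class name WarningCatcher, the verbose flag, and the use of java.util.function.Function in place of Pig's EvalFunc are all assumptions made so the example runs on its own.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-in for an error-catching UDF wrapper; not the patch's ErrorCatchingBase.
final class WarningCatcher<I, O> {
    private final Function<I, O> body;
    private final boolean verbose; // assumption: models "details only if the user asks"
    private final Map<String, Long> counts = new TreeMap<>();

    WarningCatcher(Function<I, O> body, boolean verbose) {
        this.body = body;
        this.verbose = verbose;
    }

    // Runs the wrapped function; on failure records a warning and returns null.
    O apply(I input) {
        try {
            return body.apply(input);
        } catch (RuntimeException e) {
            counts.merge(e.getClass().getSimpleName(), 1L, Long::sum);
            if (verbose) {
                System.err.println("WARN " + e);
            }
            return null;
        }
    }

    // Total number of warnings seen so far, across all exception types.
    long warningCount() {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        return total;
    }

    // Per-exception-type counts, suitable for an end-of-job report.
    String summary() {
        return counts.toString();
    }

    public static void main(String[] args) {
        WarningCatcher<String, Integer> parse =
                new WarningCatcher<String, Integer>(Integer::parseInt, false);
        parse.apply("42");
        parse.apply("not a number");
        System.out.println(parse.warningCount() + " warning(s): " + parse.summary());
    }
}
```

The point of the design is that a bad record still yields null, but the user sees a non-zero count at the end and can rerun with details switched on.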

Thanks for contributing all these, this is great.

I'll run test-patch and the unit tests and post the results.


> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
>                 Key: PIG-1565
>                 URL: https://issues.apache.org/jira/browse/PIG-1565
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Andrew Hitchcock
>         Attachments: PIG-1565-1.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1565:
----------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1565:
----------------------------

    Fix Version/s: 0.8.0



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Hitchcock updated PIG-1565:
----------------------------------

    Attachment: PIG-1565-1.patch

This patch provides a number of UDFs written by the Amazon Elastic MapReduce team that we feel are useful.

A few of these UDFs are duplicates of existing functionality. I am including them because they are consistent with the rest of the UDFs in this patch and because I'd like to start a discussion about the best way to include these UDFs. Here is a list of what I believe to be duplicate UDFs:

INDEX_OF
LAST_INDEX_OF
SPLIT_ON_REGEX

Here are descriptions of the provided UDFs.

datetime/
 These are based on JodaTime and provide a similar model for date handling.

DATE_TIME
 A function that returns a DateTime String, of the form yyyy-MM-dd'T'HH:mm:ss.SSSZZ.
DURATION
 A function that returns a Duration as a long. A duration is a length of time specified in milliseconds.
EXTRACT_DT
 Extracts the integer numeric value of a field of a LocalDate, LocalTime, DateTime, Period or Duration.
FORMAT_DT
 Formats a LocalDate, LocalTime or DateTime given a format string into a string.
LOCAL_DATE
 A function that returns a LocalDate String, of the form yyyy-MM-dd.
LOCAL_TIME
 A function that returns a LocalTime String, of the form HH:mm:ss.SSS.
OFFSET_DT
 Offsets a LocalDate, LocalTime or DateTime by a Period/Duration, returning an object of the same type.
PERIOD
 A function that returns a Period String. A Period is specified in terms of individual duration fields such as years and days.
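
To make the string shapes above concrete, here is a small sketch. It uses java.time from the JDK as a stand-in for JodaTime, which is an assumption on my part and not what the patch does; the class name DateTimeShapes and its methods are hypothetical. The pattern letters "xxx" are java.time's way of writing an offset with a colon, which is what Joda's "ZZ" denotes.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration of the value formats described in the list; not the UDF code.
final class DateTimeShapes {
    // Assumed java.time equivalents of yyyy-MM-dd'T'HH:mm:ss.SSSZZ, yyyy-MM-dd and HH:mm:ss.SSS.
    static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSxxx", Locale.ROOT);
    static final DateTimeFormatter LOCAL_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ROOT);
    static final DateTimeFormatter LOCAL_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss.SSS", Locale.ROOT);

    static String dateTime(OffsetDateTime t) {
        return DATE_TIME.format(t);
    }

    static String localTime(LocalTime t) {
        return LOCAL_TIME.format(t);
    }

    // A duration is just a length of time in milliseconds.
    static long durationMillis(Duration d) {
        return d.toMillis();
    }

    // Offsets a LocalDate string by a Period and returns a value of the same shape.
    static String offsetLocalDate(String localDate, Period p) {
        return LOCAL_DATE.format(LocalDate.parse(localDate, LOCAL_DATE).plus(p));
    }

    public static void main(String[] args) {
        OffsetDateTime t = OffsetDateTime.of(2010, 8, 25, 3, 54, 18, 0, ZoneOffset.UTC);
        System.out.println(dateTime(t));
        System.out.println(localTime(t.toLocalTime()));
        System.out.println(durationMillis(Duration.ofMinutes(90)));
        System.out.println(Period.of(1, 0, 2));
        System.out.println(offsetLocalDate("2010-08-25", Period.ofMonths(1)));
    }
}
```

Note how a Period keeps its fields (years, days) separate while a Duration collapses to a single millisecond count, which is the distinction the PERIOD and DURATION descriptions draw.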

string/
 String handling functions modeled after Apache Commons StringUtils.

CAPITALIZE
 Capitalizes a String, changing the first letter to upper case.
CENTER
 Centers a String in a larger String.
CONCAT_WITH
 Joins the arguments with a String joiner.
EXTRACT
 Parses the input String with a regular expression and returns all matched groups.
FORMAT
 Formats a list of arguments into a single String.
INDEX_OF
 Finds the first index within a String, from an optional start position, handling null.
LAST_INDEX_OF
 Finds the last index within a String, from an optional start position, handling null.
LEFT_PAD
 Left-pads a String to a given size.
REPEAT
 Repeats a String a given number of times to form a new String.
REPLACE_ONCE
 Replaces a String with another String inside a larger String, once.
RIGHT_PAD
 Right-pads a String to a given size.
SPLIT_ON_REGEX
 Splits this string around matches of the given regular expression.
STRIP
 Strips any of a set of characters from the start and end of a String.
STRIP_END
 Strips any of a set of characters from the end of a String.
STRIP_START
 Strips any of a set of characters from the start of a String.
SWAP_CASE
 Swaps the case of a String changing upper and title case to lower case, and lower case to upper case.
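
For readers who don't have Commons StringUtils in their head, a rough sketch of what three of these do. This is plain Java written for illustration, not the patch and not the Commons implementation; the class name StringShapes is hypothetical, and requiring a non-null character set is a simplifying assumption.

```java
// Hypothetical illustration of STRIP_START, STRIP_END and LEFT_PAD semantics; not the UDF code.
final class StringShapes {
    // Removes leading characters that appear in chars. Null input gives null output.
    static String stripStart(String s, String chars) {
        if (s == null) {
            return null;
        }
        int start = 0;
        while (start < s.length() && chars.indexOf(s.charAt(start)) >= 0) {
            start++;
        }
        return s.substring(start);
    }

    // Removes trailing characters that appear in chars. Null input gives null output.
    static String stripEnd(String s, String chars) {
        if (s == null) {
            return null;
        }
        int end = s.length();
        while (end > 0 && chars.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(0, end);
    }

    // Pads on the left with pad until the result has length size; longer input is returned unchanged.
    static String leftPad(String s, int size, char pad) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < size; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(stripStart("xxhixx", "x"));
        System.out.println(stripEnd("xxhixx", "x"));
        System.out.println(leftPad("7", 3, '0'));
    }
}
```

STRIP is then just stripStart followed by stripEnd with the same character set.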



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1565:
--------------------------------

    Fix Version/s:     (was: 0.8.0)
                   0.9.0

Pushing to the next release since the patch is not quite ready to be committed.

> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
>                 Key: PIG-1565
>                 URL: https://issues.apache.org/jira/browse/PIG-1565
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Andrew Hitchcock
>            Assignee: Andrew Hitchcock
>             Fix For: 0.9.0
>
>         Attachments: PIG-1565-1.patch, PIG-1565-2.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.



[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903174#action_12903174 ] 

Alan Gates commented on PIG-1565:
---------------------------------

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 5 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec]



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1565:
----------------------------

    Status: Open  (was: Patch Available)

> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
>                 Key: PIG-1565
>                 URL: https://issues.apache.org/jira/browse/PIG-1565
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Andrew Hitchcock
>            Assignee: Andrew Hitchcock
>             Fix For: 0.8.0
>
>         Attachments: PIG-1565-1.patch, PIG-1565-2.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.



[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903140#action_12903140 ] 

Dmitriy V. Ryaboy commented on PIG-1565:
----------------------------------------

Please note that there's an outstanding patch for INDEX_OF and LAST_INDEX_OF in PIG-1563.



[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917081#action_12917081 ] 

Alan Gates commented on PIG-1565:
---------------------------------

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 8 new or modified tests.
     [exec]
     [exec]     -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec]

The javadoc warning is:

  [javadoc] /home/gates/src/pig/PIG-1565/trunk/src/org/apache/pig/builtin/INDEXOF.java:78: warning - Tag @link: can't find INDEX_OF(int, int) in java.lang.String

Building Piggybank now fails as well, since some of the ErrorCatchingBase class was moved into main Pig.

Also, the patch fails a couple of unit tests in TestStringUDFs.  It fails testIndexOf and testLastIndexOf() because it doesn't properly handle the null case.
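
To illustrate what handling the null case with an optional start position could look like, here is a minimal sketch. It assumes null input should give a null result, which this thread does not spell out and which the actual tests may define differently; the class NullSafeIndex and its methods are hypothetical and are not the INDEXOF builtin.

```java
// Hypothetical null-tolerant index lookups with an optional start position; not Pig's INDEXOF.
final class NullSafeIndex {
    // First index of search in s at or after start (start == null means from the beginning).
    // Assumption: a null string or null search term yields null and does not throw.
    static Integer indexOf(String s, String search, Integer start) {
        if (s == null || search == null) {
            return null;
        }
        return s.indexOf(search, start == null ? 0 : start);
    }

    // Last index of search in s at or before start (start == null means from the end).
    static Integer lastIndexOf(String s, String search, Integer start) {
        if (s == null || search == null) {
            return null;
        }
        return start == null ? s.lastIndexOf(search) : s.lastIndexOf(search, start);
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello", "l", null));
        System.out.println(indexOf("hello", "l", 3));
        System.out.println(lastIndexOf("hello", "l", 2));
        System.out.println(indexOf(null, "l", null));
    }
}
```

The null check has to come before any call on the string itself, otherwise the two-argument and three-argument forms fail in the same way.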

I'll attach the output from running the tests.



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Hitchcock updated PIG-1565:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Hitchcock updated PIG-1565:
----------------------------------

    Attachment: PIG-1565-2.patch

Made changes to LAST_INDEX_OF, INDEXOF, and SPLIT_ON_REGEX as per request. Also fixed the test case bug, which was caused by a missing change (this patch now extends SUBSTRING with more functionality).



[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913869#action_12913869 ] 

Alan Gates commented on PIG-1565:
---------------------------------

I'll review this patch.



[jira] Assigned: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-1565:
-------------------------------

    Assignee: Andrew Hitchcock



[jira] Updated: (PIG-1565) additional piggybank datetime and string UDFs

Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Hitchcock updated PIG-1565:
----------------------------------

    Status: Patch Available  (was: Open)
