Posted to dev@pig.apache.org by "Andrew Hitchcock (JIRA)" <ji...@apache.org> on 2010/08/25 03:54:18 UTC
[jira] Created: (PIG-1565) additional piggybank datetime and string
UDFs
additional piggybank datetime and string UDFs
---------------------------------------------
Key: PIG-1565
URL: https://issues.apache.org/jira/browse/PIG-1565
Project: Pig
Issue Type: Improvement
Reporter: Andrew Hitchcock
Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1565) additional piggybank datetime and
string UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903515#action_12903515 ]
Alan Gates commented on PIG-1565:
---------------------------------
Unit tests run fine.
When I run contrib tests, one of the tests in this patch fails:
{code}
Testsuite: org.apache.pig.piggybank.test.evaluation.string.TestString
Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.751 sec
------------- Standard Error -----------------
10/08/27 11:16:31 WARN string.SUBSTRING: invalid number of arguments to SUBSTRING
------------- ---------------- ---------------
Testcase: testSimple took 0.683 sec
FAILED
expected:<lo> but was:<null>
junit.framework.AssertionFailedError: expected:<lo> but was:<null>
at org.apache.pig.piggybank.test.evaluation.string.TestString.go(Unknown Source)
at org.apache.pig.piggybank.test.evaluation.string.TestString.testSimple(Unknown Source)
Testcase: testFormatTypes took 0.048 sec
{code}
> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
> Key: PIG-1565
> URL: https://issues.apache.org/jira/browse/PIG-1565
> Project: Pig
> Issue Type: Improvement
> Reporter: Andrew Hitchcock
> Assignee: Andrew Hitchcock
> Fix For: 0.8.0
>
> Attachments: PIG-1565-1.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.
[jira] Commented: (PIG-1565) additional piggybank datetime and
string UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903123#action_12903123 ]
Alan Gates commented on PIG-1565:
---------------------------------
Comments
# ErrorCatchingBase swallows any non-ExecExceptions. It should print their messages out as warnings. Warnings are collated and the count reported at the end of the job. Details are only printed if the user asks for them. That way the user will still be informed that something unexpected happened and can investigate further if he wants to.
# On the duplication, it looks to me like INDEX_OF and LAST_INDEX_OF are supersets of the functions already in Pig. You could submit a patch for those two functions (which are now builtins) to extend them to take the optional third argument. SPLIT_ON_REGEX looks like a subset of the existing SPLIT function that is built into Pig, so other than keeping it as an alias for Amazon users who are used to calling SPLIT_ON_REGEX, I'm not clear what the value is.
Thanks for contributing all these, this is great.
I'll run test-patch and the unit tests and post the results.
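For readers following the thread, here is a minimal sketch of the warning behaviour suggested in the first comment above, written in plain Java rather than against Pig's classes. Everything in it is an assumption made for illustration: WarningCollectingWrapper and its methods are hypothetical names, RuntimeException stands in for the "non-ExecException" case, and an in-memory map stands in for whatever warning aggregation Pig actually performs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for an error-catching UDF base; not Pig's API. */
final class WarningCollectingWrapper<I, O> {
    // Warning text -> number of times it was seen (the "collated" view).
    private final Map<String, Integer> warningCounts = new LinkedHashMap<>();
    private final Function<I, O> body;

    WarningCollectingWrapper(Function<I, O> body) {
        this.body = body;
    }

    /** Runs the wrapped function; an unexpected exception becomes a counted warning and a null result. */
    O exec(I input) {
        try {
            return body.apply(input);
        } catch (RuntimeException e) {
            // Record the message instead of silently swallowing the exception.
            String key = e.getClass().getSimpleName() + ": " + e.getMessage();
            warningCounts.merge(key, 1, Integer::sum);
            return null;
        }
    }

    /** Total number of warnings, the figure that would be reported at the end of a job. */
    int totalWarnings() {
        int total = 0;
        for (int count : warningCounts.values()) {
            total += count;
        }
        return total;
    }

    /** Per-message counts, shown only when the user asks for details. */
    Map<String, Integer> warningDetails() {
        return warningCounts;
    }
}
```

Wrapping Integer::parseInt this way and feeding it one valid and two invalid inputs yields one parsed value, two null results, and a warning total of 2, so the user still learns that something unexpected happened.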
> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
> Key: PIG-1565
> URL: https://issues.apache.org/jira/browse/PIG-1565
> Project: Pig
> Issue Type: Improvement
> Reporter: Andrew Hitchcock
> Attachments: PIG-1565-1.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1565:
----------------------------
Status: Open (was: Patch Available)
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1565:
----------------------------
Fix Version/s: 0.8.0
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Hitchcock updated PIG-1565:
----------------------------------
Attachment: PIG-1565-1.patch
This patch provides a number of UDFs written by the Amazon Elastic MapReduce team that we feel are useful.
A few of these UDFs are duplicates of existing functionality. I am including them because they are consistent with the rest of the UDFs in this patch and because I'd like to start a discussion about the best way to include these UDFs. Here is a list of what I believe to be duplicate UDFs:
INDEX_OF
LAST_INDEX_OF
SPLIT_ON_REGEX
Here are descriptions of the provided UDFs.
datetime/
These are based on JodaTime and provide a similar model for date handling.
DATE_TIME
A function that returns a DateTime String, of the form yyyy-MM-dd'T'HH:mm:ss.SSSZZ.
DURATION
A function that returns a Duration as a long. A duration is a length of time specified in milliseconds.
EXTRACT_DT
Extracts the integer numeric value of a field of a LocalDate, LocalTime, DateTime, Period or Duration.
FORMAT_DT
Formats a LocalDate, LocalTime or DateTime given a format string into a string.
LOCAL_DATE
A function that returns a LocalDate String, of the form yyyy-MM-dd.
LOCAL_TIME
A function that returns a LocalTime String, of the form HH:mm:ss.SSS.
OFFSET_DT
Offsets a LocalDate, LocalTime or DateTime by a Period/Duration, returning an object of the same type.
PERIOD
A function that returns a Period String. A Period is specified in terms of individual duration fields such as years and days.
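The string forms listed above can be illustrated with a short sketch. The patch is described as being based on JodaTime; the code below instead uses the JDK's java.time package as a stand-in so that it needs no extra library, and that substitution is an assumption of this example. In particular, Joda's "ZZ" (offset with a colon) is approximated here by the java.time pattern letters "xxx". The class and method names are hypothetical and are not the UDFs from the patch.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.Period;
import java.time.format.DateTimeFormatter;

/** Hypothetical illustration of the described string forms, using java.time instead of JodaTime. */
final class DateTimeFormsSketch {
    // Stand-in for the Joda pattern yyyy-MM-dd'T'HH:mm:ss.SSSZZ ("xxx" prints an offset such as +00:00).
    static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSxxx");
    static final DateTimeFormatter LOCAL_DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter LOCAL_TIME = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");

    static String dateTime(OffsetDateTime value) {
        return DATE_TIME.format(value);
    }

    static String localDate(LocalDate value) {
        return LOCAL_DATE.format(value);
    }

    static String localTime(LocalTime value) {
        return LOCAL_TIME.format(value);
    }

    /** A duration expressed as a long number of milliseconds. */
    static long durationMillis(Duration value) {
        return value.toMillis();
    }

    /** Offsets a date-time string by a period and returns a string of the same form. */
    static String offsetByPeriod(String dateTime, Period period) {
        return DATE_TIME.format(OffsetDateTime.parse(dateTime, DATE_TIME).plus(period));
    }
}
```

Under these assumptions, offsetting "2010-08-25T03:54:18.000+00:00" by a period of seven days gives "2010-09-01T03:54:18.000+00:00", which shows the "same type in, same type out" behaviour described for OFFSET_DT.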
string/
String handling functions modeled after Apache Commons StringUtils.
CAPITALIZE
Capitalizes a String, changing the first letter to upper case.
CENTER
Centers a String in a larger String.
CONCAT_WITH
Joins the arguments with a String joiner.
EXTRACT
Parses the input String with a regular expression and returns all matched groups.
FORMAT
Formats a list of arguments into a single String.
INDEX_OF
Finds the first index within a String, from an optional start position, handling null.
LAST_INDEX_OF
Finds the last index within a String, from an optional start position, handling null.
LEFT_PAD
Left-pads a String to the given size.
REPEAT
Repeats a String a given number of times to form a new String.
REPLACE_ONCE
Replaces a String with another String inside a larger String, once.
RIGHT_PAD
Right-pads a String to the given size.
SPLIT_ON_REGEX
Splits this string around matches of the given regular expression.
STRIP
Strips any of a set of characters from the start and end of a String.
STRIP_END
Strips any of a set of characters from the end of a String.
STRIP_START
Strips any of a set of characters from the start of a String.
SWAP_CASE
Swaps the case of a String changing upper and title case to lower case, and lower case to upper case.
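As a rough illustration of the strip and pad descriptions above, here is a plain-Java sketch. It does not use Apache Commons StringUtils or the patch code; the class name, method names, and the choice to return null for a null input are assumptions made for this example.

```java
/** Hypothetical plain-Java versions of a few of the described string functions. */
final class StringUdfSketch {
    /** Removes leading characters that appear in stripChars. */
    static String stripStart(String s, String stripChars) {
        if (s == null) {
            return null;
        }
        int start = 0;
        while (start < s.length() && stripChars.indexOf(s.charAt(start)) >= 0) {
            start++;
        }
        return s.substring(start);
    }

    /** Removes trailing characters that appear in stripChars. */
    static String stripEnd(String s, String stripChars) {
        if (s == null) {
            return null;
        }
        int end = s.length();
        while (end > 0 && stripChars.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(0, end);
    }

    /** Removes matching characters from both ends. */
    static String strip(String s, String stripChars) {
        return stripEnd(stripStart(s, stripChars), stripChars);
    }

    /** Pads on the left with pad until the result has at least size characters. */
    static String leftPad(String s, int size, char pad) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < size; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    /** Pads on the right with pad until the result has at least size characters. */
    static String rightPad(String s, int size, char pad) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s);
        for (int i = s.length(); i < size; i++) {
            sb.append(pad);
        }
        return sb.toString();
    }
}
```

With the input "xxhixx" and the strip set "x", stripStart gives "hixx", stripEnd gives "xxhi", and strip gives "hi". A string already longer than the requested size is returned unchanged by both pad methods.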
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1565:
--------------------------------
Fix Version/s: 0.9.0 (was: 0.8.0)
Pushing to the next release since the patch is not quite ready to be committed.
> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
> Key: PIG-1565
> URL: https://issues.apache.org/jira/browse/PIG-1565
> Project: Pig
> Issue Type: Improvement
> Reporter: Andrew Hitchcock
> Assignee: Andrew Hitchcock
> Fix For: 0.9.0
>
> Attachments: PIG-1565-1.patch, PIG-1565-2.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.
[jira] Commented: (PIG-1565) additional piggybank datetime and
string UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903174#action_12903174 ]
Alan Gates commented on PIG-1565:
---------------------------------
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 5 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec]
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1565:
----------------------------
Status: Open (was: Patch Available)
> additional piggybank datetime and string UDFs
> ---------------------------------------------
>
> Key: PIG-1565
> URL: https://issues.apache.org/jira/browse/PIG-1565
> Project: Pig
> Issue Type: Improvement
> Reporter: Andrew Hitchcock
> Assignee: Andrew Hitchcock
> Fix For: 0.8.0
>
> Attachments: PIG-1565-1.patch, PIG-1565-2.patch
>
>
> Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts.
[jira] Commented: (PIG-1565) additional piggybank datetime and
string UDFs
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903140#action_12903140 ]
Dmitriy V. Ryaboy commented on PIG-1565:
----------------------------------------
Please note that there's an outstanding patch for INDEX_OF and LAST_INDEX_OF in PIG-1563.
[jira] Commented: (PIG-1565) additional piggybank datetime and
string UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917081#action_12917081 ]
Alan Gates commented on PIG-1565:
---------------------------------
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 8 new or modified tests.
[exec]
[exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec]
The javadoc warning is:
[javadoc] /home/gates/src/pig/PIG-1565/trunk/src/org/apache/pig/builtin/INDEXOF.java:78: warning - Tag @link: can't find INDEX_OF(int, int) in java.lang.String
Building Piggybank now fails as well, since some of the ErrorCatchingBase class was moved into main Pig.
Also, the patch fails a couple of unit tests in TestStringUDFs. It fails testIndexOf() and testLastIndexOf() because it doesn't properly handle the null case.
I'll attach the output from running the tests.
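To make the null case concrete, here is a minimal plain-Java sketch of an index lookup with an optional start position. It is not the code from the patch or from Pig's builtins. The class and method names are hypothetical, and the choice to return null when either string argument is null is an assumption; the thread does not say what result the failing tests expect.

```java
/** Hypothetical null-tolerant index lookups with an optional start position. */
final class NullSafeIndexSketch {
    /** First index of search in s at or after start (start may be null, meaning 0). */
    static Integer indexOf(String s, String search, Integer start) {
        if (s == null || search == null) {
            return null; // assumed convention: null in, null out
        }
        return s.indexOf(search, start == null ? 0 : start);
    }

    /** Last index of search in s at or before start (start may be null, meaning the end). */
    static Integer lastIndexOf(String s, String search, Integer start) {
        if (s == null || search == null) {
            return null; // assumed convention: null in, null out
        }
        return start == null ? s.lastIndexOf(search) : s.lastIndexOf(search, start);
    }
}
```

For "hello" and the search string "l", indexOf without a start position gives 2 and with a start position of 3 gives 3, while lastIndexOf gives 3 without a start position and 2 when the search starts backward from position 2.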
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Hitchcock updated PIG-1565:
----------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Hitchcock updated PIG-1565:
----------------------------------
Attachment: PIG-1565-2.patch
Made changes to LAST_INDEX_OF, INDEXOF, and SPLIT_ON_REGEX as per request. Also fixed the test case bug, which was caused by a missing change (this patch now extends SUBSTRING with more functionality).
[jira] Commented: (PIG-1565) additional piggybank datetime and
string UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913869#action_12913869 ]
Alan Gates commented on PIG-1565:
---------------------------------
I'll review this patch.
[jira] Assigned: (PIG-1565) additional piggybank datetime and
string UDFs
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates reassigned PIG-1565:
-------------------------------
Assignee: Andrew Hitchcock
[jira] Updated: (PIG-1565) additional piggybank datetime and string
UDFs
Posted by "Andrew Hitchcock (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Hitchcock updated PIG-1565:
----------------------------------
Status: Patch Available (was: Open)