Posted to dev@pig.apache.org by "Jonathan Coveney (JIRA)" <ji...@apache.org> on 2012/10/26 20:33:13 UTC
[jira] [Created] (PIG-3010) Allow UDF's to flatten themselves
Jonathan Coveney created PIG-3010:
-------------------------------------
Summary: Allow UDF's to flatten themselves
Key: PIG-3010
URL: https://issues.apache.org/jira/browse/PIG-3010
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Fix For: 0.12
I have thought this would be cool for a while, so I sat down and implemented it. I think it will help with some useful debugging tools.
The idea is that if you attach an annotation to a UDF, the Tuple or DataBag it outputs will be flattened. This is quite powerful. A very common pattern is:
{code}
a = foreach data generate FLATTEN(MyUdf(thing)) as (a,b,c);
{code}
This would let you simply write:
{code}
a = foreach data generate MyUdf(thing);
{code}
with exactly the same result!
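Below is a minimal Java sketch of the mechanism described above. It assumes a hypothetical marker annotation `FlattenOutput` (the description does not name the real annotation) and models a tuple as `List<Object>`. It uses no Pig classes. It only shows that checking for the annotation when a generate expression is evaluated yields the same row as wrapping the call in FLATTEN.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class FlattenAnnotationSketch {

    // Hypothetical marker annotation; not the name used in the actual patch.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface FlattenOutput {}

    // Toy "UDF": maps one input value to a tuple, modeled as List<Object>.
    interface Udf extends Function<Object, List<Object>> {}

    static class MyUdf implements Udf {
        public List<Object> apply(Object thing) {
            return List.of(thing, thing + "_b", thing + "_c");
        }
    }

    // Same logic, but marked so that its output is flattened automatically.
    @FlattenOutput
    static class MyFlatUdf extends MyUdf {}

    // Evaluate one generate expression for one input row.
    // explicitFlatten models wrapping the call in FLATTEN(...).
    static List<Object> generate(Udf udf, Object input, boolean explicitFlatten) {
        List<Object> out = udf.apply(input);
        boolean flatten = explicitFlatten
                || udf.getClass().isAnnotationPresent(FlattenOutput.class);
        List<Object> row = new ArrayList<>();
        if (flatten) {
            row.addAll(out); // tuple fields become top-level columns
        } else {
            row.add(out);    // tuple stays nested as a single column
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(generate(new MyUdf(), "x", true));      // [x, x_b, x_c]
        System.out.println(generate(new MyFlatUdf(), "x", false)); // [x, x_b, x_c]
        System.out.println(generate(new MyUdf(), "x", false));     // [[x, x_b, x_c]]
    }
}
```

The first two calls print the same row. That mirrors the claim that `MyUdf(thing)` with the annotation matches `FLATTEN(MyUdf(thing))`.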
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves
Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Coveney updated PIG-3010:
----------------------------------
Attachment: PIG-3010-1.patch
[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves
Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Coveney updated PIG-3010:
----------------------------------
Attachment: PIG-3010-0.patch
Here is a patch that does this. The changes reach further than strictly necessary, because this is a good time to future-proof flatten by switching to an enum-based approach.
A nice side effect is that FLATTEN itself can be implemented as a UDF. That isn't necessarily desirable, since it adds some overhead, but the fact that it _can be done_ is quite powerful. That UDF is src/org/apache/pig/builtin/UdfFlatten.java.
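The rough Java sketch below makes the two ideas in this comment concrete: flatten behavior carried by an enum rather than a flag, and FLATTEN expressed as a UDF. It does not use Pig. `FlattenMode`, `Udf`, `UDF_FLATTEN` and `PACK_NESTED` are hypothetical names, not the classes in PIG-3010-0.patch. Tuples are modeled as `List<Object>`, and the sample rows are the ones that appear in the dump output in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UdfFlattenSketch {

    // Hypothetical per-call flatten setting. An enum can gain new values later,
    // which a boolean flag cannot. Names are illustrative only.
    enum FlattenMode { NONE, FLATTEN }

    // Hypothetical UDF contract: compute a value and declare how it is emitted.
    interface Udf {
        Object exec(List<Object> args);
        FlattenMode flattenMode();
    }

    // Toy analogue of UdfFlatten: packs its arguments into a tuple and asks to be flattened.
    static final Udf UDF_FLATTEN = new Udf() {
        public Object exec(List<Object> args) { return new ArrayList<>(args); }
        public FlattenMode flattenMode() { return FlattenMode.FLATTEN; }
    };

    // Same packing, but the tuple is kept as one nested column.
    static final Udf PACK_NESTED = new Udf() {
        public Object exec(List<Object> args) { return new ArrayList<>(args); }
        public FlattenMode flattenMode() { return FlattenMode.NONE; }
    };

    // Build the generated row for one UDF call, honoring its declared mode.
    static List<Object> emit(Udf udf, List<Object> args) {
        Object out = udf.exec(args);
        List<Object> row = new ArrayList<>();
        if (udf.flattenMode() == FlattenMode.FLATTEN && out instanceof List) {
            row.addAll((List<?>) out); // tuple fields become top-level columns
        } else {
            row.add(out);              // value stays a single column
        }
        return row;
    }

    public static void main(String[] args) {
        // Sample (x, y) rows from the dump output in this comment.
        int[][] data2 = { {1, 10}, {4, 11}, {5, 10} };
        for (int[] r : data2) {
            System.out.println(emit(UDF_FLATTEN, Arrays.<Object>asList(r[0], r[1])));
        }
        System.out.println(emit(PACK_NESTED, Arrays.<Object>asList(1, 10))); // [[1, 10]]
    }
}
```

Because `UDF_FLATTEN` just returns its arguments and requests flattening, each output row equals its input row. This matches the identity-like dump result for `UdfFlatten(x,y)`.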
UdfFlatten lets you do a lot of really neat stuff, such as:
{code}
a = load 'data2' as (x:int,y:int);
b = foreach a generate UdfFlatten(x,y);
describe b;
{code}
which results in:
{code}
b: {x: int,y: int}
{code}
Whoa! Previously this was impossible: a UDF call on its own could not produce more than one top-level field. What happens if you dump it? The result is:
{code}
(1,10)
(4,11)
(5,10)
{code}
Whoa!
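The `describe` output above is the schema side of the same idea. The toy Java sketch below assumes a simplified, hypothetical field-schema class `Field` (not Pig's own schema classes) and a made-up name `t` for the UDF's tuple output. Flattening splices the tuple's inner fields into the top-level schema instead of keeping one nested column.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaFlattenSketch {

    // Hypothetical field schema: a name, a type, and nested fields for tuples.
    static final class Field {
        final String name;
        final String type;
        final List<Field> inner;

        Field(String name, String type, List<Field> inner) {
            this.name = name;
            this.type = type;
            this.inner = inner;
        }

        @Override
        public String toString() {
            if (inner.isEmpty()) {
                return name + ": " + type;
            }
            List<String> parts = new ArrayList<>();
            for (Field f : inner) {
                parts.add(f.toString());
            }
            return name + ": (" + String.join(",", parts) + ")";
        }
    }

    // Output schema contributed by one generate column: the field itself,
    // or, when flattened, its inner fields spliced in at the top level.
    static List<Field> outputSchema(Field udfOutput, boolean flatten) {
        if (flatten && !udfOutput.inner.isEmpty()) {
            return new ArrayList<>(udfOutput.inner);
        }
        return List.of(udfOutput);
    }

    public static void main(String[] args) {
        Field x = new Field("x", "int", List.of());
        Field y = new Field("y", "int", List.of());
        Field t = new Field("t", "tuple", List.of(x, y)); // "t" is a made-up name
        System.out.println(outputSchema(t, true));  // [x: int, y: int]
        System.out.println(outputSchema(t, false)); // [t: (x: int,y: int)]
    }
}
```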
You can even do the following:
{code}
a = load 'data2' as (x:int,y:int);
b = foreach a generate UdfFlatten(TOTUPLE(x,y));
dump b;
{code}
It works for bags as well: a bag output is flattened into one row per tuple, just as FLATTEN does. The uses are obvious IMHO.
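For bags, FLATTEN produces one output row per tuple in the bag, crossed with the other generated columns. An empty bag produces no rows at all. The small Java sketch below illustrates those semantics only and is not code from the patch. It models a bag as `List<List<Object>>` and uses a hypothetical helper `flattenBag`.

```java
import java.util.ArrayList;
import java.util.List;

public class BagFlattenSketch {

    // Expand one input row whose flattened column holds a bag:
    // emit one output row per inner tuple, each prefixed with the
    // values of the other (non-flattened) generated columns.
    static List<List<Object>> flattenBag(List<Object> otherColumns, List<List<Object>> bag) {
        List<List<Object>> rows = new ArrayList<>();
        for (List<Object> tuple : bag) {
            List<Object> row = new ArrayList<>(otherColumns);
            row.addAll(tuple);
            rows.add(row);
        }
        return rows; // empty bag -> no rows, so the input row disappears
    }

    public static void main(String[] args) {
        List<List<Object>> bag = List.of(List.<Object>of(1, 10), List.<Object>of(4, 11));
        System.out.println(flattenBag(List.<Object>of("k"), bag));                 // [[k, 1, 10], [k, 4, 11]]
        System.out.println(flattenBag(List.<Object>of("k"), List.<List<Object>>of())); // []
    }
}
```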
[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves
Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Coveney updated PIG-3010:
----------------------------------
Status: Patch Available (was: Open)