Posted to dev@pig.apache.org by "Jonathan Coveney (JIRA)" <ji...@apache.org> on 2012/10/26 20:33:13 UTC

[jira] [Created] (PIG-3010) Allow UDF's to flatten themselves

Jonathan Coveney created PIG-3010:
-------------------------------------

             Summary: Allow UDF's to flatten themselves
                 Key: PIG-3010
                 URL: https://issues.apache.org/jira/browse/PIG-3010
             Project: Pig
          Issue Type: Improvement
            Reporter: Jonathan Coveney
            Assignee: Jonathan Coveney
             Fix For: 0.12


I've thought this would be cool for a while, so I sat down and implemented it; I think it would enable some useful debugging tools.

The idea is that if you attach an annotation to a UDF, the Tuple or DataBag you output will be flattened. This is quite powerful. A very common pattern is:

a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);

This would let you just do:

a = foreach data generate MyUdf(thing);

With the exact same result!
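
As a rough illustration of the annotation idea, here is a minimal, self-contained Java sketch. The annotation name `FlattenOutput`, the stand-in classes, and the `shouldFlatten` check are all hypothetical and are not taken from the patch; the sketch only shows how a runtime-retained marker annotation on a UDF class could be detected by the code that builds the plan.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class SelfFlattenSketch {
    // Hypothetical marker annotation; the real name used in the patch is not stated here.
    // RUNTIME retention is needed so the annotation is visible through reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface FlattenOutput {}

    // Stand-in for a UDF class that asks for its Tuple or DataBag output to be flattened.
    @FlattenOutput
    static class MyUdf {}

    // Stand-in for an ordinary UDF with no annotation.
    static class PlainUdf {}

    // Assumed check: treat the UDF's output as flattened only if the class carries the marker.
    static boolean shouldFlatten(Class<?> udfClass) {
        return udfClass.isAnnotationPresent(FlattenOutput.class);
    }

    public static void main(String[] args) {
        System.out.println(shouldFlatten(MyUdf.class));    // prints "true"
        System.out.println(shouldFlatten(PlainUdf.class)); // prints "false"
    }
}
```

A marker annotation keeps the decision on the UDF class itself, so a script would not need to repeat the flatten at every call site.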

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3010:
----------------------------------

    Attachment: PIG-3010-1.patch

[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3010:
----------------------------------

    Attachment: PIG-3010-0.patch

Here is a patch that does this. The changes reach further than strictly necessary, because this seemed like a good time to future-proof flatten by switching to an enum approach.

A nice side effect is that you can implement FLATTEN as a UDF. This isn't necessarily desirable, since it adds some overhead, but the fact that it _can be done_ is quite powerful. That UDF is src/org/apache/pig/builtin/UdfFlatten.java

This lets you do a lot of really neat stuff, such as:

{code}
a = load 'data2' as (x:int,y:int);
b = foreach a generate UdfFlatten(x,y);
describe b;
{code}

which results in:
{code}
b: {x: int,y: int}
{code}

Whoa! Previously, this was impossible. What happens if you dump b? The result is:
{code}
(1,10)
(4,11)
(5,10)
{code}

Whoa!

You can even do the following:

{code}
a = load 'data2' as (x:int,y:int);
b = foreach a generate UdfFlatten(TOTUPLE(x,y));
dump b;
{code}

And it works for bags as well. The uses are obvious IMHO.
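
For readers less familiar with what flattening does in each case, here is a minimal Java sketch of the usual FLATTEN semantics, using plain lists in place of Pig's Tuple and DataBag. It does not use any Pig classes and is not the code from the patch; the class and method names are made up for illustration. A flattened tuple has its fields spliced into the enclosing row, and a flattened bag produces one output row per tuple it contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlattenSemanticsSketch {
    // Flattening a tuple: its fields are appended directly to the enclosing row.
    static List<Object> flattenTuple(List<Object> prefix, List<Object> tuple) {
        List<Object> row = new ArrayList<>(prefix);
        row.addAll(tuple);
        return row;
    }

    // Flattening a bag: one output row per tuple in the bag; an empty bag yields no rows.
    static List<List<Object>> flattenBag(List<Object> prefix, List<List<Object>> bag) {
        List<List<Object>> rows = new ArrayList<>();
        for (List<Object> tuple : bag) {
            rows.add(flattenTuple(prefix, tuple));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Object> prefix = Arrays.<Object>asList("k");

        // Tuple case: prints [k, 1, 10]
        System.out.println(flattenTuple(prefix, Arrays.<Object>asList(1, 10)));

        // Bag case: prints [[k, 1, 10], [k, 4, 11]]
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList(1, 10),
            Arrays.<Object>asList(4, 11));
        System.out.println(flattenBag(prefix, bag));
    }
}
```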

[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3010:
----------------------------------

    Status: Patch Available  (was: Open)