Posted to dev@pig.apache.org by "Mathias Herberts (JIRA)" <ji...@apache.org> on 2011/02/23 21:34:38 UTC

[jira] Created: (PIG-1867) Allow UDFs that can generate multiple output tuples from a single input tuple

Allow UDFs that can generate multiple output tuples from a single input tuple
-----------------------------------------------------------------------------

                 Key: PIG-1867
                 URL: https://issues.apache.org/jira/browse/PIG-1867
             Project: Pig
          Issue Type: New Feature
            Reporter: Mathias Herberts


Hive offers this kind of thing through UDTFs (User-Defined Table-Generating Functions). It would be very useful for Pig to offer something similar, allowing more complex processing.

One example of such use could be an n-gram generating function.

I guess EvalFunc could be adapted/morphed so that exec returns an Iterator<T> instead of T.

In a first approach, the iterator scanning could be restricted to cases when the UDF is used alone in a generate clause.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Resolved: (PIG-1867) Allow UDFs that can generate multiple output tuples from a single input tuple

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathias Herberts resolved PIG-1867.
-----------------------------------

    Resolution: Not A Problem


[jira] Commented: (PIG-1867) Allow UDFs that can generate multiple output tuples from a single input tuple

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998580#comment-12998580 ] 

Mathias Herberts commented on PIG-1867:
---------------------------------------

You're right, I had never thought of returning a bag!

I guess that makes this issue moot.

Thanks for the heads-up.


[jira] Commented: (PIG-1867) Allow UDFs that can generate multiple output tuples from a single input tuple

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998570#comment-12998570 ] 

Alan Gates commented on PIG-1867:
---------------------------------

Pig already offers this.  Have your UDF return a bag.  You can then use flatten to iterate through that bag.
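
As a rough illustration of the bag-returning approach for the n-gram case mentioned in the issue, here is a minimal plain-Java sketch of the core logic. The class name NGramSketch and the method ngrams are hypothetical, and a java.util.List stands in for the bag so the example runs without Pig on the classpath. In an actual UDF the class would presumably extend EvalFunc<DataBag> and add one Tuple per n-gram to a DataBag inside exec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (hypothetical names): one input "tuple" (a list of tokens)
// yields many output rows. The outer List stands in for a Pig DataBag and each
// inner List stands in for one output Tuple.
final class NGramSketch {

    // Returns every contiguous run of n tokens, in order of appearance.
    static List<List<String>> ngrams(List<String> tokens, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> bag = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            // Copy the window so each row is independent of the input list.
            bag.add(new ArrayList<>(tokens.subList(i, i + n)));
        }
        return bag;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        // Prints one bigram per line.
        for (List<String> gram : ngrams(tokens, 2)) {
            System.out.println(gram);
        }
    }
}
```

On the Pig Latin side, assuming such a UDF were registered under the hypothetical name NGrams, a statement along the lines of B = FOREACH A GENERATE FLATTEN(NGrams(tokens)); would turn each element of the returned bag into its own output record.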
