Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/11/23 18:26:45 UTC

[jira] Resolved: (PIG-844) PERFORMANCE: streaming data to the UDFs in foreach

     [ https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-844.
--------------------------------


The accumulate interface took care of this.

> PERFORMANCE: streaming data to the UDFs in foreach
> --------------------------------------------------
>
>                 Key: PIG-844
>                 URL: https://issues.apache.org/jira/browse/PIG-844
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> Currently, Pig places the data passed to UDFs into a bag. This can cause the process to use more memory than it actually needs; in many cases it would be better to push the data to the UDFs one tuple at a time.
> Where the combiner is invoked, this might not be that important; however, for non-algebraic UDFs, as well as other cases where the combiner can't be used, this can provide a significant memory improvement.
> Another possible use case is where the data is already grouped going into Pig and we don't need to group it again.
> How this will affect the UDF interface needs to be discussed further.
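
The difference between handing a UDF a fully materialized bag and pushing values to it one at a time can be sketched in plain Java. This is only an illustration under stated assumptions, not Pig's actual API: the names BagUdf, StreamingUdf, BagSum, StreamingSum, runWithBag and runStreaming are hypothetical, and plain long values stand in for tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamingUdfSketch {

    /** Hypothetical bag-style UDF: sees all values of a group at once. */
    interface BagUdf<T> {
        T exec(List<Long> bag);
    }

    /** Hypothetical streaming-style UDF: sees one value at a time. */
    interface StreamingUdf<T> {
        void accumulate(long value);
        T getValue();
        void cleanup();
    }

    static final class BagSum implements BagUdf<Long> {
        public Long exec(List<Long> bag) {
            long sum = 0;
            for (long v : bag) {
                sum += v;
            }
            return sum;
        }
    }

    static final class StreamingSum implements StreamingUdf<Long> {
        private long sum = 0;

        public void accumulate(long value) { sum += value; }
        public Long getValue() { return sum; }
        public void cleanup() { sum = 0; }
    }

    /** Bag approach: the whole group is buffered before the UDF runs. */
    static long runWithBag(Iterator<Long> group) {
        List<Long> bag = new ArrayList<>();
        while (group.hasNext()) {
            bag.add(group.next());
        }
        return new BagSum().exec(bag);
    }

    /** Streaming approach: only the running state is kept in memory. */
    static long runStreaming(Iterator<Long> group) {
        StreamingSum udf = new StreamingSum();
        while (group.hasNext()) {
            udf.accumulate(group.next());
        }
        long result = udf.getValue();
        udf.cleanup(); // reset state before the next group
        return result;
    }

    public static void main(String[] args) {
        List<Long> group = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println(runWithBag(group.iterator()));
        System.out.println(runStreaming(group.iterator()));
    }
}
```

Both paths compute the same result; the streaming path simply never holds more than one value plus the running sum, which is the memory saving the description is after for UDFs that cannot use the combiner.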

-- 
This message is automatically generated by JIRA.