Posted to dev@datafu.apache.org by "Matthew Hayes (JIRA)" <ji...@apache.org> on 2016/03/08 16:12:40 UTC

[jira] [Commented] (DATAFU-116) Make SetIntersect and SetDifference implement Accumulator

    [ https://issues.apache.org/jira/browse/DATAFU-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185046#comment-15185046 ] 

Matthew Hayes commented on DATAFU-116:
--------------------------------------

I don't think an efficient accumulator implementation is possible for these UDFs. We have no control over how the data from each bag is fed into the accumulate method. You'd be forced to hold values from the bags in memory, which makes memory usage worse.
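
To illustrate the concern, here is a minimal sketch in plain Java. It does not use Pig's Accumulator interface. The class BufferedIntersectSketch and its accumulate(leftChunk, rightChunk) method are hypothetical stand-ins, assumed only for illustration, in which each call delivers an arbitrary chunk of each sorted input. It tracks how many values have to be carried over between calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not DataFu or Pig code: intersects two sorted inputs
// of distinct integers that arrive in chunks of unpredictable size.
class BufferedIntersectSketch {
    // Values received but not yet matched or discarded.
    private final Deque<Integer> pendingLeft = new ArrayDeque<>();
    private final Deque<Integer> pendingRight = new ArrayDeque<>();
    private final List<Integer> result = new ArrayList<>();
    // Largest number of values that had to be kept between calls.
    private int peakBuffered = 0;

    // Assumed stand-in for an accumulate call: either chunk may be empty,
    // because the caller decides how the data is split up.
    void accumulate(List<Integer> leftChunk, List<Integer> rightChunk) {
        pendingLeft.addAll(leftChunk);
        pendingRight.addAll(rightChunk);

        // Standard sorted-merge step; it can only advance while both sides
        // have data available.
        while (!pendingLeft.isEmpty() && !pendingRight.isEmpty()) {
            int cmp = Integer.compare(pendingLeft.peekFirst(), pendingRight.peekFirst());
            if (cmp == 0) {
                result.add(pendingLeft.pollFirst());
                pendingRight.pollFirst();
            } else if (cmp < 0) {
                pendingLeft.pollFirst();
            } else {
                pendingRight.pollFirst();
            }
        }

        // Whatever is left over must stay in memory until the next call.
        peakBuffered = Math.max(peakBuffered, pendingLeft.size() + pendingRight.size());
    }

    List<Integer> getValue() {
        return result;
    }

    int getPeakBuffered() {
        return peakBuffered;
    }
}
```

In this sketch, if the caller happens to deliver the whole left input before any of the right input, every left value stays buffered until the right input starts arriving. If the chunks arrive interleaved, almost nothing is carried over. The implementation cannot choose between these cases, which is the lack of control described above.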

> Make SetIntersect and SetDifference implement Accumulator
> ---------------------------------------------------------
>
>                 Key: DATAFU-116
>                 URL: https://issues.apache.org/jira/browse/DATAFU-116
>             Project: DataFu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Eyal Allweil
>
> SetIntersect and SetDifference accept only sorted bags, and the output is always smaller than the inputs. Therefore an accumulator implementation should be possible and it will improve memory usage (somewhat) and allow Pig to optimize loops with these operations better.


