Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/09/04 02:49:44 UTC

[jira] Updated: (PIG-409) PERFORMANCE: Removing Union from map side of query with COGROUP

     [ https://issues.apache.org/jira/browse/PIG-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-409:
-------------------------------

        Fix Version/s: types_branch
    Affects Version/s: types_branch

> PERFORMANCE: Removing Union from map side of query with COGROUP
> ---------------------------------------------------------------
>
>                 Key: PIG-409
>                 URL: https://issues.apache.org/jira/browse/PIG-409
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
>
>
> Currently, the map-side code does not know which input of the cogroup it is processing, so it assumes it may be processing any of them and puts a union at the end of the pipeline. This is fairly inefficient.
> A better approach would be to determine which file is being processed in the configure call. There seems to be a way to do this with Hadoop, but it is not documented and so might not be guaranteed; this needs to be followed up with somebody from the Hadoop project.
> Another approach is to check the first time map is called and pick the execution plan that matches that input.
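
The two approaches in the description can be sketched roughly as follows. This is a minimal illustration in plain Java, not Pig's or Hadoop's actual code: the class `CogroupMapSketch`, its methods, and the idea of passing the current input file path in as a string are all assumptions made for the example. How the path is obtained is left open here; Hadoop releases of that era appear to have exposed it through a `map.input.file` job property for file-based splits, but whether that is the undocumented mechanism the issue has in mind is not stated. The "plan" is reduced to tagging each record with the index of its input.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: select one per-input plan for a map task instead of
 * running a union of all plans. None of these names come from Pig or Hadoop.
 */
public class CogroupMapSketch {
    private final List<String> inputPaths; // one entry per cogroup input
    private int planIndex = -1;            // -1 means "not decided yet"

    public CogroupMapSketch(List<String> inputPaths) {
        this.inputPaths = inputPaths;
    }

    /** Approach 1: decide once, at configure time, from the file being read. */
    public void configure(String currentInputFile) {
        planIndex = indexFor(currentInputFile);
    }

    /** Approach 2: if configure did not decide, decide on the first map call. */
    public String map(String currentInputFile, String record) {
        if (planIndex < 0) {
            planIndex = indexFor(currentInputFile);
        }
        // Stand-in for running the selected plan: tag the record with its input index.
        return planIndex + ":" + record;
    }

    /** Match the file against each input path, on whole path components only. */
    private int indexFor(String file) {
        for (int i = 0; i < inputPaths.size(); i++) {
            String p = inputPaths.get(i);
            if (file.equals(p) || file.startsWith(p + "/")) {
                return i;
            }
        }
        throw new IllegalArgumentException("No cogroup input matches " + file);
    }

    public static void main(String[] args) {
        CogroupMapSketch task =
            new CogroupMapSketch(Arrays.asList("/data/a", "/data/b"));
        task.configure("/data/b/part-00000");
        System.out.println(task.map("/data/b/part-00000", "r1"));
    }
}
```

Either way the decision is made once per map task, which relies on each task reading from a single input; that holds for ordinary file splits but is an assumption of this sketch.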

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.