Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/04/06 02:02:07 UTC

[jira] [Updated] (PIG-4495) Better multi-query planning in case of multiple edges

     [ https://issues.apache.org/jira/browse/PIG-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4495:
------------------------------------
    Summary: Better multi-query planning in case of multiple edges  (was: Better multi-query planning in case of union and multiple edges)

Review board link - https://reviews.apache.org/r/32868/

> Better multi-query planning in case of multiple edges
> -----------------------------------------------------
>
>                 Key: PIG-4495
>                 URL: https://issues.apache.org/jira/browse/PIG-4495
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.15.0
>
>
> Details in https://issues.apache.org/jira/browse/TEZ-1190?focusedCommentId=14393033&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393033
> A common pattern is to split the data, apply some foreach transformations or filters to each branch, union the branches, and then run an operation such as group by or join with other data. This creates multiple edges from the same Split, so we do not merge them. Instead, we
> write the data out to another dummy vertex to avoid multiple edges, which adds overhead and hurts performance. Vertex groups accept multiple edges from the same vertex. So if the multiple edges end up in a vertex group (and not in a vertex, which is the case in a self join), we can avoid the dummy vertex.
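
The rule in the description can be sketched as a toy model. This is only an illustration and not Pig's or Tez's planner code: the function name, the edge representation, and the vertex names are all hypothetical. It assumes a plan is given as a list of (source, target) edges and that the set of targets that are vertex groups is known.

```python
from collections import Counter


def edges_needing_dummy_vertex(edges, vertex_groups):
    """Return (source, target) pairs that would still need a dummy vertex.

    edges: list of (source, target) tuples; a repeated tuple models
           multiple edges between the same pair (hypothetical encoding).
    vertex_groups: set of target names that are vertex groups, which,
           per the description, accept multiple edges from one vertex.
    """
    counts = Counter(edges)
    return sorted(
        (src, dst)
        for (src, dst), n in counts.items()
        if n > 1 and dst not in vertex_groups
    )


# Split -> union: both branches end in a vertex group, so no dummy vertex.
union_plan = [("split", "union_group"), ("split", "union_group")]
union_result = edges_needing_dummy_vertex(union_plan, {"union_group"})

# Self join: both edges end in a plain vertex, so the dummy vertex remains.
self_join_plan = [("split", "join"), ("split", "join")]
self_join_result = edges_needing_dummy_vertex(self_join_plan, set())
```

In this sketch only the self-join plan reports a pair, which matches the distinction the description draws between a vertex group and a plain vertex.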


