Posted to dev@pig.apache.org by "William Watson (JIRA)" <ji...@apache.org> on 2016/12/08 14:30:58 UTC

[jira] [Resolved] (PIG-5071) MapReduce concurrency Could Be Better

     [ https://issues.apache.org/jira/browse/PIG-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Watson resolved PIG-5071.
---------------------------------
    Resolution: Won't Fix

Marking as "Won't Fix" because the solution would be quite difficult, or perhaps impossible, and the Tez execution engine provides a sufficient workaround.

> MapReduce concurrency Could Be Better
> -------------------------------------
>
>                 Key: PIG-5071
>                 URL: https://issues.apache.org/jira/browse/PIG-5071
>             Project: Pig
>          Issue Type: Wish
>            Reporter: William Watson
>
> We have a job that launches, after optimization, about 20 MapReduce jobs. Some of these are quite long-running, and while Pig does an okay job of running jobs concurrently, it could do better, at least in this very specific case.
> The Pig job can be divided into 4 major sections, like so:
> A1 -> A2 -> A3 -> A4 -> A
> B1 -> B2 -> B
> C1 -> C2 -> C3 -> C
> D1 -> D2 -> D3 -> D4 -> D
> and the sections are joined at the end:
> A + B -> AB
> AB + C -> ABC
> ABC + D -> ABCD
> In short, if C2 finishes very quickly, C3 won't be started until A2, B2, and D2 are all also complete. This is a problem if, say, D2 takes an hour and there are unused cluster resources that could be made available to C3 (and, by extension, A3 and B3 if their prerequisites also finish before D2).
> One possible workaround is to scale D2 better, but that's beside the point. I think Pig is capable of knowing that the prerequisites are done for certain jobs, but since it only kicks off jobs in "phases", it does not start each job as soon as its own prerequisites are complete.
> I've taken a look at the code and I'm having a hard time working out where the issue is, or else I would be glad to contribute a patch.
> Is this a desirable feature, and is this directly controlled by Pig? If so, could someone help point me in the right direction so I can contribute a patch?
> Note: We can change this from a "wish" to an "improvement" if this feature is desired...
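
To make the scheduling difference described above concrete, here is a minimal sketch in Python. It is not Pig's code: the job graph is copied from the description, but the durations are invented for illustration, the function names are hypothetical, and the "phased" model is a simplification that assumes each wave of jobs starts only after the whole previous wave has finished. Cluster resource limits are ignored.

```python
from collections import defaultdict

# Job graph from the issue description: job -> list of prerequisite jobs.
DEPS = {
    "A1": [], "A2": ["A1"], "A3": ["A2"], "A4": ["A3"], "A": ["A4"],
    "B1": [], "B2": ["B1"], "B": ["B2"],
    "C1": [], "C2": ["C1"], "C3": ["C2"], "C": ["C3"],
    "D1": [], "D2": ["D1"], "D3": ["D2"], "D4": ["D3"], "D": ["D4"],
    "AB": ["A", "B"], "ABC": ["AB", "C"], "ABCD": ["ABC", "D"],
}

# Hypothetical durations in minutes (an assumption): every job takes 5,
# except D2, which takes an hour as in the example from the description.
DURATION = {job: 5 for job in DEPS}
DURATION["D2"] = 60


def topo_order(deps):
    """Return the jobs ordered so that prerequisites come first."""
    order, seen = [], set()

    def visit(job):
        if job in seen:
            return
        seen.add(job)
        for prereq in deps[job]:
            visit(prereq)
        order.append(job)

    for job in deps:
        visit(job)
    return order


def eager_start_times(deps, duration):
    """Each job starts as soon as its own prerequisites have finished."""
    start = {}
    for job in topo_order(deps):
        start[job] = max((start[p] + duration[p] for p in deps[job]), default=0)
    return start


def phased_start_times(deps, duration):
    """Jobs run in waves; a wave starts only when the previous wave is done."""
    level = {}
    for job in topo_order(deps):
        # Wave index = length of the longest prerequisite chain below the job.
        level[job] = 1 + max((level[p] for p in deps[job]), default=-1)
    waves = defaultdict(list)
    for job, lvl in level.items():
        waves[lvl].append(job)
    start, clock = {}, 0
    for lvl in sorted(waves):
        for job in waves[lvl]:
            start[job] = clock
        # The wave ends when its slowest job ends.
        clock += max(duration[job] for job in waves[lvl])
    return start


def makespan(start, duration):
    """Finish time of the last job."""
    return max(start[job] + duration[job] for job in start)


if __name__ == "__main__":
    eager = eager_start_times(DEPS, DURATION)
    phased = phased_start_times(DEPS, DURATION)
    print("C3 start (eager): ", eager["C3"])
    print("C3 start (phased):", phased["C3"])
    print("makespan (eager): ", makespan(eager, DURATION))
    print("makespan (phased):", makespan(phased, DURATION))
```

With these made-up durations, C3 starts at minute 10 under eager scheduling but at minute 65 under the phased model, because it waits for D2 even though it does not depend on it. The total run time differs less (85 versus 95 minutes), since D2 is on the critical path in both cases; the gain from eager scheduling would be larger when the slow job is not on the critical path.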


