Posted to mapreduce-dev@hadoop.apache.org by Praveen Sripati <pr...@gmail.com> on 2011/09/25 18:42:09 UTC

Calculations of the InputSplits

Hi,

There was a question on StackOverflow about high CPU load on the client after
submitting jobs (up to 200 jobs in a batch, with a 150MB jar file).
Calculation of the InputSplits may be one of the reasons for the high CPU on
the client. Why should the calculation of the InputSplits happen on the
client? The JobTracker is a high-end machine, can't the calculation happen on
the JobTracker?

http://stackoverflow.com/questions/7546064/hadoop-high-cpu-load-on-client-side-after-committing-jobs

Thanks,
Praveen

Re: Calculations of the InputSplits

Posted by Arun C Murthy <ac...@hortonworks.com>.
The reason it isn't done in the JobTracker is to avoid running user code within the framework - InputFormat.getSplits() is user code.

In MRv1 it was highly complicated - in MRv2 it's trivial to do it in the MR ApplicationMaster. I'll get to it some weekend soon - patches welcome! :)

Arun
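
For illustration only, here is a minimal, self-contained sketch of why split calculation counts as user code. It does not use the Hadoop API: SplitCalculator is a hypothetical stand-in for the user-supplied getSplits() contract, and Split, FixedSizeCalculator and the /data/input.log path are likewise assumptions made up for this example. The point is that whoever calls getSplits() ends up executing arbitrary logic written by the job author, which is why the caller has so far been the client rather than the JobTracker.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for an input split: a byte range within one file.
    public static final class Split {
        public final String path;
        public final long start;
        public final long length;

        public Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    // Hypothetical stand-in for the contract a job author implements.
    public interface SplitCalculator {
        List<Split> getSplits(String path, long fileLength);
    }

    // One possible user implementation: cut the file into fixed-size ranges.
    // A job author could just as well list directories or call external
    // services here, which is why the framework treats this as untrusted code.
    public static final class FixedSizeCalculator implements SplitCalculator {
        private final long splitSize;

        public FixedSizeCalculator(long splitSize) {
            if (splitSize <= 0) {
                throw new IllegalArgumentException("splitSize must be positive");
            }
            this.splitSize = splitSize;
        }

        @Override
        public List<Split> getSplits(String path, long fileLength) {
            List<Split> splits = new ArrayList<>();
            for (long start = 0; start < fileLength; start += splitSize) {
                // The last range may be shorter than splitSize.
                long length = Math.min(splitSize, fileLength - start);
                splits.add(new Split(path, start, length));
            }
            return splits;
        }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Whoever runs this line (client, JobTracker, or a task) runs user code.
        SplitCalculator userCode = new FixedSizeCalculator(64 * mb);
        for (Split s : userCode.getSplits("/data/input.log", 150 * mb)) {
            System.out.println(s);
        }
    }
}
```

With a batch of 200 jobs, the client repeats this kind of work once per job, on top of uploading the jar each time.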



Re: Calculations of the InputSplits

Posted by Harsh J <ha...@cloudera.com>.
Hello Praveen,

That is a valid point. Alternatively, the splits could even be computed
by a task (safer this way, instead of running _inside_ the
JobTracker).

Let's continue the discussion on
https://issues.apache.org/jira/browse/MAPREDUCE-207 which seems very
relevant to this.




-- 
Harsh J