You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2008/03/02 06:40:52 UTC

Re: Bugs in 0.16.0? - short job runs

We find we have to force the input splits to be much larger, to ensure 
that more of our compute time is spent running the job, vrs starting up 
the jvm.
Reuse would be nice.

Owen O'Malley wrote:
>
> On Mar 1, 2008, at 12:05 PM, Amar Kamat wrote:
>
>>> 3) Lastly, it would seem beneficial for jobs that have significant 
>>> startup overhead and memory requirements to not be run in separate 
>>> JVMs for each task.  Along these lines, it looks like someone 
>>> submitted a patch for JVM-reuse a while back, but it wasn't 
>>> commited? https://issues.apache.org/jira/browse/HADOOP-249
>
> Most of the ideas in the patch for 249 were committed as other 
> patches, but that bug has been left open precisely because the idea 
> still has merit. The patch was never stable enough to commit and now 
> is hopelessly out of date. There are lots of little issues that would 
> need to be addressed for this to happen.
>
>>> Probably a question for the dev mailing list, but if I wanted to 
>>> modify hadoop to allow threading tasks, rather than running 
>>> independent JVMs, is there any reason someone hasn't done this yet?  
>>> Or am I overlooking something?
>> This is done to keep user code separate from the framework code.
>
> Precisely. We don't want to go through the security manager in the 
> servers, so it is far easier to keep user code out of the servers.
>
>> So if the user code develops a fault the framework and rest of the 
>> jobs function normally. Most of the jobs have a longer run time and 
>> hence the startup time is never a concern.
>
> As long as the tasks belong to the same job (and therefore user), 
> sharing a jvm should be fine. One concern is that currently each task 
> gets its own working directory. Since Java can't change working 
> directory in a running process, it would have to clean up the working 
> directory. That will interact badly with debugging settings that let 
> you keep the task files. However, as we speed things up, it will 
> become more important. Already we are starting to see sort maps that 
> finish in 17 seconds,  which means the 1 second of jvm startup is a 5% 
> overhead...
>
> -- Owen
>
-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested