Posted to common-user@hadoop.apache.org by Holden Robbins <h....@paritycomputing.com> on 2008/03/01 20:23:20 UTC

Bugs in 0.16.0?

Hello,
 
I'm just starting to dig into Hadoop and testing its feasibility for large-scale development work.
I was wondering if anyone else is being affected by these issues using Hadoop 0.16.0?
I searched Jira, and I'm not sure if I saw anything that specifically fit some of these:
 
1) The symlinks for the distributed cache in the task directory are being created as 'null' directory links (stated another way, the name of the symbolic link in the directory is the string literal "null").  Am I doing something wrong to cause this, or do not many people use this functionality?
 
2) I'm running into an issue where the job is giving errors in the form:
08/03/01 09:44:25 INFO mapred.JobClient: Task Id : task_200803010908_0001_r_000002_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4

The jobs appear to never finish the reduce phase once this happens.  The tasks themselves are long-running map tasks (up to 10 minutes per input); as far as I understand from the Jira posts, this is related to MAX_FAILED_UNIQUE_FETCHES being hard-coded to 4?  Is there a known workaround or fix in the pipeline?
 
Possible related jira post: https://issues.apache.org/jira/browse/HADOOP-2220
Improving the way the shuffling mechanism works may also help? https://issues.apache.org/jira/browse/HADOOP-1339
 
I've tried setting:
<property>
  <name>mapred.reduce.copy.backoff</name>
  <value>1440</value>
  <description>The maximum amount of time (in seconds) a reducer spends on  fetching one map output before declaring it as failed.</description>
</property>
 which should be 24 minutes, with no effect.
 
 
3) Lastly, it would seem beneficial for jobs that have significant startup overhead and memory requirements not to be run in separate JVMs for each task.  Along these lines, it looks like someone submitted a patch for JVM reuse a while back, but it wasn't committed? https://issues.apache.org/jira/browse/HADOOP-249
 
Probably a question for the dev mailing list, but if I wanted to modify hadoop to allow threading tasks, rather than running independent JVMs, is there any reason someone hasn't done this yet?  Or am I overlooking something?
 
 
Thanks,
-Holden

RE: Bugs in 0.16.0?

Posted by Holden Robbins <h....@paritycomputing.com>.
Thanks to all for the responses.
 
1) I think I may have assumed it would default to the name of the file itself.  Might be a worthwhile default behavior?
2) Looks like it might be a host based firewall issue.
3) I started working on a patch to support setting # of threads per task, testing it now.
I've tried not to affect the non-threaded path by making synchronized wrappers for the input, output, and reporter classes, which are used only when threading is enabled.  Not sure if it's worth the effort to maintain these instead of just making the other classes thread-safe?
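Roughly, the wrapper idea looks like this (a minimal sketch with a stand-in Collector interface; not the actual Hadoop OutputCollector, and the class names are illustrative):

```java
// Minimal stand-in for Hadoop's OutputCollector<K, V>; illustrative only.
interface Collector<K, V> {
    void collect(K key, V value);
}

// Synchronized wrapper: serializes all calls through one lock, so a
// non-thread-safe collector can be shared by several task threads
// without the underlying class being rewritten.
class SynchronizedCollector<K, V> implements Collector<K, V> {
    private final Collector<K, V> inner;

    SynchronizedCollector(Collector<K, V> inner) {
        this.inner = inner;
    }

    @Override
    public synchronized void collect(K key, V value) {
        inner.collect(key, value);
    }
}
```

The same pattern would apply to the reporter and input wrappers; the single-threaded path keeps using the unwrapped objects, so its behavior is unchanged.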
 
 
________________________________

From: Amareshwari Sri Ramadasu [mailto:amarsri@yahoo-inc.com]
Sent: Sun 3/2/2008 8:09 PM
To: core-user@hadoop.apache.org
Subject: Re: Bugs in 0.16.0?



Holden Robbins wrote:
> Hello,
> 
> I'm just starting to dig into Hadoop and testing it's feasibility for large scale development work. 
> I was wondering if anyone else being affected by these issues using hadoop 0.16.0?
> I searched Jira, and I'm not sure if I saw anything that specifically fit some of these:
> 
> 1) The symlinks for the distributed cache in the task directory are being created as 'null' directory links (stated another way, the name of the symbolic link in the directory is the string literal "null").  Am I doing something wrong to cause this, or do not many people use this functionality?
>  
If you want to create symlinks for the distributed cache, the URL has to
have a symlink field, like hdfs://host:port/<absolute-path>#<link>, and
mapred.create.symlink must be set to "yes".
If mapred.create.symlink is "yes" and the link field is not provided, the
distributed cache will create a symlink with the literal "null", as you said.
> 
> 2) I'm running into an issue where the job is giving errors in the form:
> 08/03/01 09:44:25 INFO mapred.JobClient: Task Id : task_200803010908_0001_r_000002_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
>
> The jobs appear to never finish the reducing once this happens.   The tasks themselves are long running map tasks (up to 10 minutes per input), as far as I understand from the Jira posts this  is related to the MAX_FAILED_UNIQUE_FETCHES being hard coded to 4?  Is there a known work around or fix in the pipeline?
> 
> Possible related jira post: https://issues.apache.org/jira/browse/HADOOP-2220
> Improving the way the shuffling mechanism works may also help? https://issues.apache.org/jira/browse/HADOOP-1339
> 
> I've tried setting:
> <property>
>   <name>mapred.reduce.copy.backoff</name>
>   <value>1440</value>
>   <description>The maximum amount of time (in seconds) a reducer spends on  fetching one map output before declaring it as failed.</description>
> </property>
>  which should be 24 minutes, with no effect.
> 
> 
> 3) Lastly, it would seem beneficial for jobs that have significant startup overhead and memory requirements to not be run in separate JVMs for each task.  Along these lines, it looks like someone submitted a patch for JVM-reuse a while back, but it wasn't commited? https://issues.apache.org/jira/browse/HADOOP-249
> 
> Probably a question for the dev mailing list, but if I wanted to modify hadoop to allow threading tasks, rather than running independent JVMs, is there any reason someone hasn't done this yet?  Or am I overlooking something?
> 
> 
> Thanks,
> -Holden
>
>  




Re: Bugs in 0.16.0?

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Holden Robbins wrote:
> Hello,
>  
> I'm just starting to dig into Hadoop and testing it's feasibility for large scale development work.  
> I was wondering if anyone else being affected by these issues using hadoop 0.16.0?
> I searched Jira, and I'm not sure if I saw anything that specifically fit some of these:
>  
> 1) The symlinks for the distributed cache in the task directory are being created as 'null' directory links (stated another way, the name of the symbolic link in the directory is the string literal "null").  Am I doing something wrong to cause this, or do not many people use this functionality?
>   
If you want to create symlinks for the distributed cache, the URL has to
have a symlink field, like hdfs://host:port/<absolute-path>#<link>, and
mapred.create.symlink must be set to "yes".
If mapred.create.symlink is "yes" and the link field is not provided, the
distributed cache will create a symlink with the literal "null", as you said.
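For example, a job configuration along these lines should produce a named link in the task directory (the HDFS path and link name here are illustrative, and this assumes the cache file is listed via the mapred.cache.files property, which is what DistributedCache.addCacheFile sets):

```xml
<property>
  <name>mapred.cache.files</name>
  <value>hdfs://namenode:9000/cache/lookup.dat#lookup</value>
</property>
<property>
  <name>mapred.create.symlink</name>
  <value>yes</value>
</property>
```

With the #lookup fragment present, the task's working directory should contain a symlink named "lookup" rather than "null".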
>  
> 2) I'm running into an issue where the job is giving errors in the form:
> 08/03/01 09:44:25 INFO mapred.JobClient: Task Id : task_200803010908_0001_r_000002_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
>
> The jobs appear to never finish the reducing once this happens.   The tasks themselves are long running map tasks (up to 10 minutes per input), as far as I understand from the Jira posts this  is related to the MAX_FAILED_UNIQUE_FETCHES being hard coded to 4?  Is there a known work around or fix in the pipeline?
>  
> Possible related jira post: https://issues.apache.org/jira/browse/HADOOP-2220
> Improving the way the shuffling mechanism works may also help? https://issues.apache.org/jira/browse/HADOOP-1339
>  
> I've tried setting:
> <property>
>   <name>mapred.reduce.copy.backoff</name>
>   <value>1440</value>
>   <description>The maximum amount of time (in seconds) a reducer spends on  fetching one map output before declaring it as failed.</description>
> </property>
>  which should be 24 minutes, with no effect.
>  
>  
> 3) Lastly, it would seem beneficial for jobs that have significant startup overhead and memory requirements to not be run in separate JVMs for each task.  Along these lines, it looks like someone submitted a patch for JVM-reuse a while back, but it wasn't commited? https://issues.apache.org/jira/browse/HADOOP-249
>  
> Probably a question for the dev mailing list, but if I wanted to modify hadoop to allow threading tasks, rather than running independent JVMs, is there any reason someone hasn't done this yet?  Or am I overlooking something?
>  
>  
> Thanks,
> -Holden
>
>   


Re: Bugs in 0.16.0? - short job runs

Posted by Jason Venner <ja...@attributor.com>.
We find we have to force the input splits to be much larger, to ensure
that more of our compute time is spent running the job versus starting up
the JVM.
Reuse would be nice.

Owen O'Malley wrote:
>
> On Mar 1, 2008, at 12:05 PM, Amar Kamat wrote:
>
>>> 3) Lastly, it would seem beneficial for jobs that have significant 
>>> startup overhead and memory requirements to not be run in separate 
>>> JVMs for each task.  Along these lines, it looks like someone 
>>> submitted a patch for JVM-reuse a while back, but it wasn't 
>>> commited? https://issues.apache.org/jira/browse/HADOOP-249
>
> Most of the ideas in the patch for 249 were committed as other 
> patches, but that bug has been left open precisely because the idea 
> still has merit. The patch was never stable enough to commit and now 
> is hopelessly out of date. There are lots of little issues that would 
> need to be addressed for this to happen.
>
>>> Probably a question for the dev mailing list, but if I wanted to 
>>> modify hadoop to allow threading tasks, rather than running 
>>> independent JVMs, is there any reason someone hasn't done this yet?  
>>> Or am I overlooking something?
>> This is done to keep user code separate from the framework code.
>
> Precisely. We don't want to go through the security manager in the 
> servers, so it is far easier to keep user code out of the servers.
>
>> So if the user code develops a fault the framework and rest of the 
>> jobs function normally. Most of the jobs have a longer run time and 
>> hence the startup time is never a concern.
>
> As long as the tasks belong to the same job (and therefore user), 
> sharing a jvm should be fine. One concern is that currently each task 
> gets its own working directory. Since Java can't change working 
> directory in a running process, it would have to clean up the working 
> directory. That will interact badly with debugging settings that let 
> you keep the task files. However, as we speed things up, it will 
> become more important. Already we are starting to see sort maps that 
> finish in 17 seconds,  which means the 1 second of jvm startup is a 5% 
> overhead...
>
> -- Owen
>
-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested

Re: Bugs in 0.16.0?

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Mar 3, 2008, at 9:20 AM, Ted Dunning wrote:

>
> Hard-coded delays in order to make a protocol work are almost never  
> correct
> in the long run.  This isn't a function of real-time or batch, it  
> is simply
> a matter of the fact that hard-coded delays don't scale correctly  
> as problem
> sizes/durations change.  *Adaptive* delays such a progressive back- 
> off can
> work correctly under scale changes, but *fixed* delays are almost  
> never
> correct.
>

+1

We should remove those hard-coded ones, I'll file a jira.

Arun

> Delays may work as a band-aid in the short run, but eventually you  
> have to
> take the band-aid off.
>
>
> On 3/3/08 8:46 AM, "Amar Kamat" <am...@yahoo-inc.com> wrote:
>
>> HADOOP is not meant for real time applications. Its more or less  
>> designed
>> for long running applications like crawlers/indexers.
>> Amar
>> On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:
>>
>>> Hi
>>>
>>> I'd be interested to know if you've tried to use Hadoop for a  
>>> large number
>>> of short jobs.  Perhaps I am missing something, but I've found  
>>> that the
>>> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
>>> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of  
>>> a problem
>>> than the 0.3 sec or so that it takes to fire up a JVM.
>>>
>>> Agreed that for long running jobs that is not a concern, but *if*  
>>> we'd want
>>> to speed things up for shorter running jobs  (say < 1 min) is a  
>>> goal, then
>>> JVM reuse would seem to be a lower priority?  Would doing  
>>> something about
>>> those sleep()s seem worthwhile?
>


Re: Bugs in 0.16.0?

Posted by Spiros Papadimitriou <sp...@gmail.com>.
Thanks both for the reply.

I understand that the primary focus is long-running jobs (at least for now),
but that was not my question.  However, if someone wanted map-reduce on
shorter jobs (I would not go as far as calling them "real time" -- that's a
different story), I've found that Hadoop works pretty well, _except_ for
that bit there.  So, since I have very limited experience with Hadoop
internals, I was asking if that seems like a reasonable starting point to
accommodate short jobs as well -- or if there might be something more
important that I'm missing.

More specifically, we have some jobs that can be expressed as multiple
map-reduce passes over a not-so-big (i.e., O(10GB) but not O(TB++))
dataset.  We still scan TBs (so it's not really "real time" response that we
expect), but that's scanning the same 10GB of data 10-100 times with 10-100
map-reduce jobs -- for example, think k-means with each iteration == one
map-reduce job.  With a moderate number of nodes (say O(50)), each pass
takes under a minute.  However, if we have to pay a penalty of 5+ sec for
each pass, scalability suffers.  In some sense, this is no different than
the JVM penalty per TIP: I was merely pointing out that, in the setting I'm
considering, the sleep penalty is no longer paid just once...

Yes -- you are correct, the primary problem was that sleep:
  if (numInFlight == 0 && numScheduled == 0) {
    // we should indicate progress as we don't want TT to think we're stuck and kill us
    reporter.progress();
    Thread.sleep(5000);
  }
in ReduceTask.java.  BTW, I tried reducing this to 0.5 sec and then found
other sleeps that were affecting scalability.  I suspect
JobClient#waitForCompletion() is next, but I thought I'd better ask first about
removing those band-aids rather than keep changing them (to reuse the
metaphor :-).  If you grep for hard-coded sleeps, there are about 33 of them.

I hope this makes some sense...

Thanks!
Spiros

On Mon, Mar 3, 2008 at 12:43 PM, Amar Kamat <am...@yahoo-inc.com> wrote:

> I guess the 5 sec you are talking about is when the shuffle phase has
> nothing to fetch. Seems like a heuristic to me. I guess its still there
> because no one raised any issue against it. Also that on an average
> there are lot of maps and hence the shuffle phase delay is governed by the
> network delay. Hence doesnt seem to be a big issue to me. Other sleeps are
> in
> the order of millisec (busy waiting). My comment was more on < 1 min jobs.
> But in general I think Adaptive delays will help. Feel free to raise an
> issue or comment on jira.
> Amar
>   On Mon, 3 Mar 2008, Ted Dunning wrote:
>
> >
> > Hard-coded delays in order to make a protocol work are almost never
> correct
> > in the long run.  This isn't a function of real-time or batch, it is
> simply
> > a matter of the fact that hard-coded delays don't scale correctly as
> problem
> > sizes/durations change.  *Adaptive* delays such a progressive back-off
> can
> > work correctly under scale changes, but *fixed* delays are almost never
> > correct.
> >
> > Delays may work as a band-aid in the short run, but eventually you have
> to
> > take the band-aid off.
> >
> >
> > On 3/3/08 8:46 AM, "Amar Kamat" <am...@yahoo-inc.com> wrote:
> >
> >> HADOOP is not meant for real time applications. Its more or less
> designed
> >> for long running applications like crawlers/indexers.
> >> Amar
> >> On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:
> >>
> >>> Hi
> >>>
> >>> I'd be interested to know if you've tried to use Hadoop for a large
> number
> >>> of short jobs.  Perhaps I am missing something, but I've found that
> the
> >>> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
> >>> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a
> problem
> >>> than the 0.3 sec or so that it takes to fire up a JVM.
> >>>
> >>> Agreed that for long running jobs that is not a concern, but *if* we'd
> want
> >>> to speed things up for shorter running jobs  (say < 1 min) is a goal,
> then
> >>> JVM reuse would seem to be a lower priority?  Would doing something
> about
> >>> those sleep()s seem worthwhile?
> >
> >
>

Re: Bugs in 0.16.0?

Posted by Amar Kamat <am...@yahoo-inc.com>.
I guess the 5 sec you are talking about is when the shuffle phase has
nothing to fetch. Seems like a heuristic to me. I guess it's still there
because no one has raised an issue against it. Also, on average there are a
lot of maps, and hence the shuffle-phase delay is governed by the network
delay, so it doesn't seem like a big issue to me. The other sleeps are on
the order of milliseconds (busy waiting). My comment was more about < 1 min
jobs. But in general I think adaptive delays will help. Feel free to raise
an issue or comment on Jira.
Amar
  On Mon, 3 Mar 2008, Ted Dunning wrote:

>
> Hard-coded delays in order to make a protocol work are almost never correct
> in the long run.  This isn't a function of real-time or batch, it is simply
> a matter of the fact that hard-coded delays don't scale correctly as problem
> sizes/durations change.  *Adaptive* delays such a progressive back-off can
> work correctly under scale changes, but *fixed* delays are almost never
> correct.
>
> Delays may work as a band-aid in the short run, but eventually you have to
> take the band-aid off.
>
>
> On 3/3/08 8:46 AM, "Amar Kamat" <am...@yahoo-inc.com> wrote:
>
>> HADOOP is not meant for real time applications. Its more or less designed
>> for long running applications like crawlers/indexers.
>> Amar
>> On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:
>>
>>> Hi
>>>
>>> I'd be interested to know if you've tried to use Hadoop for a large number
>>> of short jobs.  Perhaps I am missing something, but I've found that the
>>> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
>>> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a problem
>>> than the 0.3 sec or so that it takes to fire up a JVM.
>>>
>>> Agreed that for long running jobs that is not a concern, but *if* we'd want
>>> to speed things up for shorter running jobs  (say < 1 min) is a goal, then
>>> JVM reuse would seem to be a lower priority?  Would doing something about
>>> those sleep()s seem worthwhile?
>
>

Re: Bugs in 0.16.0?

Posted by Ted Dunning <td...@veoh.com>.
Hard-coded delays in order to make a protocol work are almost never correct
in the long run.  This isn't a function of real-time or batch; it is simply
that hard-coded delays don't scale correctly as problem sizes/durations
change.  *Adaptive* delays such as progressive back-off can work correctly
under scale changes, but *fixed* delays are almost never correct.

Delays may work as a band-aid in the short run, but eventually you have to
take the band-aid off.
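As a minimal sketch of what an adaptive delay could look like (illustrative only; this class and its names are not Hadoop code):

```java
// Illustrative exponential back-off: the wait doubles on each consecutive
// failure, is capped at a maximum, and resets after a success.
final class Backoff {
    private final long baseMs;
    private final long maxMs;
    private int consecutiveFailures = 0;

    Backoff(long baseMs, long maxMs) {
        this.baseMs = baseMs;
        this.maxMs = maxMs;
    }

    // Delay to use before the next retry: base * 2^failures, capped at maxMs.
    long nextDelayMs() {
        long delay = Math.min(maxMs, baseMs << Math.min(consecutiveFailures, 20));
        consecutiveFailures++;
        return delay;
    }

    // Call after a success so the delay shrinks back to the base value.
    void reset() {
        consecutiveFailures = 0;
    }
}
```

The point is that a caller under light load waits only the base delay, while a persistently failing fetch backs off on its own, without anyone tuning a fixed constant per cluster size.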


On 3/3/08 8:46 AM, "Amar Kamat" <am...@yahoo-inc.com> wrote:

> HADOOP is not meant for real time applications. Its more or less designed
> for long running applications like crawlers/indexers.
> Amar
> On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:
> 
>> Hi
>> 
>> I'd be interested to know if you've tried to use Hadoop for a large number
>> of short jobs.  Perhaps I am missing something, but I've found that the
>> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
>> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a problem
>> than the 0.3 sec or so that it takes to fire up a JVM.
>> 
>> Agreed that for long running jobs that is not a concern, but *if* we'd want
>> to speed things up for shorter running jobs  (say < 1 min) is a goal, then
>> JVM reuse would seem to be a lower priority?  Would doing something about
>> those sleep()s seem worthwhile?


Re: Bugs in 0.16.0?

Posted by Amar Kamat <am...@yahoo-inc.com>.
Hadoop is not meant for real-time applications. It's more or less designed
for long-running applications like crawlers/indexers.
Amar
On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:

> Hi
>
> I'd be interested to know if you've tried to use Hadoop for a large number
> of short jobs.  Perhaps I am missing something, but I've found that the
> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a problem
> than the 0.3 sec or so that it takes to fire up a JVM.
>
> Agreed that for long running jobs that is not a concern, but *if* we'd want
> to speed things up for shorter running jobs  (say < 1 min) is a goal, then
> JVM reuse would seem to be a lower priority?  Would doing something about
> those sleep()s seem worthwhile?
>
> Thanks,
> Spiros
>
> On Sat, Mar 1, 2008 at 4:33 PM, Owen O'Malley <oo...@yahoo-inc.com> wrote:
>
>>
>> On Mar 1, 2008, at 12:05 PM, Amar Kamat wrote:
>>
>>>> 3) Lastly, it would seem beneficial for jobs that have significant
>>>> startup overhead and memory requirements to not be run in separate
>>>> JVMs for each task.  Along these lines, it looks like someone
>>>> submitted a patch for JVM-reuse a while back, but it wasn't
>>>> commited? https://issues.apache.org/jira/browse/HADOOP-249
>>
>> Most of the ideas in the patch for 249 were committed as other
>> patches, but that bug has been left open precisely because the idea
>> still has merit. The patch was never stable enough to commit and now
>> is hopelessly out of date. There are lots of little issues that would
>> need to be addressed for this to happen.
>>
>>>> Probably a question for the dev mailing list, but if I wanted to
>>>> modify hadoop to allow threading tasks, rather than running
>>>> independent JVMs, is there any reason someone hasn't done this
>>>> yet?  Or am I overlooking something?
>>> This is done to keep user code separate from the framework code.
>>
>> Precisely. We don't want to go through the security manager in the
>> servers, so it is far easier to keep user code out of the servers.
>>
>>> So if the user code develops a fault the framework and rest of the
>>> jobs function normally. Most of the jobs have a longer run time and
>>> hence the startup time is never a concern.
>>
>> As long as the tasks belong to the same job (and therefore user),
>> sharing a jvm should be fine. One concern is that currently each task
>> gets its own working directory. Since Java can't change working
>> directory in a running process, it would have to clean up the working
>> directory. That will interact badly with debugging settings that let
>> you keep the task files. However, as we speed things up, it will
>> become more important. Already we are starting to see sort maps that
>> finish in 17 seconds,  which means the 1 second of jvm startup is a
>> 5% overhead...
>>
>> -- Owen
>>
>>
>

Re: Bugs in 0.16.0?

Posted by Spiros Papadimitriou <sp...@gmail.com>.
Hi

I'd be interested to know if you've tried to use Hadoop for a large number
of short jobs.  Perhaps I am missing something, but I've found that the
hardcoded Thread.sleep() calls, esp. those for 5 seconds in
mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a problem
than the 0.3 sec or so that it takes to fire up a JVM.

Agreed that for long running jobs that is not a concern, but *if* we'd want
to speed things up for shorter running jobs  (say < 1 min) is a goal, then
JVM reuse would seem to be a lower priority?  Would doing something about
those sleep()s seem worthwhile?

Thanks,
Spiros

On Sat, Mar 1, 2008 at 4:33 PM, Owen O'Malley <oo...@yahoo-inc.com> wrote:

>
> On Mar 1, 2008, at 12:05 PM, Amar Kamat wrote:
>
> >> 3) Lastly, it would seem beneficial for jobs that have significant
> >> startup overhead and memory requirements to not be run in separate
> >> JVMs for each task.  Along these lines, it looks like someone
> >> submitted a patch for JVM-reuse a while back, but it wasn't
> >> commited? https://issues.apache.org/jira/browse/HADOOP-249
>
> Most of the ideas in the patch for 249 were committed as other
> patches, but that bug has been left open precisely because the idea
> still has merit. The patch was never stable enough to commit and now
> is hopelessly out of date. There are lots of little issues that would
> need to be addressed for this to happen.
>
> >> Probably a question for the dev mailing list, but if I wanted to
> >> modify hadoop to allow threading tasks, rather than running
> >> independent JVMs, is there any reason someone hasn't done this
> >> yet?  Or am I overlooking something?
> > This is done to keep user code separate from the framework code.
>
> Precisely. We don't want to go through the security manager in the
> servers, so it is far easier to keep user code out of the servers.
>
> > So if the user code develops a fault the framework and rest of the
> > jobs function normally. Most of the jobs have a longer run time and
> > hence the startup time is never a concern.
>
> As long as the tasks belong to the same job (and therefore user),
> sharing a jvm should be fine. One concern is that currently each task
> gets its own working directory. Since Java can't change working
> directory in a running process, it would have to clean up the working
> directory. That will interact badly with debugging settings that let
> you keep the task files. However, as we speed things up, it will
> become more important. Already we are starting to see sort maps that
> finish in 17 seconds,  which means the 1 second of jvm startup is a
> 5% overhead...
>
> -- Owen
>
>

Re: Bugs in 0.16.0?

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Mar 1, 2008, at 12:05 PM, Amar Kamat wrote:

>> 3) Lastly, it would seem beneficial for jobs that have significant  
>> startup overhead and memory requirements to not be run in separate  
>> JVMs for each task.  Along these lines, it looks like someone  
>> submitted a patch for JVM-reuse a while back, but it wasn't  
>> commited? https://issues.apache.org/jira/browse/HADOOP-249

Most of the ideas in the patch for 249 were committed as other  
patches, but that bug has been left open precisely because the idea  
still has merit. The patch was never stable enough to commit and now  
is hopelessly out of date. There are lots of little issues that would  
need to be addressed for this to happen.

>> Probably a question for the dev mailing list, but if I wanted to  
>> modify hadoop to allow threading tasks, rather than running  
>> independent JVMs, is there any reason someone hasn't done this  
>> yet?  Or am I overlooking something?
> This is done to keep user code separate from the framework code.

Precisely. We don't want to go through the security manager in the  
servers, so it is far easier to keep user code out of the servers.

> So if the user code develops a fault the framework and rest of the  
> jobs function normally. Most of the jobs have a longer run time and  
> hence the startup time is never a concern.

As long as the tasks belong to the same job (and therefore user),  
sharing a jvm should be fine. One concern is that currently each task  
gets its own working directory. Since Java can't change working  
directory in a running process, it would have to clean up the working  
directory. That will interact badly with debugging settings that let  
you keep the task files. However, as we speed things up, it will  
become more important. Already we are starting to see sort maps that  
finish in 17 seconds,  which means the 1 second of jvm startup is a  
5% overhead...

-- Owen


Re: Bugs in 0.16.0?

Posted by Amar Kamat <am...@yahoo-inc.com>.
On Sat, 1 Mar 2008, Holden Robbins wrote:
Find the comments inline.
Amar
> Hello,
>
> I'm just starting to dig into Hadoop and testing it's feasibility for large scale development work.
> I was wondering if anyone else being affected by these issues using hadoop 0.16.0?
> I searched Jira, and I'm not sure if I saw anything that specifically fit some of these:
>
> 1) The symlinks for the distributed cache in the task directory are being created as 'null' directory links (stated another way, the name of the symbolic link in the directory is the string literal "null").  Am I doing something wrong to cause this, or do not many people use this functionality?
>
> 2) I'm running into an issue where the job is giving errors in the form:
> 08/03/01 09:44:25 INFO mapred.JobClient: Task Id : task_200803010908_0001_r_000002_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
> 08/03/01 09:44:25 WARN mapred.JobClient: Error reading task outputGo-Box4
>
> The jobs appear to never finish the reducing once this happens.   The tasks themselves are long running map tasks (up to 10 minutes per input), as far as I understand from the Jira posts this  is related to the MAX_FAILED_UNIQUE_FETCHES being hard coded to 4?  Is there a known work around or fix in the pipeline?
What is the reducer's progress (percentage done)? Are you seeing this for
all reducers or only some? Check the reducer logs and see if it's the same
host that is faulty. The issue here is that the reducer failed (at least
once) to fetch multiple map outputs.
>
> Possible related jira post: https://issues.apache.org/jira/browse/HADOOP-2220
> Improving the way the shuffling mechanism works may also help? https://issues.apache.org/jira/browse/HADOOP-1339
>
> I've tried setting:
> <property>
>  <name>mapred.reduce.copy.backoff</name>
>  <value>1440</value>
>  <description>The maximum amount of time (in seconds) a reducer spends on  fetching one map output before declaring it as failed.</description>
> </property>
> which should be 24 minutes, with no effect.
>
This parameter is for declaring a map output as a failure at a reducer,
and it is taken as a hint for re-execution by the JobTracker. Changing it
should not change anything except delaying the hints. So Hadoop should
start speculating this reducer; it will try a couple of times before giving up.
>
> 3) Lastly, it would seem beneficial for jobs that have significant startup overhead and memory requirements to not be run in separate JVMs for each task.  Along these lines, it looks like someone submitted a patch for JVM-reuse a while back, but it wasn't commited? https://issues.apache.org/jira/browse/HADOOP-249
>
> Probably a question for the dev mailing list, but if I wanted to modify hadoop to allow threading tasks, rather than running independent JVMs, is there any reason someone hasn't done this yet?  Or am I overlooking something?
This is done to keep user code separate from the framework code. So if the
user code develops a fault, the framework and the rest of the jobs function
normally. Most jobs have a longer run time, and hence the startup time is
never a concern.
>
>
> Thanks,
> -Holden
>