Posted to common-user@hadoop.apache.org by hc busy <hc...@gmail.com> on 2008/12/11 01:37:49 UTC

hadoop mapper 100% but cannot complete?

Guys, I've just configured a Hadoop cluster for the first time, and I'm
running a null map-reduce job over the streaming interface (/bin/cat for
both the mapper and the reducer). The mapper and reducer reach 100% in
the web UI within a reasonable amount of time, but the job does not
complete. On the command line it displays

...INFO streaming.StreamJob: map 100% reduce 100%
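For reference, the null job described above boils down to an identity map and reduce; the streaming invocation below is a sketch (the jar path and input/output paths are assumptions and vary by install), and locally the same computation is just cat | sort | cat, since the framework sorts by key between the map and reduce phases:

```shell
# Hypothetical streaming invocation for an identity (null) map-reduce job;
# jar location and paths are assumptions for a 0.18.x-era install:
#   hadoop jar contrib/streaming/hadoop-streaming.jar \
#       -input /data/in -output /data/out \
#       -mapper /bin/cat -reducer /bin/cat
#
# The local equivalent of that job, including the framework's sort phase:
printf 'b\na\nc\n' > /tmp/null_mr_in.txt
cat /tmp/null_mr_in.txt | sort | cat
```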

In the web UI, the map completion graph shows 100%, but no reduce
completion graph is displayed. The four machines are well equipped to
handle the size of the data (30 GB). Looking at the task tracker on each
machine, I noticed that it is ticking through the percentages very, very slowly:

2008-12-10 16:18:55,265 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_000002_0 46.684883% Records R/W=149326846/149326834
> reduce
2008-12-10 16:18:57,055 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_000006_0 47.566963% Records R/W=151739348/151739342
> reduce
2008-12-10 16:18:58,268 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_000002_0 46.826576% Records R/W=149326846/149326834
> reduce
2008-12-10 16:19:00,058 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_000006_0 47.741756% Records R/W=153377016/153376990
> reduce
2008-12-10 16:19:01,271 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_000002_0 46.9636% Records R/W=149326846/149326834 >
reduce
2008-12-10 16:19:03,061 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_000006_0 47.94259% Records R/W=153377016/153376990
> reduce
2008-12-10 16:19:04,274 INFO org.apache.hadoop.mapred.TaskTracker:
task_200812101532_0001_r_000002_0 47.110992% Records R/W=150960648/150960644
> reduce

so it would continue like this for hours and hours. What buffer am I setting
too small, or what else could possibly make it go so slow? I've worked on
Hadoop clusters before and they had always performed great on similar-sized or
larger data sets, so I suspect it's just a configuration somewhere that is
making it do this?

thanks in advance.

Re: hadoop mapper 100% but cannot complete?

Posted by hc busy <hc...@gmail.com>.
ahhh, apologies for badmouthing hadoop...

So, I finally discovered one problem that may have caused this kind of
performance degradation. After growing the data set even larger, to 35 GB,
Hadoop crashed with a disk-full error. It would appear that the system will
actually continue to work when the disk is almost full, but there appears to
be something that causes it to slow down. Does HDFS juggle blocks around
when there isn't enough space on a slave machine? That would explain why it
was slowing down so much when the filesystem was almost full... Another part
of this is that I've updated my expectation of the speedup: accounting for
the sort that is happening, it is indeed faster.
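One way to keep DataNodes from filling their disks completely is to reserve headroom per volume in hadoop-site.xml; the property below existed in the 0.18.x line, though the value shown here is only an example, not a recommendation:

```xml
<!-- hadoop-site.xml: reserve space per volume for non-HDFS use (example value) -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value> <!-- 10 GB, in bytes -->
</property>
```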

I've upgraded to 0.18.2, and I now see the exception that is slowing down
the reducer near the end of the run (pasted below); any suggestions on this
one?


2008-12-12 12:06:28,403 WARN /:
/mapOutput?job=job_200812121139_0001&map=attempt_200812121139_0001_m_000114_0&reduce=2:
java.lang.IllegalStateException: Committed
  at
org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
  at
org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
  at
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2504)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at
org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at
org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2008-12-12 12:06:31,107 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200812121139_0001_r_000007_0 0.29801327% reduce > copy (135 of 151
at 1.42 MB/s) >



Re: hadoop mapper 100% but cannot complete?

Posted by hc busy <hc...@gmail.com>.
And aside from refusing to declare the task complete after everything is 100%, I
also noticed that the mapper seems too slow. It's taking the same amount of
time for four machines to read and write through the 30 GB file as if I did it
with a /bin/cat on one machine. Do you guys have any suggestions with
regard to these two problems?
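One thing worth checking for the slow-mapper symptom is how many map tasks the 30 GB input actually produced; assuming the default 64 MB HDFS block size of that era (an assumption), the split count should be roughly:

```python
# Back-of-the-envelope map-task count for a 30 GB input,
# assuming one split per 64 MB HDFS block (default of that era):
data_bytes = 30 * 1024**3       # 30 GB input
block_bytes = 64 * 1024**2      # 64 MB per split
print(data_bytes // block_bytes)  # → 480
```

If the job UI shows far fewer map tasks than that, the input may not be splitting across the cluster, which would explain throughput no better than a single machine.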
