Posted to common-user@hadoop.apache.org by YouPeng Yang <yy...@gmail.com> on 2013/03/02 03:36:52 UTC

Re: map stucks at 99.99%

Hi Patai
   I found a similar explanation in Google's MapReduce paper.

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/zh-CN//archive/mapreduce-osdi04.pdf

   Please refer to Section 3.6, Backup Tasks.
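For reference, Section 3.6 of the paper describes backup tasks. When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks, and a task is marked as completed whenever either the primary or the backup execution finishes. The toy simulation below sketches that idea under simplified assumptions: every task duration is known in advance, and backups launch once a hypothetical launch_fraction of tasks has finished. It is not Hadoop's or Google's actual scheduler.

```python
"""Toy simulation of MapReduce backup tasks (speculative execution).

Assumptions (hypothetical, for illustration only): all primaries start at
t=0, their durations are known, and every backup copy takes
`backup_duration` once launched.
"""
from typing import List


def job_finish_time(durations: List[float], backup_duration: float,
                    launch_fraction: float = 0.9,
                    use_backups: bool = True) -> float:
    """Return the time at which the last task completes.

    launch_fraction: once this fraction of tasks has finished, a backup is
        launched for every task that is still in progress.
    """
    ordered = sorted(durations)
    if not use_backups:
        return ordered[-1]
    # Time at which `launch_fraction` of the tasks have completed.
    k = max(1, int(len(ordered) * launch_fraction))
    launch_at = ordered[k - 1]
    finish = []
    for d in durations:
        if d <= launch_at:
            finish.append(d)  # finished before backups were launched
        else:
            # Whichever copy finishes first marks the task as complete.
            finish.append(min(d, launch_at + backup_duration))
    return max(finish)


if __name__ == "__main__":
    # 99 ordinary tasks of 10 minutes plus one 130-minute straggler.
    tasks = [10.0] * 99 + [130.0]
    print(job_finish_time(tasks, backup_duration=10.0, use_backups=False))  # 130.0
    print(job_finish_time(tasks, backup_duration=10.0))                     # 20.0
```

A backup only helps when the original attempt is slow because of its machine. If the task is slow because its input is heavier than usual, the backup is just as slow. With backup_duration=130.0, the same call still returns 130.0.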

Hope this helps.

regards



2013/3/1 Matt Davies <ma...@mattdavies.net>

> I've seen this before when the input data stream changes suddenly and does
> not lend itself to parallelization, such as counting the number of tuples in
> a bag.
>
> One thing that may be interesting is the job counters from a previous job
> vs. this job that just completed. Do they differ? Is there a particular
> mapper whose counts seem way out of whack?
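One quick way to act on this suggestion is to diff the counters of the previous and current runs and to compare each mapper's record count against the median. The sketch below does both on plain dictionaries. The counter names are modeled on Hadoop's built-in task counters (e.g. MAP_INPUT_RECORDS). The task IDs, thresholds (ratio=1.5, factor=10.0), and sample numbers are hypothetical, and you would copy real values from the JobTracker web UI or job history.

```python
"""Compare job counters between two runs and flag outlier mappers.

Counter names are modeled on Hadoop's task counters; all numbers, task IDs
and thresholds here are made up for illustration.
"""
from statistics import median
from typing import Dict, List, Tuple


def counter_changes(previous: Dict[str, int], current: Dict[str, int],
                    ratio: float = 1.5) -> Dict[str, Tuple[int, int]]:
    """Return counters whose value changed by more than `ratio`x either way."""
    changed = {}
    for name in sorted(set(previous) | set(current)):
        old, new = previous.get(name, 0), current.get(name, 0)
        low, high = sorted((old, new))
        # A counter that appears or disappears (low == 0) is always reported.
        if (low == 0 and high > 0) or (low > 0 and high / low > ratio):
            changed[name] = (old, new)
    return changed


def outlier_mappers(records_per_mapper: Dict[str, int],
                    factor: float = 10.0) -> List[str]:
    """Return task IDs whose record count exceeds `factor` times the median."""
    mid = median(records_per_mapper.values())
    return sorted(t for t, n in records_per_mapper.items() if n > factor * mid)


if __name__ == "__main__":
    previous = {"MAP_INPUT_RECORDS": 1_000_000, "MAP_OUTPUT_RECORDS": 2_000_000}
    current = {"MAP_INPUT_RECORDS": 1_050_000, "MAP_OUTPUT_RECORDS": 9_000_000}
    print(counter_changes(previous, current))
    # {'MAP_OUTPUT_RECORDS': (2000000, 9000000)}
    print(outlier_mappers({"m_000": 100, "m_001": 120, "m_002": 110, "m_003": 5000}))
    # ['m_003']
```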
>
> Has someone tweaked the production job in one way or another?
>
> On Thu, Feb 28, 2013 at 1:28 PM, Patai Sangbutsarakum <silvianhadoop@gmail.com> wrote:
>
>> > What type of CPU is on the box? The load average seems pretty high for an
>> > 8-core box.
>> Xeon 3.07GHz, 24 cores
>>
>> > Do you have Ganglia on these boxes? Is the load average always so high?
>> > What's the memory usage for the task and overall on the box?
>> From top -p <pid> of the task:
>> CPU 143.2%  MEM 1.7%
>> So memory is not exhausted, but the CPU is pretty pegged.
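To put these numbers side by side, it helps to normalize them per core. On Linux, the load average counts runnable (and uninterruptible) tasks. In its default mode, top reports per-process %CPU relative to a single core, so the figure can exceed 100% for a multithreaded JVM. The sketch below does only that arithmetic, using the figures reported in this thread (24 cores, 1-minute load 47.47, task at 143.2%). os.getloadavg() and os.cpu_count() are standard Python calls, and getloadavg is available on Unix only.

```python
"""Put load average and top's %CPU in per-core terms.

Assumes Linux top in its default (Irix) mode, where a process's %CPU is
relative to one core and can exceed 100% for multithreaded processes.
"""
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Runnable/uninterruptible tasks per core; well above 1.0 suggests queuing."""
    return load_avg / cores


def cores_in_use(top_cpu_percent: float) -> float:
    """Convert top's per-process %CPU into an approximate number of busy cores."""
    return top_cpu_percent / 100.0


if __name__ == "__main__":
    # Figures from this thread: 24 cores, 1-minute load 47.47, task at 143.2%.
    print(round(load_per_core(47.47, 24), 2))  # 1.98
    print(round(cores_in_use(143.2), 2))       # 1.43
    # Live values on the box itself (Unix only):
    one_min, _, _ = os.getloadavg()
    print(round(load_per_core(one_min, os.cpu_count() or 1), 2))
```

With these figures, the box has roughly two runnable tasks per core, while the stuck task itself accounts for only about 1.4 cores. It may therefore be worth checking what else is running on that node.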
>>
>> >
>> > How long has the map task been running in that stuck state?
>> --> at least 2 hours.
>>
>>
>> It finally finished after hours; it took twice as long as usual today. T_T
>>
>> On Thu, Feb 28, 2013 at 1:18 PM, Viral Bajaria <vi...@gmail.com> wrote:
>> > What type of CPU is on the box? The load average seems pretty high for an
>> > 8-core box. Do you have Ganglia on these boxes? Is the load average always
>> > so high? What's the memory usage for the task and overall on the box?
>> >
>> > How long has the map task been running in that stuck state? If it's been
>> > a few minutes, I am surprised that the JT (JobTracker) didn't try to run it
>> > on another node. Or have you switched off speculative execution?
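Whether a straggler ever gets a speculative copy depends on the scheduler's rule for picking candidates. In Hadoop 1.x, speculative execution for maps is switched on or off with mapred.map.tasks.speculative.execution. The sketch below illustrates one simple progress-gap rule of the kind used by early Hadoop releases. Actual heuristics differ between versions, and GAP, MIN_RUNTIME_S, and the task IDs are hypothetical. Under any rule that looks only at reported progress, a task that already reports 100%, as in this thread, never trails the average and so is never picked.

```python
"""Toy progress-gap rule for picking speculative-execution candidates.

GAP and MIN_RUNTIME_S are hypothetical values for illustration only; real
Hadoop versions use their own (and differing) heuristics.
"""
from dataclasses import dataclass
from typing import List

GAP = 0.2            # hypothetical: how far behind the average a task must be
MIN_RUNTIME_S = 60   # hypothetical: minimum runtime before speculating


@dataclass
class RunningTask:
    task_id: str
    progress: float    # 0.0 .. 1.0, as reported by the task attempt
    runtime_s: float   # seconds since the attempt started


def speculation_candidates(tasks: List[RunningTask],
                           average_progress: float) -> List[str]:
    """Return IDs of tasks that have run long enough and lag by at least GAP."""
    return [t.task_id for t in tasks
            if t.runtime_s >= MIN_RUNTIME_S
            and average_progress - t.progress >= GAP]


if __name__ == "__main__":
    running = [
        RunningTask("m_1", progress=0.3, runtime_s=600),   # slow: candidate
        RunningTask("m_2", progress=1.0, runtime_s=7200),  # "100%" but not done
        RunningTask("m_3", progress=0.1, runtime_s=30),    # too young to judge
    ]
    print(speculation_candidates(running, average_progress=0.95))  # ['m_1']
```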
>> >
>> > Sorry, too many questions!!
>> >
>> > You can try jstack and jmap. That will at least tell you what's getting
>> > blocked.
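A practical way to use jstack here is to take a few thread dumps of the task JVM (jstack <pid>) some seconds apart and look for threads whose state and top stack frame never change, which points at where the task is spinning or blocked. The parser below is a sketch. It assumes the usual HotSpot layout, in which each thread begins with a quoted name line, followed by a "java.lang.Thread.State: ..." line and "at ..." frame lines. The sample class names in the test data are hypothetical.

```python
"""Find threads whose state and top frame are identical across jstack dumps.

Assumes the usual HotSpot jstack layout: a line starting with a quoted thread
name, a "java.lang.Thread.State: ..." line, then "at ..." frame lines.
"""
import re
from typing import Dict, List, Tuple

NAME_RE = re.compile(r'^"([^"]+)"')
STATE_RE = re.compile(r'java\.lang\.Thread\.State: (\w+)')


def parse_dump(text: str) -> Dict[str, Tuple[str, str]]:
    """Map thread name -> (state, top frame) for one thread dump."""
    threads: Dict[str, Tuple[str, str]] = {}
    name = state = None
    for raw in text.splitlines():
        line = raw.strip()
        m = NAME_RE.match(line)
        if m:  # start of a new thread section
            name, state = m.group(1), None
            continue
        m = STATE_RE.search(line)
        if m and name:
            state = m.group(1)
            continue
        # The first "at" line after the header is the top of the stack.
        if line.startswith("at ") and name and name not in threads:
            threads[name] = (state or "UNKNOWN", line[3:])
    return threads


def stuck_threads(dumps: List[str]) -> Dict[str, Tuple[str, str]]:
    """Threads with the same state and top frame in every dump (needs >= 1 dump)."""
    parsed = [parse_dump(d) for d in dumps]
    first = parsed[0]
    return {name: info for name, info in first.items()
            if all(p.get(name) == info for p in parsed[1:])}
```

For example, save the output of jstack <pid> to a file three times, a few seconds apart, and pass the three file contents to stuck_threads. A RUNNABLE thread that stays in the same frame suggests a hot loop. A BLOCKED or WAITING one suggests lock or I/O contention.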
>> >
>> > On Thu, Feb 28, 2013 at 1:04 PM, Patai Sangbutsarakum <si...@gmail.com> wrote:
>> >>
>> >> - Check the box on which the task is running: is it under heavy load?
>> >> Is there a high amount of I/O wait?
>> >> CPU: very warm, load average: 47.47, 48.56, 49.00
>> >> I/O: quiet, 0.1x% iowait, fewer than 20 tps, rarely up to
>> >> 100 tps, on a 10-disk JBOD.
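The iowait figure can be checked without extra tools by sampling the aggregate "cpu" line of /proc/stat twice and comparing the deltas. The sketch below assumes the field order documented in proc(5): user, nice, system, idle, iowait, irq, softirq, steal, and so on. It sums only the first eight fields because the kernel already counts guest time inside user. It is Linux-only.

```python
"""Compute CPU iowait percentage between two /proc/stat "cpu" samples (Linux).

Field order per proc(5): user nice system idle iowait irq softirq steal ...
"""
from typing import List


def cpu_fields(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer jiffies."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    b, a = cpu_fields(before), cpu_fields(after)
    # Sum user..steal only; guest time is already included in user.
    total = sum(a[:8]) - sum(b[:8])
    iowait = a[4] - b[4]
    return 100.0 * iowait / total if total else 0.0


if __name__ == "__main__":
    import time
    with open("/proc/stat") as f:
        first = f.readline()
    time.sleep(1)
    with open("/proc/stat") as f:
        second = f.readline()
    print(f"iowait: {iowait_percent(first, second):.2f}%")
```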
>> >>
>> >>
>> >> - You could check the task logs and see whether they say anything about
>> >> what is going wrong.
>> >> I would say no; pretty much all of them are INFO.
>> >>
>> >> - Did the task get pre-empted to other task trackers? If yes, is it
>> >> stuck at the same spot on those?
>> >> Nope.
>> >>
>> >> - What kind of work are you doing in the mapper? Just reading from
>> >> HDFS and computing something, or reading/writing from HBase?
>> >> HDFS + compute, R/W.
>> >> Absolutely no HBase.
>> >>
>> >> Would jstack or jmap be of any use?
>> >>
>> >>
>> >>
>> >> On Thu, Feb 28, 2013 at 12:25 PM, Viral Bajaria <viral.bajaria@gmail.com> wrote:
>> >> > You could start off doing the following:
>> >> >
>> >> > - Check the box on which the task is running: is it under heavy load?
>> >> >   Is there a high amount of I/O wait?
>> >> > - You could check the task logs and see whether they say anything about
>> >> >   what is going wrong.
>> >> > - Did the task get pre-empted to other task trackers? If yes, is it
>> >> >   stuck at the same spot on those?
>> >> > - What kind of work are you doing in the mapper? Just reading from HDFS
>> >> >   and computing something, or reading/writing from HBase?
>> >> >
>> >> > Thanks,
>> >> > Viral
>> >> >
>> >> > On Thu, Feb 28, 2013 at 12:06 PM, Patai Sangbutsarakum <si...@gmail.com> wrote:
>> >> >>
>> >> >> Hadoopers!!
>> >> >>
>> >> >> Need input from you guys.
>> >> >> I am looking at a critical job in production. It is stuck at 99.99% in
>> >> >> the map phase for much longer than it used to be.
>> >> >>
>> >> >> What can I do to debug what is going on with those maps and why they
>> >> >> don't finish? The tasks and task attempts report 100% progress, but
>> >> >> there is no finish time.
>> >> >>
>> >> >> Please suggest.
>> >> >> Patai
>> >> >
>> >> >
>> >
>> >
>>
>
>