Posted to common-user@hadoop.apache.org by "Zhang Bingjun (Eddy)" <ed...@gmail.com> on 2009/11/02 09:32:46 UTC

too many 100% mapper does not complete / finish / commit

Dear hadoop fellows,

We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
case, we only have mappers, which crawl data and save it into HDFS in a
distributed way. No reducers are specified in the job conf.
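
Roughly, the driver is set up like the sketch below (CrawlJob, CrawlMapper, and
the record handling are simplified placeholders rather than our exact code):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrawlJob {

      // Placeholder mapper: the real one fetches the page described by each
      // input record; here we simply echo the record to the output.
      public static class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          context.write(new Text(key.toString()), value);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "crawl");
        job.setJarByClass(CrawlJob.class);
        job.setMapperClass(CrawlMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(0);   // map-only: mapper output is committed straight to HDFS
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }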

The problem is that, for every job, about one third of the mappers get stuck
at 100% progress but never complete. If we look at the tasktracker log of
those mappers, the last entry is the key-input INFO line, and no other log
lines are output after that.

From the stdout log of a specific attempt of one of those mappers, we can
see that the map function of the mapper has finished completely, so control
of the execution should be somewhere in the MapReduce framework code.

Does anyone have any clue about this problem? Is it because we didn't use
any reducers? Since two thirds of the mappers complete successfully and
commit their output data into HDFS, I suspect the stuck mappers have
something to do with the MapReduce framework code.

Any input will be appreciated. Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)

Re: too many 100% mapper does not complete / finish / commit

Posted by Amandeep Khurana <am...@gmail.com>.
Did you try adding any logging to see which keys they are getting stuck on,
or what the last key each one processed was? Does the same number of mappers
get stuck every time?

Not having reducers is not a problem. It's pretty normal to do that.

On Mon, Nov 2, 2009 at 12:32 AM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Dear hadoop fellows,
>
> We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
> case, we only have mappers to crawl data and save data into HDFS in a
> distributed way. No reducers is specified in the job conf.
>
> The problem is that for every job we have about one third mappers stuck
> with
> 100% progress but never complete. If we look at the the tasktracker log of
> those mappers, the last log was the key input INFO log line and no others
> logs were output after that.
>
> From the stdout log of a specific attempt of one of those mappers, we can
> see that the map function of the mapper has been finished completely and
> the
> control of the execution should be somewhere in the MapReduce framework
> part.
>
> Does anyone have any clue about this problem? Is it because we didn't use
> any reducers? Since two thirds of the mappers could complete successfully
> and commit their output data into HDFS, I suspect the stuck mappers has
> something to do with the MapReduce framework code?
>
> Any input will be appreciated. Thanks a lot!
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>

Re: too many 100% mapper does not complete / finish / commit

Posted by Jason Venner <ja...@gmail.com>.
Nominally, when the map is done, the close is fired: all framework-opened
output files are flushed, the task waits for all of the acks from the
datanodes hosting its blocks, and then the output committer stages the files
into the task output directory.

It sounds like there may be an issue with the close when your output has
exactly 1 full block of data buffered.
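
If you want to see exactly where in that close path a stuck attempt is
blocked, one option (plain JVM tooling, nothing Hadoop-specific; the class
name below is just what the child JVM usually shows up as) is to take a
thread dump of the task's child JVM on the tasktracker node running it:

    jps -l                 # look for the pid of org.apache.hadoop.mapred.Child
    kill -QUIT <pid>       # the thread dump should land in that attempt's stdout log
    # or, if the JDK tools are on the path:
    jstack <pid>

A main thread parked in the DFS client waiting for acks would point at the
flush/close, while one sitting in the output committer would point at the
commit step.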

On Mon, Nov 2, 2009 at 4:20 AM, Amandeep Khurana <am...@gmail.com> wrote:

> inline
>
> On Mon, Nov 2, 2009 at 3:15 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> >wrote:
>
> > Dear Khurana,
> >
> > We didn't use MapRunnable. In stead, we used directly the package
> > org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and passed our
> > normal Mapper Class to it using its getMapperClass() interface. We set
> the
> > number of threads using its setNumberOfThreads(). Is this one correct way
> > of
> > doing multithreaded mapper?
> >
>
> I was just curious on how you did it. This is the right way afaik
>
>
> >
> > We noticed in hadoop-0.20.1 there is another
> > MultithreadedMapper,
> org.apache.hadoop.mapred.lib.map.MultithreadedMapper,
> > but we didn't touch it.
> >
>
> Thats the deprecated package. You used the correct one.
>
>
> >
> > It might be the reason that some thread didn't return. We need to do some
> > work to confirm that. We will also try to enable DEBUG mode of hadoop.
> > Could
> > you share some info on starting an hadoop deamon or the whole hadoop
> > cluster
> > in debug mode?
> >
>
> You'll have to edit the log4jproperties file in $HADOOP_HOME/conf/
> After editing, you'll have to restart the daemons (or the entire cluster).
>
> The DEBUG logs might give some more info of whats happening.
>
>
> >
> > Thanks a lot!
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> > >wrote:
> >
> > > Hi all,
> > >
> > > An important observation. The 100% mapper without completion all have
> > > temporary files of 64MB exactly, which means the output of the mapper
> is
> > cut
> > > off at the block boundary. However, we do have some successfully
> > completed
> > > mappers having output files larger than 64MB and we also have less than
> > 100%
> > > mappers have temporary files larger than 64MB.
> > >
> > > Here is the info returned by "hadoop fs -ls
> > >
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0
> > > -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> > >
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
> > >
> > > This is the temporary file of a 100% mapper without completion.
> > >
> > > Any clues on this?
> > >
> > > Best regards,
> > > Zhang Bingjun (Eddy)
> > >
> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> bingjun@comp.nus.edu.sg
> > > Tel No: +65-96188110 (M)
> > >
> > >
> > > On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com>
> > wrote:
> > >
> > >> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <
> > eddymier@gmail.com
> > >> >wrote:
> > >>
> > >> > Hi Pallavi, Khurana, and Vasekar,
> > >> >
> > >> > Thanks a lot for your reply. To make up, the mapper we are using is
> > the
> > >> > multithreaded mapper.
> > >> >
> > >>
> > >> How are you doing this? Did you your own MapRunnable?
> > >>
> > >>
> > >
> > >> >
> > >> > To answer your questions:
> > >> >
> > >> > Pallavi, Khurana: I have checked the logs. The key it got stuck on
> is
> > >> the
> > >> > last key it reads in. Since the progress is 100% I suppose the key
> is
> > >> the
> > >> > last key? From the stdout log of our mapper, we are confirmed that
> the
> > >> map
> > >> > function of the mapper has completed. After that, no more key was
> read
> > >> in
> > >> > and no other progress is made by the mapper, which means it didn't
> > >> complete
> > >> > / commit being 100%. For each job, we have different number of
> mapper
> > >> got
> > >> > stuck. But it is roughly about one third to half mappers. From the
> > >> stdout
> > >> > logs of our mapper, we are also confirmed that the map function of
> the
> > >> > mapper has finished. That's why we started to suspect the MapReduce
> > >> > framework has something to do with the stuck problem.
> > >> >
> > >> > Here is log from the stdout:
> > >> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> > >> > Disco</artist></track>
> > >> > [0] [293419] start creating objects
> > >> > [1] [293419] start parsing xml
> > >> > [2] [293419] start updating data
> > >> > [sleep] [228312]
> > >> > [error] [228312] java.io.IOException: [error] [228312] reaches the
> > >> maximum
> > >> > number of attempts whiling updating
> > >> > [3] [228312] start collecting output228312
> > >> > [3.1 done with null] [228312] done228312
> > >> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> > >> > [done] [228312] done228312
> > >> > [sleep] [293419]
> > >> > [error] [293419] java.io.IOException: [error] [293419] reaches the
> > >> maximum
> > >> > number of attempts whiling updating
> > >> > [3] [293419] start collecting output293419
> > >> > [3.1 done with null] [293419] done293419
> > >> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> > >> > [done] [293419] done293419
> > >> >
> > >> > Here is the log from tasktracker:
> > >> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic
> > Tree
> > >> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist:
> Cirque
> > du
> > >> > Soleil
> > >> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ieartist:
> > >> > www.China.ie
> > >> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ieartist:
> > >> > www.China.ie
> > >> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> > Simian
> > >> > Mobile Disco
> > >> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> > Simian
> > >> > Mobile Disco
> > >> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> > Simian
> > >> > Mobile Disco
> > >> >
> > >> > From these logs, we can see that the last read in entry is "i
> bealive
> > >> > artist: Simian Mobile Disco" the last process entry in the mapper is
> > the
> > >> > same as this entry and from the stdout log, we can see the map
> > function
> > >> has
> > >> > finished....
> > >> >
> > >>
> > >> Put some stdout or logging code towards the end of the mapper and also
> > >> check
> > >> if all threads are coming back. Do you think it could be some issue
> with
> > >> the
> > >> threads?
> > >>
> > >>
> > >> > Vasekar: The HDFS is healthy. We didn't store too many small files
> in
> > it
> > >> > yet. The return of command "hadoop fsck /" is like follows:
> > >> > Total size:    89114318394 B (Total open files size: 19845943808 B)
> > >> >  Total dirs:    430
> > >> >  Total files:   1761 (Files currently being written: 137)
> > >> >  Total blocks (validated):      2691 (avg. block size 33115688 B)
> > (Total
> > >> > open file blocks (not validated): 309)
> > >> >  Minimally replicated blocks:   2691 (100.0 %)
> > >> >  Over-replicated blocks:        0 (0.0 %)
> > >> >  Under-replicated blocks:       0 (0.0 %)
> > >> >  Mis-replicated blocks:         0 (0.0 %)
> > >> >  Default replication factor:    3
> > >> >  Average block replication:     2.802304
> > >> >  Corrupt blocks:                0
> > >> >  Missing replicas:              0 (0.0 %)
> > >> >  Number of data-nodes:          76
> > >> >  Number of racks:               1
> > >> >
> > >> > Is this problem possibly due to the stuck communication between the
> > >> actual
> > >> > task (the mapper) and the tasktracker? From the logs, we cannot see
> > >> > anything
> > >> > after the stuck.
> > >> >
> > >>
> > >> The TT and JT logs would show if there is a lost communication. Enable
> > >> DEBUG
> > >> logging for the processes and keep a tab.
> > >>
> > >>
> > >> >
> > >> >
> > >> > From: Amandeep Khurana <am...@gmail.com>
> > >> > Date: Mon, Nov 2, 2009 at 4:36 PM
> > >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > >> > To: common-user@hadoop.apache.org
> > >> > Did you try to add any logging and see what keys are they getting
> > stuck
> > >> on
> > >> > or whats the last keys it processed? Do the same number of mappers
> get
> > >> > stuck
> > >> > every time?
> > >> >
> > >> > Not having reducers is not a problem. Its pretty normal to do that.
> > >> >
> > >> > From: Amogh Vasekar <am...@yahoo-inc.com>
> > >> > Date: Mon, Nov 2, 2009 at 4:50 PM
> > >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > >> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
> > >> >
> > >> > Hi,
> > >> > Quick questions...
> > >> > Are you creating too many small files?
> > >> > Are there any task side files being created?
> > >> > Is the heap for NN having enough space to list metadata? Any details
> > on
> > >> its
> > >> > general health will probably be helpful to people on the list.
> > >> >
> > >> > Amogh
> > >> > Best regards,
> > >> > Zhang Bingjun (Eddy)
> > >> >
> > >> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> > bingjun@comp.nus.edu.sg
> > >> > Tel No: +65-96188110 (M)
> > >> >
> > >> >
> > >> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> > >> > pallavi.palleti@corp.aol.com> wrote:
> > >> >
> > >> > > Hi Eddy,
> > >> > >
> > >> > > I faced similar issue when I used pig script for fetching webpages
> > for
> > >> > > certain urls. I could see the map phase showing100% and it is
> still
> > >> > > running. As I was logging the page that it is currently fetching,
> I
> > >> > > could see the process hasn't yet finished. It might be the same
> > issue.
> > >> > > So, you can add logging to check whether it is actually stuck or
> the
> > >> > > process is still going on.
> > >> > >
> > >> > > Thanks
> > >> > > Pallavi
> > >> > >
> > >> > > ________________________________
> > >> > >
> > >> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> > >> > > Sent: Monday, November 02, 2009 2:03 PM
> > >> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> > >> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > >> > > Subject: too many 100% mapper does not complete / finish / commit
> > >> > >
> > >> > >
> > >> > > Dear hadoop fellows,
> > >> > >
> > >> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data.
> > In
> > >> > > this case, we only have mappers to crawl data and save data into
> > HDFS
> > >> in
> > >> > > a distributed way. No reducers is specified in the job conf.
> > >> > >
> > >> > > The problem is that for every job we have about one third mappers
> > >> stuck
> > >> > > with 100% progress but never complete. If we look at the the
> > >> tasktracker
> > >> > > log of those mappers, the last log was the key input INFO log line
> > and
> > >> > > no others logs were output after that.
> > >> > >
> > >> > > From the stdout log of a specific attempt of one of those mappers,
> > we
> > >> > > can see that the map function of the mapper has been finished
> > >> completely
> > >> > > and the control of the execution should be somewhere in the
> > MapReduce
> > >> > > framework part.
> > >> > >
> > >> > > Does anyone have any clue about this problem? Is it because we
> > didn't
> > >> > > use any reducers? Since two thirds of the mappers could complete
> > >> > > successfully and commit their output data into HDFS, I suspect the
> > >> stuck
> > >> > > mappers has something to do with the MapReduce framework code?
> > >> > >
> > >> > > Any input will be appreciated. Thanks a lot!
> > >> > >
> > >> > > Best regards,
> > >> > > Zhang Bingjun (Eddy)
> > >> > >
> > >> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> > >> bingjun@comp.nus.edu.sg
> > >> > > Tel No: +65-96188110 (M)
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: too many 100% mapper does not complete / finish / commit

Posted by Amandeep Khurana <am...@gmail.com>.
inline

On Mon, Nov 2, 2009 at 3:15 AM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Dear Khurana,
>
> We didn't use MapRunnable. In stead, we used directly the package
> org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and passed our
> normal Mapper Class to it using its getMapperClass() interface. We set the
> number of threads using its setNumberOfThreads(). Is this one correct way
> of
> doing multithreaded mapper?
>

I was just curious how you did it. This is the right way, AFAIK.


>
> We noticed in hadoop-0.20.1 there is another
> MultithreadedMapper, org.apache.hadoop.mapred.lib.map.MultithreadedMapper,
> but we didn't touch it.
>

That's the deprecated package. You used the correct one.


>
> It might be the reason that some thread didn't return. We need to do some
> work to confirm that. We will also try to enable DEBUG mode of hadoop.
> Could
> you share some info on starting an hadoop deamon or the whole hadoop
> cluster
> in debug mode?
>

You'll have to edit the log4j.properties file in $HADOOP_HOME/conf/.
After editing, you'll have to restart the daemons (or the entire cluster).

The DEBUG logs might give some more info on what's happening.
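
For example, something along these lines in conf/log4j.properties (a sketch;
the exact logger names may differ in your setup) turns the MapReduce and HDFS
client classes up to DEBUG:

    # conf/log4j.properties
    log4j.logger.org.apache.hadoop.mapred=DEBUG
    log4j.logger.org.apache.hadoop.hdfs=DEBUG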


>
> Thanks a lot!
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> >wrote:
>
> > Hi all,
> >
> > An important observation. The 100% mapper without completion all have
> > temporary files of 64MB exactly, which means the output of the mapper is
> cut
> > off at the block boundary. However, we do have some successfully
> completed
> > mappers having output files larger than 64MB and we also have less than
> 100%
> > mappers have temporary files larger than 64MB.
> >
> > Here is the info returned by "hadoop fs -ls
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0
> > -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
> >
> > This is the temporary file of a 100% mapper without completion.
> >
> > Any clues on this?
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
> >
> >> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <
> eddymier@gmail.com
> >> >wrote:
> >>
> >> > Hi Pallavi, Khurana, and Vasekar,
> >> >
> >> > Thanks a lot for your reply. To make up, the mapper we are using is
> the
> >> > multithreaded mapper.
> >> >
> >>
> >> How are you doing this? Did you your own MapRunnable?
> >>
> >>
> >
> >> >
> >> > To answer your questions:
> >> >
> >> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is
> >> the
> >> > last key it reads in. Since the progress is 100% I suppose the key is
> >> the
> >> > last key? From the stdout log of our mapper, we are confirmed that the
> >> map
> >> > function of the mapper has completed. After that, no more key was read
> >> in
> >> > and no other progress is made by the mapper, which means it didn't
> >> complete
> >> > / commit being 100%. For each job, we have different number of mapper
> >> got
> >> > stuck. But it is roughly about one third to half mappers. From the
> >> stdout
> >> > logs of our mapper, we are also confirmed that the map function of the
> >> > mapper has finished. That's why we started to suspect the MapReduce
> >> > framework has something to do with the stuck problem.
> >> >
> >> > Here is log from the stdout:
> >> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> >> > Disco</artist></track>
> >> > [0] [293419] start creating objects
> >> > [1] [293419] start parsing xml
> >> > [2] [293419] start updating data
> >> > [sleep] [228312]
> >> > [error] [228312] java.io.IOException: [error] [228312] reaches the
> >> maximum
> >> > number of attempts whiling updating
> >> > [3] [228312] start collecting output228312
> >> > [3.1 done with null] [228312] done228312
> >> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> >> > [done] [228312] done228312
> >> > [sleep] [293419]
> >> > [error] [293419] java.io.IOException: [error] [293419] reaches the
> >> maximum
> >> > number of attempts whiling updating
> >> > [3] [293419] start collecting output293419
> >> > [3.1 done with null] [293419] done293419
> >> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> >> > [done] [293419] done293419
> >> >
> >> > Here is the log from tasktracker:
> >> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic
> Tree
> >> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque
> du
> >> > Soleil
> >> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> >> > www.China.ie
> >> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> >> > www.China.ie
> >> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> Simian
> >> > Mobile Disco
> >> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> Simian
> >> > Mobile Disco
> >> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> Simian
> >> > Mobile Disco
> >> >
> >> > From these logs, we can see that the last read in entry is "i bealive
> >> > artist: Simian Mobile Disco" the last process entry in the mapper is
> the
> >> > same as this entry and from the stdout log, we can see the map
> function
> >> has
> >> > finished....
> >> >
> >>
> >> Put some stdout or logging code towards the end of the mapper and also
> >> check
> >> if all threads are coming back. Do you think it could be some issue with
> >> the
> >> threads?
> >>
> >>
> >> > Vasekar: The HDFS is healthy. We didn't store too many small files in
> it
> >> > yet. The return of command "hadoop fsck /" is like follows:
> >> > Total size:    89114318394 B (Total open files size: 19845943808 B)
> >> >  Total dirs:    430
> >> >  Total files:   1761 (Files currently being written: 137)
> >> >  Total blocks (validated):      2691 (avg. block size 33115688 B)
> (Total
> >> > open file blocks (not validated): 309)
> >> >  Minimally replicated blocks:   2691 (100.0 %)
> >> >  Over-replicated blocks:        0 (0.0 %)
> >> >  Under-replicated blocks:       0 (0.0 %)
> >> >  Mis-replicated blocks:         0 (0.0 %)
> >> >  Default replication factor:    3
> >> >  Average block replication:     2.802304
> >> >  Corrupt blocks:                0
> >> >  Missing replicas:              0 (0.0 %)
> >> >  Number of data-nodes:          76
> >> >  Number of racks:               1
> >> >
> >> > Is this problem possibly due to the stuck communication between the
> >> actual
> >> > task (the mapper) and the tasktracker? From the logs, we cannot see
> >> > anything
> >> > after the stuck.
> >> >
> >>
> >> The TT and JT logs would show if there is a lost communication. Enable
> >> DEBUG
> >> logging for the processes and keep a tab.
> >>
> >>
> >> >
> >> >
> >> > From: Amandeep Khurana <am...@gmail.com>
> >> > Date: Mon, Nov 2, 2009 at 4:36 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> > To: common-user@hadoop.apache.org
> >> > Did you try to add any logging and see what keys are they getting
> stuck
> >> on
> >> > or whats the last keys it processed? Do the same number of mappers get
> >> > stuck
> >> > every time?
> >> >
> >> > Not having reducers is not a problem. Its pretty normal to do that.
> >> >
> >> > From: Amogh Vasekar <am...@yahoo-inc.com>
> >> > Date: Mon, Nov 2, 2009 at 4:50 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
> >> >
> >> > Hi,
> >> > Quick questions...
> >> > Are you creating too many small files?
> >> > Are there any task side files being created?
> >> > Is the heap for NN having enough space to list metadata? Any details
> on
> >> its
> >> > general health will probably be helpful to people on the list.
> >> >
> >> > Amogh
> >> > Best regards,
> >> > Zhang Bingjun (Eddy)
> >> >
> >> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> bingjun@comp.nus.edu.sg
> >> > Tel No: +65-96188110 (M)
> >> >
> >> >
> >> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> >> > pallavi.palleti@corp.aol.com> wrote:
> >> >
> >> > > Hi Eddy,
> >> > >
> >> > > I faced similar issue when I used pig script for fetching webpages
> for
> >> > > certain urls. I could see the map phase showing100% and it is still
> >> > > running. As I was logging the page that it is currently fetching, I
> >> > > could see the process hasn't yet finished. It might be the same
> issue.
> >> > > So, you can add logging to check whether it is actually stuck or the
> >> > > process is still going on.
> >> > >
> >> > > Thanks
> >> > > Pallavi
> >> > >
> >> > > ________________________________
> >> > >
> >> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> >> > > Sent: Monday, November 02, 2009 2:03 PM
> >> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> >> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> >> > > Subject: too many 100% mapper does not complete / finish / commit
> >> > >
> >> > >
> >> > > Dear hadoop fellows,
> >> > >
> >> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data.
> In
> >> > > this case, we only have mappers to crawl data and save data into
> HDFS
> >> in
> >> > > a distributed way. No reducers is specified in the job conf.
> >> > >
> >> > > The problem is that for every job we have about one third mappers
> >> stuck
> >> > > with 100% progress but never complete. If we look at the the
> >> tasktracker
> >> > > log of those mappers, the last log was the key input INFO log line
> and
> >> > > no others logs were output after that.
> >> > >
> >> > > From the stdout log of a specific attempt of one of those mappers,
> we
> >> > > can see that the map function of the mapper has been finished
> >> completely
> >> > > and the control of the execution should be somewhere in the
> MapReduce
> >> > > framework part.
> >> > >
> >> > > Does anyone have any clue about this problem? Is it because we
> didn't
> >> > > use any reducers? Since two thirds of the mappers could complete
> >> > > successfully and commit their output data into HDFS, I suspect the
> >> stuck
> >> > > mappers has something to do with the MapReduce framework code?
> >> > >
> >> > > Any input will be appreciated. Thanks a lot!
> >> > >
> >> > > Best regards,
> >> > > Zhang Bingjun (Eddy)
> >> > >
> >> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> >> bingjun@comp.nus.edu.sg
> >> > > Tel No: +65-96188110 (M)
> >> > >
> >> > >
> >> >
> >>
> >
> >
>

Re: too many 100% mapper does not complete / finish / commit

Posted by "Zhang Bingjun (Eddy)" <ed...@gmail.com>.
Dear Khurana,

We didn't use MapRunnable. Instead, we directly used the class
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper, passed our normal
Mapper class to it through its setMapperClass() interface, and set the
number of threads using its setNumberOfThreads(). Is this the correct way of
doing a multithreaded mapper?
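
Concretely, the wiring is along the following lines, with the Job object set
up as in a normal map-only driver (CrawlMapper and the thread count are
placeholders for our real values):

    // MultithreadedMapper is the mapper the framework actually runs; it fans
    // input records out to a pool of threads, each running an instance of our
    // real mapper class.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, CrawlMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 10);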

We noticed in hadoop-0.20.1 there is another
MultithreadedMapper, org.apache.hadoop.mapred.lib.map.MultithreadedMapper,
but we didn't touch it.

It might be that some thread didn't return. We need to do some work to
confirm that. We will also try to enable DEBUG mode in Hadoop. Could you
share some info on starting a Hadoop daemon, or the whole Hadoop cluster, in
debug mode?

Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)


On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Hi all,
>
> An important observation. The 100% mapper without completion all have
> temporary files of 64MB exactly, which means the output of the mapper is cut
> off at the block boundary. However, we do have some successfully completed
> mappers having output files larger than 64MB and we also have less than 100%
> mappers have temporary files larger than 64MB.
>
> Here is the info returned by "hadoop fs -ls
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0
> -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
>
> This is the temporary file of a 100% mapper without completion.
>
> Any clues on this?
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com> wrote:
>
>> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com
>> >wrote:
>>
>> > Hi Pallavi, Khurana, and Vasekar,
>> >
>> > Thanks a lot for your reply. To make up, the mapper we are using is the
>> > multithreaded mapper.
>> >
>>
>> How are you doing this? Did you your own MapRunnable?
>>
>>
>
>> >
>> > To answer your questions:
>> >
>> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is
>> the
>> > last key it reads in. Since the progress is 100% I suppose the key is
>> the
>> > last key? From the stdout log of our mapper, we are confirmed that the
>> map
>> > function of the mapper has completed. After that, no more key was read
>> in
>> > and no other progress is made by the mapper, which means it didn't
>> complete
>> > / commit being 100%. For each job, we have different number of mapper
>> got
>> > stuck. But it is roughly about one third to half mappers. From the
>> stdout
>> > logs of our mapper, we are also confirmed that the map function of the
>> > mapper has finished. That's why we started to suspect the MapReduce
>> > framework has something to do with the stuck problem.
>> >
>> > Here is log from the stdout:
>> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
>> > Disco</artist></track>
>> > [0] [293419] start creating objects
>> > [1] [293419] start parsing xml
>> > [2] [293419] start updating data
>> > [sleep] [228312]
>> > [error] [228312] java.io.IOException: [error] [228312] reaches the
>> maximum
>> > number of attempts whiling updating
>> > [3] [228312] start collecting output228312
>> > [3.1 done with null] [228312] done228312
>> > [fail] [228312] java.io.IOException: 3.1 throw null228312
>> > [done] [228312] done228312
>> > [sleep] [293419]
>> > [error] [293419] java.io.IOException: [error] [293419] reaches the
>> maximum
>> > number of attempts whiling updating
>> > [3] [293419] start collecting output293419
>> > [3.1 done with null] [293419] done293419
>> > [fail] [293419] java.io.IOException: 3.1 throw null293419
>> > [done] [293419] done293419
>> >
>> > Here is the log from tasktracker:
>> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
>> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
>> > Soleil
>> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
>> > www.China.ie
>> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
>> > www.China.ie
>> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> >
>> > From these logs, we can see that the last read in entry is "i bealive
>> > artist: Simian Mobile Disco" the last process entry in the mapper is the
>> > same as this entry and from the stdout log, we can see the map function
>> has
>> > finished....
>> >
>>
>> Put some stdout or logging code towards the end of the mapper and also
>> check
>> if all threads are coming back. Do you think it could be some issue with
>> the
>> threads?
>>
>>
>> > Vasekar: The HDFS is healthy. We didn't store too many small files in it
>> > yet. The return of command "hadoop fsck /" is like follows:
>> > Total size:    89114318394 B (Total open files size: 19845943808 B)
>> >  Total dirs:    430
>> >  Total files:   1761 (Files currently being written: 137)
>> >  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
>> > open file blocks (not validated): 309)
>> >  Minimally replicated blocks:   2691 (100.0 %)
>> >  Over-replicated blocks:        0 (0.0 %)
>> >  Under-replicated blocks:       0 (0.0 %)
>> >  Mis-replicated blocks:         0 (0.0 %)
>> >  Default replication factor:    3
>> >  Average block replication:     2.802304
>> >  Corrupt blocks:                0
>> >  Missing replicas:              0 (0.0 %)
>> >  Number of data-nodes:          76
>> >  Number of racks:               1
>> >
>> > Is this problem possibly due to the stuck communication between the
>> actual
>> > task (the mapper) and the tasktracker? From the logs, we cannot see
>> > anything
>> > after the stuck.
>> >
>>
>> The TT and JT logs would show if there is a lost communication. Enable
>> DEBUG
>> logging for the processes and keep a tab.
>>
>>
>> >
>> >
>> > From: Amandeep Khurana <am...@gmail.com>
>> > Date: Mon, Nov 2, 2009 at 4:36 PM
>> > Subject: Re: too many 100% mapper does not complete / finish / commit
>> > To: common-user@hadoop.apache.org
>> > Did you try to add any logging and see what keys are they getting stuck
>> on
>> > or whats the last keys it processed? Do the same number of mappers get
>> > stuck
>> > every time?
>> >
>> > Not having reducers is not a problem. Its pretty normal to do that.
>> >
>> > From: Amogh Vasekar <am...@yahoo-inc.com>
>> > Date: Mon, Nov 2, 2009 at 4:50 PM
>> > Subject: Re: too many 100% mapper does not complete / finish / commit
>> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
>> >
>> > Hi,
>> > Quick questions...
>> > Are you creating too many small files?
>> > Are there any task side files being created?
>> > Is the heap for NN having enough space to list metadata? Any details on
>> its
>> > general health will probably be helpful to people on the list.
>> >
>> > Amogh
>> > Best regards,
>> > Zhang Bingjun (Eddy)
>> >
>> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
>> > Tel No: +65-96188110 (M)
>> >
>> >
>> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
>> > pallavi.palleti@corp.aol.com> wrote:
>> >
>> > > Hi Eddy,
>> > >
>> > > I faced similar issue when I used pig script for fetching webpages for
>> > > certain urls. I could see the map phase showing100% and it is still
>> > > running. As I was logging the page that it is currently fetching, I
>> > > could see the process hasn't yet finished. It might be the same issue.
>> > > So, you can add logging to check whether it is actually stuck or the
>> > > process is still going on.
>> > >
>> > > Thanks
>> > > Pallavi
>> > >
>> > > ________________________________
>> > >
>> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
>> > > Sent: Monday, November 02, 2009 2:03 PM
>> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
>> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
>> > > Subject: too many 100% mapper does not complete / finish / commit
>> > >
>> > >
>> > > Dear hadoop fellows,
>> > >
>> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
>> > > this case, we only have mappers to crawl data and save data into HDFS
>> in
>> > > a distributed way. No reducers is specified in the job conf.
>> > >
>> > > The problem is that for every job we have about one third mappers
>> stuck
>> > > with 100% progress but never complete. If we look at the the
>> tasktracker
>> > > log of those mappers, the last log was the key input INFO log line and
>> > > no others logs were output after that.
>> > >
>> > > From the stdout log of a specific attempt of one of those mappers, we
>> > > can see that the map function of the mapper has been finished
>> completely
>> > > and the control of the execution should be somewhere in the MapReduce
>> > > framework part.
>> > >
>> > > Does anyone have any clue about this problem? Is it because we didn't
>> > > use any reducers? Since two thirds of the mappers could complete
>> > > successfully and commit their output data into HDFS, I suspect the
>> stuck
>> > > mappers has something to do with the MapReduce framework code?
>> > >
>> > > Any input will be appreciated. Thanks a lot!
>> > >
>> > > Best regards,
>> > > Zhang Bingjun (Eddy)
>> > >
>> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
>> bingjun@comp.nus.edu.sg
>> > > Tel No: +65-96188110 (M)
>> > >
>> > >
>> >
>>
>
>

Re: too many 100% mapper does not complete / finish / commit

Posted by "Zhang Bingjun (Eddy)" <ed...@gmail.com>.
Hi all,

An important observation: the 100% mappers that never complete all have
temporary files of exactly 64MB, which means the output of those mappers is
cut off at the block boundary. However, we do have some successfully
completed mappers with output files larger than 64MB, and we also have
mappers below 100% whose temporary files are larger than 64MB.

Here is the info returned by "hadoop fs -ls
/hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0":
-rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
/hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091

This is the temporary file of one of the 100% mappers that never completes.

Any clues on this?

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)


On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com> wrote:

> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> >wrote:
>
> > Hi Pallavi, Khurana, and Vasekar,
> >
> > Thanks a lot for your reply. To make up, the mapper we are using is the
> > multithreaded mapper.
> >
>
> How are you doing this? Did you your own MapRunnable?
>
>
> >
> > To answer your questions:
> >
> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is the
> > last key it reads in. Since the progress is 100% I suppose the key is the
> > last key? From the stdout log of our mapper, we are confirmed that the
> map
> > function of the mapper has completed. After that, no more key was read in
> > and no other progress is made by the mapper, which means it didn't
> complete
> > / commit being 100%. For each job, we have different number of mapper got
> > stuck. But it is roughly about one third to half mappers. From the stdout
> > logs of our mapper, we are also confirmed that the map function of the
> > mapper has finished. That's why we started to suspect the MapReduce
> > framework has something to do with the stuck problem.
> >
> > Here is log from the stdout:
> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> > Disco</artist></track>
> > [0] [293419] start creating objects
> > [1] [293419] start parsing xml
> > [2] [293419] start updating data
> > [sleep] [228312]
> > [error] [228312] java.io.IOException: [error] [228312] reaches the
> maximum
> > number of attempts whiling updating
> > [3] [228312] start collecting output228312
> > [3.1 done with null] [228312] done228312
> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> > [done] [228312] done228312
> > [sleep] [293419]
> > [error] [293419] java.io.IOException: [error] [293419] reaches the
> maximum
> > number of attempts whiling updating
> > [3] [293419] start collecting output293419
> > [3.1 done with null] [293419] done293419
> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> > [done] [293419] done293419
> >
> > Here is the log from tasktracker:
> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
> > Soleil
> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> > www.China.ie
> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> > www.China.ie
> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> > Mobile Disco
> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> > Mobile Disco
> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> > Mobile Disco
> >
> > From these logs, we can see that the last read in entry is "i bealive
> > artist: Simian Mobile Disco" the last process entry in the mapper is the
> > same as this entry and from the stdout log, we can see the map function
> has
> > finished....
> >
>
> Put some stdout or logging code towards the end of the mapper and also
> check
> if all threads are coming back. Do you think it could be some issue with
> the
> threads?
>
>
> > Vasekar: The HDFS is healthy. We didn't store too many small files in it
> > yet. The return of command "hadoop fsck /" is like follows:
> > Total size:    89114318394 B (Total open files size: 19845943808 B)
> >  Total dirs:    430
> >  Total files:   1761 (Files currently being written: 137)
> >  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
> > open file blocks (not validated): 309)
> >  Minimally replicated blocks:   2691 (100.0 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       0 (0.0 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     2.802304
> >  Corrupt blocks:                0
> >  Missing replicas:              0 (0.0 %)
> >  Number of data-nodes:          76
> >  Number of racks:               1
> >
> > Is this problem possibly due to the stuck communication between the
> actual
> > task (the mapper) and the tasktracker? From the logs, we cannot see
> > anything
> > after the stuck.
> >
>
> The TT and JT logs would show if there is a lost communication. Enable
> DEBUG
> logging for the processes and keep a tab.
>
>
> >
> >
> > From: Amandeep Khurana <am...@gmail.com>
> > Date: Mon, Nov 2, 2009 at 4:36 PM
> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > To: common-user@hadoop.apache.org
> > Did you try to add any logging and see what keys are they getting stuck
> on
> > or whats the last keys it processed? Do the same number of mappers get
> > stuck
> > every time?
> >
> > Not having reducers is not a problem. Its pretty normal to do that.
> >
> > From: Amogh Vasekar <am...@yahoo-inc.com>
> > Date: Mon, Nov 2, 2009 at 4:50 PM
> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
> >
> > Hi,
> > Quick questions...
> > Are you creating too many small files?
> > Are there any task side files being created?
> > Is the heap for NN having enough space to list metadata? Any details on
> its
> > general health will probably be helpful to people on the list.
> >
> > Amogh
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> > pallavi.palleti@corp.aol.com> wrote:
> >
> > > Hi Eddy,
> > >
> > > I faced similar issue when I used pig script for fetching webpages for
> > > certain urls. I could see the map phase showing100% and it is still
> > > running. As I was logging the page that it is currently fetching, I
> > > could see the process hasn't yet finished. It might be the same issue.
> > > So, you can add logging to check whether it is actually stuck or the
> > > process is still going on.
> > >
> > > Thanks
> > > Pallavi
> > >
> > > ________________________________
> > >
> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> > > Sent: Monday, November 02, 2009 2:03 PM
> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > > Subject: too many 100% mapper does not complete / finish / commit
> > >
> > >
> > > Dear hadoop fellows,
> > >
> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
> > > this case, we only have mappers to crawl data and save data into HDFS
> in
> > > a distributed way. No reducers is specified in the job conf.
> > >
> > > The problem is that for every job we have about one third mappers stuck
> > > with 100% progress but never complete. If we look at the the
> tasktracker
> > > log of those mappers, the last log was the key input INFO log line and
> > > no others logs were output after that.
> > >
> > > From the stdout log of a specific attempt of one of those mappers, we
> > > can see that the map function of the mapper has been finished
> completely
> > > and the control of the execution should be somewhere in the MapReduce
> > > framework part.
> > >
> > > Does anyone have any clue about this problem? Is it because we didn't
> > > use any reducers? Since two thirds of the mappers could complete
> > > successfully and commit their output data into HDFS, I suspect the
> stuck
> > > mappers has something to do with the MapReduce framework code?
> > >
> > > Any input will be appreciated. Thanks a lot!
> > >
> > > Best regards,
> > > Zhang Bingjun (Eddy)
> > >
> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> bingjun@comp.nus.edu.sg
> > > Tel No: +65-96188110 (M)
> > >
> > >
> >
>

Re: too many 100% mapper does not complete / finish / commit

Posted by Amandeep Khurana <am...@gmail.com>.
On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Hi Pallavi, Khurana, and Vasekar,
>
> Thanks a lot for your reply. To make up, the mapper we are using is the
> multithreaded mapper.
>

How are you doing this? Did you write your own MapRunnable?


>
> To answer your questions:
>
> Pallavi, Khurana: I have checked the logs. The key it got stuck on is the
> last key it reads in. Since the progress is 100% I suppose the key is the
> last key? From the stdout log of our mapper, we are confirmed that the map
> function of the mapper has completed. After that, no more key was read in
> and no other progress is made by the mapper, which means it didn't complete
> / commit being 100%. For each job, we have different number of mapper got
> stuck. But it is roughly about one third to half mappers. From the stdout
> logs of our mapper, we are also confirmed that the map function of the
> mapper has finished. That's why we started to suspect the MapReduce
> framework has something to do with the stuck problem.
>
> Here is log from the stdout:
> [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> Disco</artist></track>
> [0] [293419] start creating objects
> [1] [293419] start parsing xml
> [2] [293419] start updating data
> [sleep] [228312]
> [error] [228312] java.io.IOException: [error] [228312] reaches the maximum
> number of attempts whiling updating
> [3] [228312] start collecting output228312
> [3.1 done with null] [228312] done228312
> [fail] [228312] java.io.IOException: 3.1 throw null228312
> [done] [228312] done228312
> [sleep] [293419]
> [error] [293419] java.io.IOException: [error] [293419] reaches the maximum
> number of attempts whiling updating
> [3] [293419] start collecting output293419
> [3.1 done with null] [293419] done293419
> [fail] [293419] java.io.IOException: 3.1 throw null293419
> [done] [293419] done293419
>
> Here is the log from tasktracker:
> 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
> 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
> Soleil
> 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> www.China.ie
> 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> www.China.ie
> 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
> 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
> 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
>
> From these logs, we can see that the last read in entry is "i bealive
> artist: Simian Mobile Disco" the last process entry in the mapper is the
> same as this entry and from the stdout log, we can see the map function has
> finished....
>

Put some stdout or logging code towards the end of the mapper and also check
if all threads are coming back. Do you think it could be some issue with the
threads?
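
For example, something along these lines in the wrapped mapper class (a
sketch; the class name is illustrative) would show whether every worker
thread makes it out of the map loop. As far as I know, the new-API
MultithreadedMapper gives each worker thread its own mapper instance, so each
thread should print one setup line and one cleanup line in the attempt's
stdout log:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {

      @Override
      protected void setup(Context context) {
        System.out.println("[setup]   " + Thread.currentThread().getName());
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // ... the existing crawl / context.write(...) logic goes here ...
      }

      @Override
      protected void cleanup(Context context) {
        // A thread that never returns from the map loop will be missing its
        // cleanup line, which identifies where things hang.
        System.out.println("[cleanup] " + Thread.currentThread().getName());
      }
    }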


> Vasekar: The HDFS is healthy. We didn't store too many small files in it
> yet. The return of command "hadoop fsck /" is like follows:
> Total size:    89114318394 B (Total open files size: 19845943808 B)
>  Total dirs:    430
>  Total files:   1761 (Files currently being written: 137)
>  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
> open file blocks (not validated): 309)
>  Minimally replicated blocks:   2691 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     2.802304
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          76
>  Number of racks:               1
>
> Is this problem possibly due to the stuck communication between the actual
> task (the mapper) and the tasktracker? From the logs, we cannot see
> anything
> after the stuck.
>

The TT and JT logs would show if there is lost communication. Enable DEBUG
logging for the processes and keep tabs on them.


>
>
> From: Amandeep Khurana <am...@gmail.com>
> Date: Mon, Nov 2, 2009 at 4:36 PM
> Subject: Re: too many 100% mapper does not complete / finish / commit
> To: common-user@hadoop.apache.org
> Did you try to add any logging and see what keys are they getting stuck on
> or whats the last keys it processed? Do the same number of mappers get
> stuck
> every time?
>
> Not having reducers is not a problem. Its pretty normal to do that.
>
> From: Amogh Vasekar <am...@yahoo-inc.com>
> Date: Mon, Nov 2, 2009 at 4:50 PM
> Subject: Re: too many 100% mapper does not complete / finish / commit
> To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
>
> Hi,
> Quick questions...
> Are you creating too many small files?
> Are there any task side files being created?
> Is the heap for NN having enough space to list metadata? Any details on its
> general health will probably be helpful to people on the list.
>
> Amogh
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> pallavi.palleti@corp.aol.com> wrote:
>
> > Hi Eddy,
> >
> > I faced similar issue when I used pig script for fetching webpages for
> > certain urls. I could see the map phase showing100% and it is still
> > running. As I was logging the page that it is currently fetching, I
> > could see the process hasn't yet finished. It might be the same issue.
> > So, you can add logging to check whether it is actually stuck or the
> > process is still going on.
> >
> > Thanks
> > Pallavi
> >
> > ________________________________
> >
> > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> > Sent: Monday, November 02, 2009 2:03 PM
> > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > Subject: too many 100% mapper does not complete / finish / commit
> >
> >
> > Dear hadoop fellows,
> >
> > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
> > this case, we only have mappers to crawl data and save data into HDFS in
> > a distributed way. No reducers is specified in the job conf.
> >
> > The problem is that for every job we have about one third mappers stuck
> > with 100% progress but never complete. If we look at the the tasktracker
> > log of those mappers, the last log was the key input INFO log line and
> > no others logs were output after that.
> >
> > From the stdout log of a specific attempt of one of those mappers, we
> > can see that the map function of the mapper has been finished completely
> > and the control of the execution should be somewhere in the MapReduce
> > framework part.
> >
> > Does anyone have any clue about this problem? Is it because we didn't
> > use any reducers? Since two thirds of the mappers could complete
> > successfully and commit their output data into HDFS, I suspect the stuck
> > mappers has something to do with the MapReduce framework code?
> >
> > Any input will be appreciated. Thanks a lot!
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
>

Re: too many 100% mapper does not complete / finish / commit

Posted by "Zhang Bingjun (Eddy)" <ed...@gmail.com>.
Hi Pallavi, Khurana, and Vasekar,

Thanks a lot for your replies. To add a detail I left out earlier: the mapper we
are using is the multithreaded mapper.
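
For readers following the thread, a map-only job built around the multithreaded
mapper is typically wired up roughly as below in the 0.20 new API. This is a
sketch only; CrawlJob, CrawlMapper, the thread count, and the paths are
illustrative assumptions, not the actual job in question.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrawlJob {
  // Illustrative stand-in for the real crawling mapper.
  public static class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(value.toString()), new Text("fetched"));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "crawl");
    job.setJarByClass(CrawlJob.class);

    // The framework runs MultithreadedMapper, which fans records out to
    // several copies of the real mapper running in a thread pool.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, CrawlMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 10);

    job.setNumReduceTasks(0);            // map-only: output is committed straight to HDFS
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}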

To answer your questions:

Pallavi, Khurana: I have checked the logs. The key each stuck task is sitting
on is the last key it read in; since the progress shows 100%, I assume it
really is the last key. From the stdout log of our mapper, we have confirmed
that the map function completed. After that, no more keys were read and the
mapper made no further progress, which means it never completed / committed
even though it shows 100%. The number of stuck mappers differs from job to
job, but it is roughly one third to half of them. Because the stdout logs
confirm that the map function itself finished, we started to suspect that the
MapReduce framework has something to do with the stuck problem.

Here is the log from stdout:
[entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
Disco</artist></track>
[0] [293419] start creating objects
[1] [293419] start parsing xml
[2] [293419] start updating data
[sleep] [228312]
[error] [228312] java.io.IOException: [error] [228312] reaches the maximum
number of attempts whiling updating
[3] [228312] start collecting output228312
[3.1 done with null] [228312] done228312
[fail] [228312] java.io.IOException: 3.1 throw null228312
[done] [228312] done228312
[sleep] [293419]
[error] [293419] java.io.IOException: [error] [293419] reaches the maximum
number of attempts whiling updating
[3] [293419] start collecting output293419
[3.1 done with null] [293419] done293419
[fail] [293419] java.io.IOException: 3.1 throw null293419
[done] [293419] done293419

Here is the log from the tasktracker:
2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
Soleil
2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
www.China.ie
2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
www.China.ie
2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco
2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco
2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco

From these logs, we can see that the last entry read in is "i bealive artist:
Simian Mobile Disco". The last entry processed in the mapper is the same one,
and from the stdout log we can see that the map function has finished.
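
One way to narrow down where the task is hanging would be a thread dump of the
stuck child JVM on its tasktracker node. A sketch, assuming the 0.20 child
process (main class org.apache.hadoop.mapred.Child) and a JDK with jps/jstack
on the PATH; <pid> is a placeholder:

jps -l | grep org.apache.hadoop.mapred.Child   # find the pid of the stuck attempt's JVM
jstack <pid>                                   # dump all thread stacks to stdout
# or: kill -QUIT <pid>   (the dump lands in the attempt's stdout log)

The dump would show whether the remaining threads are in user code, in output
close/commit, or waiting on the tasktracker umbilical.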

Vasekar: HDFS is healthy. We haven't stored too many small files in it yet. The
output of the command "hadoop fsck /" is as follows:
Total size:    89114318394 B (Total open files size: 19845943808 B)
 Total dirs:    430
 Total files:   1761 (Files currently being written: 137)
 Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
open file blocks (not validated): 309)
 Minimally replicated blocks:   2691 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.802304
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          76
 Number of racks:               1

Is this problem possibly due to stuck communication between the actual task
(the mapper) and the tasktracker? From the logs, we cannot see anything after
the task gets stuck.


Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)



RE: too many 100% mapper does not complete / finish / commit

Posted by "Palleti, Pallavi" <pa...@corp.aol.com>.
Hi Eddy,
 
I faced a similar issue when I used a Pig script to fetch webpages for
certain URLs. I could see the map phase showing 100% while the job was
still running. Since I was logging the page currently being fetched, I
could see the process hadn't actually finished. It might be the same issue,
so you can add logging to check whether the task is actually stuck or
still doing work.
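
As a rough illustration of that kind of logging in a 0.20 new-API mapper (the
class name and the plain-URL fetch here are assumptions for the sketch, not the
actual crawler code):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative only: logs every URL before and after fetching, so the attempt's
// stdout and task status show whether it is stuck or still working.
public class LoggingFetchMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String url = value.toString().trim();
    context.setStatus("fetching " + url);        // visible on the task page in the web UI
    System.out.println("[fetch-start] " + url);  // goes to the attempt's stdout log

    ByteArrayOutputStream page = new ByteArrayOutputStream();
    InputStream in = new URL(url).openStream();
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        page.write(buf, 0, n);
        context.progress();                      // keep the task alive during slow fetches
      }
    } finally {
      in.close();
    }

    context.write(new Text(url), new Text(page.toString("UTF-8")));
    System.out.println("[fetch-done] " + url);
  }
}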
 
Thanks
Pallavi

________________________________

From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com] 
Sent: Monday, November 02, 2009 2:03 PM
To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
Subject: too many 100% mapper does not complete / finish / commit


Dear hadoop fellows, 

We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
this case, we only have mappers to crawl data and save data into HDFS in
a distributed way. No reducers is specified in the job conf. 

The problem is that for every job we have about one third mappers stuck
with 100% progress but never complete. If we look at the the tasktracker
log of those mappers, the last log was the key input INFO log line and
no others logs were output after that. 

>From the stdout log of a specific attempt of one of those mappers, we
can see that the map function of the mapper has been finished completely
and the control of the execution should be somewhere in the MapReduce
framework part. 

Does anyone have any clue about this problem? Is it because we didn't
use any reducers? Since two thirds of the mappers could complete
successfully and commit their output data into HDFS, I suspect the stuck
mappers has something to do with the MapReduce framework code? 

Any input will be appreciated. Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)



Re: too many 100% mapper does not complete / finish / commit

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
Quick questions...
Are you creating too many small files?
Are there any task-side files being created?
Does the NN heap have enough space to hold the metadata? Any details on its general health will probably be helpful to people on the list.
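
A few commands that would answer these questions (a sketch only; the
/crawl/output path and <namenode-pid> are placeholders, not values known from
this thread):

hadoop fs -count /crawl/output                     # directory, file and byte counts under the output tree
hadoop fsck /crawl/output -files -blocks | tail    # per-file block layout, handy for spotting many small files
jmap -heap <namenode-pid>                          # current NameNode heap usage (or check the NN web UI on port 50070)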

Amogh



On 11/2/09 2:02 PM, "Zhang Bingjun (Eddy)" <ed...@gmail.com> wrote:

Dear hadoop fellows,

We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
case, we only have mappers to crawl data and save data into HDFS in a
distributed way. No reducers is specified in the job conf.

The problem is that for every job we have about one third mappers stuck with
100% progress but never complete. If we look at the the tasktracker log of
those mappers, the last log was the key input INFO log line and no others
logs were output after that.

>From the stdout log of a specific attempt of one of those mappers, we can
see that the map function of the mapper has been finished completely and the
control of the execution should be somewhere in the MapReduce framework
part.

Does anyone have any clue about this problem? Is it because we didn't use
any reducers? Since two thirds of the mappers could complete successfully
and commit their output data into HDFS, I suspect the stuck mappers has
something to do with the MapReduce framework code?

Any input will be appreciated. Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)