Posted to common-user@hadoop.apache.org by "Zhang Bingjun (Eddy)" <ed...@gmail.com> on 2009/11/02 09:32:46 UTC

too many 100% mapper does not complete / finish / commit

Dear hadoop fellows,

We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
case, we only have mappers, which crawl data and save it into HDFS in a
distributed way. No reducers are specified in the job conf.
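
Roughly, the driver is set up like the sketch below (CrawlJob, CrawlMapper, and
the record handling are simplified placeholders rather than our exact code):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CrawlJob {

      // Placeholder mapper: the real one fetches the page described by each
      // input record; here we simply echo the record to the output.
      public static class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          context.write(new Text(key.toString()), value);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "crawl");
        job.setJarByClass(CrawlJob.class);
        job.setMapperClass(CrawlMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(0);   // map-only: mapper output is committed straight to HDFS
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }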

The problem is that, for every job, about one third of the mappers get stuck
at 100% progress but never complete. If we look at the tasktracker log of
those mappers, the last entry is the key-input INFO line, and no other log
lines are output after that.

From the stdout log of a specific attempt of one of those mappers, we can
see that the map function of the mapper has finished completely, so control
of the execution should be somewhere in the MapReduce framework code.

Does anyone have any clue about this problem? Is it because we didn't use
any reducers? Since two thirds of the mappers complete successfully and
commit their output data into HDFS, I suspect the stuck mappers have
something to do with the MapReduce framework code.

Any input will be appreciated. Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)

Re: too many 100% mapper does not complete / finish / commit

Posted by Amandeep Khurana <am...@gmail.com>.
Did you try adding any logging to see which keys they are getting stuck on,
or what the last key each one processed was? Does the same number of mappers
get stuck every time?

Not having reducers is not a problem. It's pretty normal to do that.

On Mon, Nov 2, 2009 at 12:32 AM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Dear hadoop fellows,
>
> We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
> case, we only have mappers to crawl data and save data into HDFS in a
> distributed way. No reducers is specified in the job conf.
>
> The problem is that for every job we have about one third mappers stuck
> with
> 100% progress but never complete. If we look at the the tasktracker log of
> those mappers, the last log was the key input INFO log line and no others
> logs were output after that.
>
> From the stdout log of a specific attempt of one of those mappers, we can
> see that the map function of the mapper has been finished completely and
> the
> control of the execution should be somewhere in the MapReduce framework
> part.
>
> Does anyone have any clue about this problem? Is it because we didn't use
> any reducers? Since two thirds of the mappers could complete successfully
> and commit their output data into HDFS, I suspect the stuck mappers has
> something to do with the MapReduce framework code?
>
> Any input will be appreciated. Thanks a lot!
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>

Re: too many 100% mapper does not complete / finish / commit

Posted by Jason Venner <ja...@gmail.com>.
Nominally, when the map is done, the close is fired: all framework-opened
output files are flushed, the task waits for all of the acks from the
datanodes hosting its blocks, and then the output committer stages the files
into the task output directory.

It sounds like there may be an issue with the close when your output has
exactly 1 full block of data buffered.
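
If you want to see exactly where in that close path a stuck attempt is
blocked, one option (plain JVM tooling, nothing Hadoop-specific; the class
name below is just what the child JVM usually shows up as) is to take a
thread dump of the task's child JVM on the tasktracker node running it:

    jps -l                 # look for the pid of org.apache.hadoop.mapred.Child
    kill -QUIT <pid>       # the thread dump should land in that attempt's stdout log
    # or, if the JDK tools are on the path:
    jstack <pid>

A main thread parked in the DFS client waiting for acks would point at the
flush/close, while one sitting in the output committer would point at the
commit step.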

On Mon, Nov 2, 2009 at 4:20 AM, Amandeep Khurana <am...@gmail.com> wrote:

> inline
>
> On Mon, Nov 2, 2009 at 3:15 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> >wrote:
>
> > Dear Khurana,
> >
> > We didn't use MapRunnable. In stead, we used directly the package
> > org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and passed our
> > normal Mapper Class to it using its getMapperClass() interface. We set
> the
> > number of threads using its setNumberOfThreads(). Is this one correct way
> > of
> > doing multithreaded mapper?
> >
>
> I was just curious on how you did it. This is the right way afaik
>
>
> >
> > We noticed in hadoop-0.20.1 there is another
> > MultithreadedMapper,
> org.apache.hadoop.mapred.lib.map.MultithreadedMapper,
> > but we didn't touch it.
> >
>
> Thats the deprecated package. You used the correct one.
>
>
> >
> > It might be the reason that some thread didn't return. We need to do some
> > work to confirm that. We will also try to enable DEBUG mode of hadoop.
> > Could
> > you share some info on starting an hadoop deamon or the whole hadoop
> > cluster
> > in debug mode?
> >
>
> You'll have to edit the log4jproperties file in $HADOOP_HOME/conf/
> After editing, you'll have to restart the daemons (or the entire cluster).
>
> The DEBUG logs might give some more info of whats happening.
>
>
> >
> > Thanks a lot!
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> > >wrote:
> >
> > > Hi all,
> > >
> > > An important observation. The 100% mapper without completion all have
> > > temporary files of 64MB exactly, which means the output of the mapper
> is
> > cut
> > > off at the block boundary. However, we do have some successfully
> > completed
> > > mappers having output files larger than 64MB and we also have less than
> > 100%
> > > mappers have temporary files larger than 64MB.
> > >
> > > Here is the info returned by "hadoop fs -ls
> > >
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0
> > > -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> > >
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
> > >
> > > This is the temporary file of a 100% mapper without completion.
> > >
> > > Any clues on this?
> > >
> > > Best regards,
> > > Zhang Bingjun (Eddy)
> > >
> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> bingjun@comp.nus.edu.sg
> > > Tel No: +65-96188110 (M)
> > >
> > >
> > > On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com>
> > wrote:
> > >
> > >> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <
> > eddymier@gmail.com
> > >> >wrote:
> > >>
> > >> > Hi Pallavi, Khurana, and Vasekar,
> > >> >
> > >> > Thanks a lot for your reply. To make up, the mapper we are using is
> > the
> > >> > multithreaded mapper.
> > >> >
> > >>
> > >> How are you doing this? Did you your own MapRunnable?
> > >>
> > >>
> > >
> > >> >
> > >> > To answer your questions:
> > >> >
> > >> > Pallavi, Khurana: I have checked the logs. The key it got stuck on
> is
> > >> the
> > >> > last key it reads in. Since the progress is 100% I suppose the key
> is
> > >> the
> > >> > last key? From the stdout log of our mapper, we are confirmed that
> the
> > >> map
> > >> > function of the mapper has completed. After that, no more key was
> read
> > >> in
> > >> > and no other progress is made by the mapper, which means it didn't
> > >> complete
> > >> > / commit being 100%. For each job, we have different number of
> mapper
> > >> got
> > >> > stuck. But it is roughly about one third to half mappers. From the
> > >> stdout
> > >> > logs of our mapper, we are also confirmed that the map function of
> the
> > >> > mapper has finished. That's why we started to suspect the MapReduce
> > >> > framework has something to do with the stuck problem.
> > >> >
> > >> > Here is log from the stdout:
> > >> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> > >> > Disco</artist></track>
> > >> > [0] [293419] start creating objects
> > >> > [1] [293419] start parsing xml
> > >> > [2] [293419] start updating data
> > >> > [sleep] [228312]
> > >> > [error] [228312] java.io.IOException: [error] [228312] reaches the
> > >> maximum
> > >> > number of attempts whiling updating
> > >> > [3] [228312] start collecting output228312
> > >> > [3.1 done with null] [228312] done228312
> > >> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> > >> > [done] [228312] done228312
> > >> > [sleep] [293419]
> > >> > [error] [293419] java.io.IOException: [error] [293419] reaches the
> > >> maximum
> > >> > number of attempts whiling updating
> > >> > [3] [293419] start collecting output293419
> > >> > [3.1 done with null] [293419] done293419
> > >> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> > >> > [done] [293419] done293419
> > >> >
> > >> > Here is the log from tasktracker:
> > >> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic
> > Tree
> > >> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist:
> Cirque
> > du
> > >> > Soleil
> > >> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ieartist:
> > >> > www.China.ie
> > >> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ieartist:
> > >> > www.China.ie
> > >> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> > Simian
> > >> > Mobile Disco
> > >> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> > Simian
> > >> > Mobile Disco
> > >> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> > Simian
> > >> > Mobile Disco
> > >> >
> > >> > From these logs, we can see that the last read in entry is "i
> bealive
> > >> > artist: Simian Mobile Disco" the last process entry in the mapper is
> > the
> > >> > same as this entry and from the stdout log, we can see the map
> > function
> > >> has
> > >> > finished....
> > >> >
> > >>
> > >> Put some stdout or logging code towards the end of the mapper and also
> > >> check
> > >> if all threads are coming back. Do you think it could be some issue
> with
> > >> the
> > >> threads?
> > >>
> > >>
> > >> > Vasekar: The HDFS is healthy. We didn't store too many small files
> in
> > it
> > >> > yet. The return of command "hadoop fsck /" is like follows:
> > >> > Total size:    89114318394 B (Total open files size: 19845943808 B)
> > >> >  Total dirs:    430
> > >> >  Total files:   1761 (Files currently being written: 137)
> > >> >  Total blocks (validated):      2691 (avg. block size 33115688 B)
> > (Total
> > >> > open file blocks (not validated): 309)
> > >> >  Minimally replicated blocks:   2691 (100.0 %)
> > >> >  Over-replicated blocks:        0 (0.0 %)
> > >> >  Under-replicated blocks:       0 (0.0 %)
> > >> >  Mis-replicated blocks:         0 (0.0 %)
> > >> >  Default replication factor:    3
> > >> >  Average block replication:     2.802304
> > >> >  Corrupt blocks:                0
> > >> >  Missing replicas:              0 (0.0 %)
> > >> >  Number of data-nodes:          76
> > >> >  Number of racks:               1
> > >> >
> > >> > Is this problem possibly due to the stuck communication between the
> > >> actual
> > >> > task (the mapper) and the tasktracker? From the logs, we cannot see
> > >> > anything
> > >> > after the stuck.
> > >> >
> > >>
> > >> The TT and JT logs would show if there is a lost communication. Enable
> > >> DEBUG
> > >> logging for the processes and keep a tab.
> > >>
> > >>
> > >> >
> > >> >
> > >> > From: Amandeep Khurana <am...@gmail.com>
> > >> > Date: Mon, Nov 2, 2009 at 4:36 PM
> > >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > >> > To: common-user@hadoop.apache.org
> > >> > Did you try to add any logging and see what keys are they getting
> > stuck
> > >> on
> > >> > or whats the last keys it processed? Do the same number of mappers
> get
> > >> > stuck
> > >> > every time?
> > >> >
> > >> > Not having reducers is not a problem. Its pretty normal to do that.
> > >> >
> > >> > From: Amogh Vasekar <am...@yahoo-inc.com>
> > >> > Date: Mon, Nov 2, 2009 at 4:50 PM
> > >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > >> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
> > >> >
> > >> > Hi,
> > >> > Quick questions...
> > >> > Are you creating too many small files?
> > >> > Are there any task side files being created?
> > >> > Is the heap for NN having enough space to list metadata? Any details
> > on
> > >> its
> > >> > general health will probably be helpful to people on the list.
> > >> >
> > >> > Amogh
> > >> > Best regards,
> > >> > Zhang Bingjun (Eddy)
> > >> >
> > >> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> > bingjun@comp.nus.edu.sg
> > >> > Tel No: +65-96188110 (M)
> > >> >
> > >> >
> > >> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> > >> > pallavi.palleti@corp.aol.com> wrote:
> > >> >
> > >> > > Hi Eddy,
> > >> > >
> > >> > > I faced similar issue when I used pig script for fetching webpages
> > for
> > >> > > certain urls. I could see the map phase showing100% and it is
> still
> > >> > > running. As I was logging the page that it is currently fetching,
> I
> > >> > > could see the process hasn't yet finished. It might be the same
> > issue.
> > >> > > So, you can add logging to check whether it is actually stuck or
> the
> > >> > > process is still going on.
> > >> > >
> > >> > > Thanks
> > >> > > Pallavi
> > >> > >
> > >> > > ________________________________
> > >> > >
> > >> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> > >> > > Sent: Monday, November 02, 2009 2:03 PM
> > >> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> > >> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > >> > > Subject: too many 100% mapper does not complete / finish / commit
> > >> > >
> > >> > >
> > >> > > Dear hadoop fellows,
> > >> > >
> > >> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data.
> > In
> > >> > > this case, we only have mappers to crawl data and save data into
> > HDFS
> > >> in
> > >> > > a distributed way. No reducers is specified in the job conf.
> > >> > >
> > >> > > The problem is that for every job we have about one third mappers
> > >> stuck
> > >> > > with 100% progress but never complete. If we look at the the
> > >> tasktracker
> > >> > > log of those mappers, the last log was the key input INFO log line
> > and
> > >> > > no others logs were output after that.
> > >> > >
> > >> > > From the stdout log of a specific attempt of one of those mappers,
> > we
> > >> > > can see that the map function of the mapper has been finished
> > >> completely
> > >> > > and the control of the execution should be somewhere in the
> > MapReduce
> > >> > > framework part.
> > >> > >
> > >> > > Does anyone have any clue about this problem? Is it because we
> > didn't
> > >> > > use any reducers? Since two thirds of the mappers could complete
> > >> > > successfully and commit their output data into HDFS, I suspect the
> > >> stuck
> > >> > > mappers has something to do with the MapReduce framework code?
> > >> > >
> > >> > > Any input will be appreciated. Thanks a lot!
> > >> > >
> > >> > > Best regards,
> > >> > > Zhang Bingjun (Eddy)
> > >> > >
> > >> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> > >> bingjun@comp.nus.edu.sg
> > >> > > Tel No: +65-96188110 (M)
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: too many 100% mapper does not complete / finish / commit

Posted by Amandeep Khurana <am...@gmail.com>.
inline

On Mon, Nov 2, 2009 at 3:15 AM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Dear Khurana,
>
> We didn't use MapRunnable. In stead, we used directly the package
> org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and passed our
> normal Mapper Class to it using its getMapperClass() interface. We set the
> number of threads using its setNumberOfThreads(). Is this one correct way
> of
> doing multithreaded mapper?
>

I was just curious how you did it. This is the right way, AFAIK.


>
> We noticed in hadoop-0.20.1 there is another
> MultithreadedMapper, org.apache.hadoop.mapred.lib.map.MultithreadedMapper,
> but we didn't touch it.
>

That's the deprecated package. You used the correct one.


>
> It might be the reason that some thread didn't return. We need to do some
> work to confirm that. We will also try to enable DEBUG mode of hadoop.
> Could
> you share some info on starting an hadoop deamon or the whole hadoop
> cluster
> in debug mode?
>

You'll have to edit the log4j.properties file in $HADOOP_HOME/conf/.
After editing, you'll have to restart the daemons (or the entire cluster).

The DEBUG logs might give some more info on what's happening.
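
For example, something along these lines in conf/log4j.properties (a sketch;
the exact logger names may differ in your setup) turns the MapReduce and HDFS
client classes up to DEBUG:

    # conf/log4j.properties
    log4j.logger.org.apache.hadoop.mapred=DEBUG
    log4j.logger.org.apache.hadoop.hdfs=DEBUG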


>
> Thanks a lot!
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> >wrote:
>
> > Hi all,
> >
> > An important observation. The 100% mapper without completion all have
> > temporary files of 64MB exactly, which means the output of the mapper is
> cut
> > off at the block boundary. However, we do have some successfully
> completed
> > mappers having output files larger than 64MB and we also have less than
> 100%
> > mappers have temporary files larger than 64MB.
> >
> > Here is the info returned by "hadoop fs -ls
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0
> > -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> >
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
> >
> > This is the temporary file of a 100% mapper without completion.
> >
> > Any clues on this?
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
> >
> >> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <
> eddymier@gmail.com
> >> >wrote:
> >>
> >> > Hi Pallavi, Khurana, and Vasekar,
> >> >
> >> > Thanks a lot for your reply. To make up, the mapper we are using is
> the
> >> > multithreaded mapper.
> >> >
> >>
> >> How are you doing this? Did you your own MapRunnable?
> >>
> >>
> >
> >> >
> >> > To answer your questions:
> >> >
> >> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is
> >> the
> >> > last key it reads in. Since the progress is 100% I suppose the key is
> >> the
> >> > last key? From the stdout log of our mapper, we are confirmed that the
> >> map
> >> > function of the mapper has completed. After that, no more key was read
> >> in
> >> > and no other progress is made by the mapper, which means it didn't
> >> complete
> >> > / commit being 100%. For each job, we have different number of mapper
> >> got
> >> > stuck. But it is roughly about one third to half mappers. From the
> >> stdout
> >> > logs of our mapper, we are also confirmed that the map function of the
> >> > mapper has finished. That's why we started to suspect the MapReduce
> >> > framework has something to do with the stuck problem.
> >> >
> >> > Here is log from the stdout:
> >> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> >> > Disco</artist></track>
> >> > [0] [293419] start creating objects
> >> > [1] [293419] start parsing xml
> >> > [2] [293419] start updating data
> >> > [sleep] [228312]
> >> > [error] [228312] java.io.IOException: [error] [228312] reaches the
> >> maximum
> >> > number of attempts whiling updating
> >> > [3] [228312] start collecting output228312
> >> > [3.1 done with null] [228312] done228312
> >> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> >> > [done] [228312] done228312
> >> > [sleep] [293419]
> >> > [error] [293419] java.io.IOException: [error] [293419] reaches the
> >> maximum
> >> > number of attempts whiling updating
> >> > [3] [293419] start collecting output293419
> >> > [3.1 done with null] [293419] done293419
> >> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> >> > [done] [293419] done293419
> >> >
> >> > Here is the log from tasktracker:
> >> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic
> Tree
> >> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque
> du
> >> > Soleil
> >> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> >> > www.China.ie
> >> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> >> > www.China.ie
> >> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> Simian
> >> > Mobile Disco
> >> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> Simian
> >> > Mobile Disco
> >> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> >> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist:
> Simian
> >> > Mobile Disco
> >> >
> >> > From these logs, we can see that the last read in entry is "i bealive
> >> > artist: Simian Mobile Disco" the last process entry in the mapper is
> the
> >> > same as this entry and from the stdout log, we can see the map
> function
> >> has
> >> > finished....
> >> >
> >>
> >> Put some stdout or logging code towards the end of the mapper and also
> >> check
> >> if all threads are coming back. Do you think it could be some issue with
> >> the
> >> threads?
> >>
> >>
> >> > Vasekar: The HDFS is healthy. We didn't store too many small files in
> it
> >> > yet. The return of command "hadoop fsck /" is like follows:
> >> > Total size:    89114318394 B (Total open files size: 19845943808 B)
> >> >  Total dirs:    430
> >> >  Total files:   1761 (Files currently being written: 137)
> >> >  Total blocks (validated):      2691 (avg. block size 33115688 B)
> (Total
> >> > open file blocks (not validated): 309)
> >> >  Minimally replicated blocks:   2691 (100.0 %)
> >> >  Over-replicated blocks:        0 (0.0 %)
> >> >  Under-replicated blocks:       0 (0.0 %)
> >> >  Mis-replicated blocks:         0 (0.0 %)
> >> >  Default replication factor:    3
> >> >  Average block replication:     2.802304
> >> >  Corrupt blocks:                0
> >> >  Missing replicas:              0 (0.0 %)
> >> >  Number of data-nodes:          76
> >> >  Number of racks:               1
> >> >
> >> > Is this problem possibly due to the stuck communication between the
> >> actual
> >> > task (the mapper) and the tasktracker? From the logs, we cannot see
> >> > anything
> >> > after the stuck.
> >> >
> >>
> >> The TT and JT logs would show if there is a lost communication. Enable
> >> DEBUG
> >> logging for the processes and keep a tab.
> >>
> >>
> >> >
> >> >
> >> > From: Amandeep Khurana <am...@gmail.com>
> >> > Date: Mon, Nov 2, 2009 at 4:36 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> > To: common-user@hadoop.apache.org
> >> > Did you try to add any logging and see what keys are they getting
> stuck
> >> on
> >> > or whats the last keys it processed? Do the same number of mappers get
> >> > stuck
> >> > every time?
> >> >
> >> > Not having reducers is not a problem. Its pretty normal to do that.
> >> >
> >> > From: Amogh Vasekar <am...@yahoo-inc.com>
> >> > Date: Mon, Nov 2, 2009 at 4:50 PM
> >> > Subject: Re: too many 100% mapper does not complete / finish / commit
> >> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
> >> >
> >> > Hi,
> >> > Quick questions...
> >> > Are you creating too many small files?
> >> > Are there any task side files being created?
> >> > Is the heap for NN having enough space to list metadata? Any details
> on
> >> its
> >> > general health will probably be helpful to people on the list.
> >> >
> >> > Amogh
> >> > Best regards,
> >> > Zhang Bingjun (Eddy)
> >> >
> >> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> bingjun@comp.nus.edu.sg
> >> > Tel No: +65-96188110 (M)
> >> >
> >> >
> >> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> >> > pallavi.palleti@corp.aol.com> wrote:
> >> >
> >> > > Hi Eddy,
> >> > >
> >> > > I faced similar issue when I used pig script for fetching webpages
> for
> >> > > certain urls. I could see the map phase showing100% and it is still
> >> > > running. As I was logging the page that it is currently fetching, I
> >> > > could see the process hasn't yet finished. It might be the same
> issue.
> >> > > So, you can add logging to check whether it is actually stuck or the
> >> > > process is still going on.
> >> > >
> >> > > Thanks
> >> > > Pallavi
> >> > >
> >> > > ________________________________
> >> > >
> >> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> >> > > Sent: Monday, November 02, 2009 2:03 PM
> >> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> >> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> >> > > Subject: too many 100% mapper does not complete / finish / commit
> >> > >
> >> > >
> >> > > Dear hadoop fellows,
> >> > >
> >> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data.
> In
> >> > > this case, we only have mappers to crawl data and save data into
> HDFS
> >> in
> >> > > a distributed way. No reducers is specified in the job conf.
> >> > >
> >> > > The problem is that for every job we have about one third mappers
> >> stuck
> >> > > with 100% progress but never complete. If we look at the the
> >> tasktracker
> >> > > log of those mappers, the last log was the key input INFO log line
> and
> >> > > no others logs were output after that.
> >> > >
> >> > > From the stdout log of a specific attempt of one of those mappers,
> we
> >> > > can see that the map function of the mapper has been finished
> >> completely
> >> > > and the control of the execution should be somewhere in the
> MapReduce
> >> > > framework part.
> >> > >
> >> > > Does anyone have any clue about this problem? Is it because we
> didn't
> >> > > use any reducers? Since two thirds of the mappers could complete
> >> > > successfully and commit their output data into HDFS, I suspect the
> >> stuck
> >> > > mappers has something to do with the MapReduce framework code?
> >> > >
> >> > > Any input will be appreciated. Thanks a lot!
> >> > >
> >> > > Best regards,
> >> > > Zhang Bingjun (Eddy)
> >> > >
> >> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> >> bingjun@comp.nus.edu.sg
> >> > > Tel No: +65-96188110 (M)
> >> > >
> >> > >
> >> >
> >>
> >
> >
>

Re: too many 100% mapper does not complete / finish / commit

Posted by "Zhang Bingjun (Eddy)" <ed...@gmail.com>.
Dear Khurana,

We didn't use MapRunnable. Instead, we directly used the class
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper, passed our normal
Mapper class to it through its setMapperClass() interface, and set the
number of threads using its setNumberOfThreads(). Is this the correct way of
doing a multithreaded mapper?
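
Concretely, the wiring is along the following lines, with the Job object set
up as in a normal map-only driver (CrawlMapper and the thread count are
placeholders for our real values):

    // MultithreadedMapper is the mapper the framework actually runs; it fans
    // input records out to a pool of threads, each running an instance of our
    // real mapper class.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, CrawlMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 10);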

We noticed in hadoop-0.20.1 there is another
MultithreadedMapper, org.apache.hadoop.mapred.lib.map.MultithreadedMapper,
but we didn't touch it.

It might be that some thread didn't return. We need to do some work to
confirm that. We will also try to enable DEBUG mode in Hadoop. Could you
share some info on starting a Hadoop daemon, or the whole Hadoop cluster, in
debug mode?

Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)


On Mon, Nov 2, 2009 at 6:58 PM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Hi all,
>
> An important observation. The 100% mapper without completion all have
> temporary files of 64MB exactly, which means the output of the mapper is cut
> off at the block boundary. However, we do have some successfully completed
> mappers having output files larger than 64MB and we also have less than 100%
> mappers have temporary files larger than 64MB.
>
> Here is the info returned by "hadoop fs -ls
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0
> -rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
> /hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091
>
> This is the temporary file of a 100% mapper without completion.
>
> Any clues on this?
>
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com> wrote:
>
>> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com
>> >wrote:
>>
>> > Hi Pallavi, Khurana, and Vasekar,
>> >
>> > Thanks a lot for your reply. To make up, the mapper we are using is the
>> > multithreaded mapper.
>> >
>>
>> How are you doing this? Did you your own MapRunnable?
>>
>>
>
>> >
>> > To answer your questions:
>> >
>> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is
>> the
>> > last key it reads in. Since the progress is 100% I suppose the key is
>> the
>> > last key? From the stdout log of our mapper, we are confirmed that the
>> map
>> > function of the mapper has completed. After that, no more key was read
>> in
>> > and no other progress is made by the mapper, which means it didn't
>> complete
>> > / commit being 100%. For each job, we have different number of mapper
>> got
>> > stuck. But it is roughly about one third to half mappers. From the
>> stdout
>> > logs of our mapper, we are also confirmed that the map function of the
>> > mapper has finished. That's why we started to suspect the MapReduce
>> > framework has something to do with the stuck problem.
>> >
>> > Here is log from the stdout:
>> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
>> > Disco</artist></track>
>> > [0] [293419] start creating objects
>> > [1] [293419] start parsing xml
>> > [2] [293419] start updating data
>> > [sleep] [228312]
>> > [error] [228312] java.io.IOException: [error] [228312] reaches the
>> maximum
>> > number of attempts whiling updating
>> > [3] [228312] start collecting output228312
>> > [3.1 done with null] [228312] done228312
>> > [fail] [228312] java.io.IOException: 3.1 throw null228312
>> > [done] [228312] done228312
>> > [sleep] [293419]
>> > [error] [293419] java.io.IOException: [error] [293419] reaches the
>> maximum
>> > number of attempts whiling updating
>> > [3] [293419] start collecting output293419
>> > [3.1 done with null] [293419] done293419
>> > [fail] [293419] java.io.IOException: 3.1 throw null293419
>> > [done] [293419] done293419
>> >
>> > Here is the log from tasktracker:
>> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
>> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
>> > Soleil
>> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
>> > www.China.ie
>> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
>> > www.China.ie
>> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
>> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
>> > Mobile Disco
>> >
>> > From these logs, we can see that the last read in entry is "i bealive
>> > artist: Simian Mobile Disco" the last process entry in the mapper is the
>> > same as this entry and from the stdout log, we can see the map function
>> has
>> > finished....
>> >
>>
>> Put some stdout or logging code towards the end of the mapper and also
>> check
>> if all threads are coming back. Do you think it could be some issue with
>> the
>> threads?
>>
>>
>> > Vasekar: The HDFS is healthy. We didn't store too many small files in it
>> > yet. The return of command "hadoop fsck /" is like follows:
>> > Total size:    89114318394 B (Total open files size: 19845943808 B)
>> >  Total dirs:    430
>> >  Total files:   1761 (Files currently being written: 137)
>> >  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
>> > open file blocks (not validated): 309)
>> >  Minimally replicated blocks:   2691 (100.0 %)
>> >  Over-replicated blocks:        0 (0.0 %)
>> >  Under-replicated blocks:       0 (0.0 %)
>> >  Mis-replicated blocks:         0 (0.0 %)
>> >  Default replication factor:    3
>> >  Average block replication:     2.802304
>> >  Corrupt blocks:                0
>> >  Missing replicas:              0 (0.0 %)
>> >  Number of data-nodes:          76
>> >  Number of racks:               1
>> >
>> > Is this problem possibly due to the stuck communication between the
>> actual
>> > task (the mapper) and the tasktracker? From the logs, we cannot see
>> > anything
>> > after the stuck.
>> >
>>
>> The TT and JT logs would show if there is a lost communication. Enable
>> DEBUG
>> logging for the processes and keep a tab.
>>
>>
>> >
>> >
>> > From: Amandeep Khurana <am...@gmail.com>
>> > Date: Mon, Nov 2, 2009 at 4:36 PM
>> > Subject: Re: too many 100% mapper does not complete / finish / commit
>> > To: common-user@hadoop.apache.org
>> > Did you try to add any logging and see what keys are they getting stuck
>> on
>> > or whats the last keys it processed? Do the same number of mappers get
>> > stuck
>> > every time?
>> >
>> > Not having reducers is not a problem. Its pretty normal to do that.
>> >
>> > From: Amogh Vasekar <am...@yahoo-inc.com>
>> > Date: Mon, Nov 2, 2009 at 4:50 PM
>> > Subject: Re: too many 100% mapper does not complete / finish / commit
>> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
>> >
>> > Hi,
>> > Quick questions...
>> > Are you creating too many small files?
>> > Are there any task side files being created?
>> > Is the heap for NN having enough space to list metadata? Any details on
>> its
>> > general health will probably be helpful to people on the list.
>> >
>> > Amogh
>> > Best regards,
>> > Zhang Bingjun (Eddy)
>> >
>> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
>> > Tel No: +65-96188110 (M)
>> >
>> >
>> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
>> > pallavi.palleti@corp.aol.com> wrote:
>> >
>> > > Hi Eddy,
>> > >
>> > > I faced similar issue when I used pig script for fetching webpages for
>> > > certain urls. I could see the map phase showing100% and it is still
>> > > running. As I was logging the page that it is currently fetching, I
>> > > could see the process hasn't yet finished. It might be the same issue.
>> > > So, you can add logging to check whether it is actually stuck or the
>> > > process is still going on.
>> > >
>> > > Thanks
>> > > Pallavi
>> > >
>> > > ________________________________
>> > >
>> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
>> > > Sent: Monday, November 02, 2009 2:03 PM
>> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
>> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
>> > > Subject: too many 100% mapper does not complete / finish / commit
>> > >
>> > >
>> > > Dear hadoop fellows,
>> > >
>> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
>> > > this case, we only have mappers to crawl data and save data into HDFS
>> in
>> > > a distributed way. No reducers is specified in the job conf.
>> > >
>> > > The problem is that for every job we have about one third mappers
>> stuck
>> > > with 100% progress but never complete. If we look at the the
>> tasktracker
>> > > log of those mappers, the last log was the key input INFO log line and
>> > > no others logs were output after that.
>> > >
>> > > From the stdout log of a specific attempt of one of those mappers, we
>> > > can see that the map function of the mapper has been finished
>> completely
>> > > and the control of the execution should be somewhere in the MapReduce
>> > > framework part.
>> > >
>> > > Does anyone have any clue about this problem? Is it because we didn't
>> > > use any reducers? Since two thirds of the mappers could complete
>> > > successfully and commit their output data into HDFS, I suspect the
>> stuck
>> > > mappers has something to do with the MapReduce framework code?
>> > >
>> > > Any input will be appreciated. Thanks a lot!
>> > >
>> > > Best regards,
>> > > Zhang Bingjun (Eddy)
>> > >
>> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
>> bingjun@comp.nus.edu.sg
>> > > Tel No: +65-96188110 (M)
>> > >
>> > >
>> >
>>
>
>

Re: too many 100% mapper does not complete / finish / commit

Posted by "Zhang Bingjun (Eddy)" <ed...@gmail.com>.
Hi all,

An important observation: the 100% mappers that never complete all have
temporary files of exactly 64MB, which means the output of those mappers is
cut off at the block boundary. However, we do have some successfully
completed mappers with output files larger than 64MB, and we also have
mappers below 100% whose temporary files are larger than 64MB.

Here is the info returned by "hadoop fs -ls
/hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0":
-rw-r--r--   3 hadoop supergroup   67108864 2009-11-02 14:29
/hadoop/music/track/audio/track_1/_temporary/_attempt_200911021416_0001_m_000091_0/part-m-00091

This is the temporary file of one of the 100% mappers that never completes.

Any clues on this?

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)


On Mon, Nov 2, 2009 at 6:52 PM, Amandeep Khurana <am...@gmail.com> wrote:

> On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <eddymier@gmail.com
> >wrote:
>
> > Hi Pallavi, Khurana, and Vasekar,
> >
> > Thanks a lot for your reply. To make up, the mapper we are using is the
> > multithreaded mapper.
> >
>
> How are you doing this? Did you your own MapRunnable?
>
>
> >
> > To answer your questions:
> >
> > Pallavi, Khurana: I have checked the logs. The key it got stuck on is the
> > last key it reads in. Since the progress is 100% I suppose the key is the
> > last key? From the stdout log of our mapper, we are confirmed that the
> map
> > function of the mapper has completed. After that, no more key was read in
> > and no other progress is made by the mapper, which means it didn't
> complete
> > / commit being 100%. For each job, we have different number of mapper got
> > stuck. But it is roughly about one third to half mappers. From the stdout
> > logs of our mapper, we are also confirmed that the map function of the
> > mapper has finished. That's why we started to suspect the MapReduce
> > framework has something to do with the stuck problem.
> >
> > Here is log from the stdout:
> > [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> > Disco</artist></track>
> > [0] [293419] start creating objects
> > [1] [293419] start parsing xml
> > [2] [293419] start updating data
> > [sleep] [228312]
> > [error] [228312] java.io.IOException: [error] [228312] reaches the
> maximum
> > number of attempts whiling updating
> > [3] [228312] start collecting output228312
> > [3.1 done with null] [228312] done228312
> > [fail] [228312] java.io.IOException: 3.1 throw null228312
> > [done] [228312] done228312
> > [sleep] [293419]
> > [error] [293419] java.io.IOException: [error] [293419] reaches the
> maximum
> > number of attempts whiling updating
> > [3] [293419] start collecting output293419
> > [3.1 done with null] [293419] done293419
> > [fail] [293419] java.io.IOException: 3.1 throw null293419
> > [done] [293419] done293419
> >
> > Here is the log from tasktracker:
> > 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
> > 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
> > Soleil
> > 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> > www.China.ie
> > 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> > www.China.ie
> > 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> > Mobile Disco
> > 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> > Mobile Disco
> > 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> > Mobile Disco
> >
> > From these logs, we can see that the last read in entry is "i bealive
> > artist: Simian Mobile Disco" the last process entry in the mapper is the
> > same as this entry and from the stdout log, we can see the map function
> has
> > finished....
> >
>
> Put some stdout or logging code towards the end of the mapper and also
> check
> if all threads are coming back. Do you think it could be some issue with
> the
> threads?
>
>
> > Vasekar: The HDFS is healthy. We didn't store too many small files in it
> > yet. The return of command "hadoop fsck /" is like follows:
> > Total size:    89114318394 B (Total open files size: 19845943808 B)
> >  Total dirs:    430
> >  Total files:   1761 (Files currently being written: 137)
> >  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
> > open file blocks (not validated): 309)
> >  Minimally replicated blocks:   2691 (100.0 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       0 (0.0 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     2.802304
> >  Corrupt blocks:                0
> >  Missing replicas:              0 (0.0 %)
> >  Number of data-nodes:          76
> >  Number of racks:               1
> >
> > Is this problem possibly due to the stuck communication between the
> actual
> > task (the mapper) and the tasktracker? From the logs, we cannot see
> > anything
> > after the stuck.
> >
>
> The TT and JT logs would show if there is a lost communication. Enable
> DEBUG
> logging for the processes and keep a tab.
>
>
> >
> >
> > From: Amandeep Khurana <am...@gmail.com>
> > Date: Mon, Nov 2, 2009 at 4:36 PM
> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > To: common-user@hadoop.apache.org
> > Did you try to add any logging and see what keys are they getting stuck
> on
> > or whats the last keys it processed? Do the same number of mappers get
> > stuck
> > every time?
> >
> > Not having reducers is not a problem. Its pretty normal to do that.
> >
> > From: Amogh Vasekar <am...@yahoo-inc.com>
> > Date: Mon, Nov 2, 2009 at 4:50 PM
> > Subject: Re: too many 100% mapper does not complete / finish / commit
> > To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
> >
> > Hi,
> > Quick questions...
> > Are you creating too many small files?
> > Are there any task side files being created?
> > Is the heap for NN having enough space to list metadata? Any details on
> its
> > general health will probably be helpful to people on the list.
> >
> > Amogh
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
> > On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> > pallavi.palleti@corp.aol.com> wrote:
> >
> > > Hi Eddy,
> > >
> > > I faced similar issue when I used pig script for fetching webpages for
> > > certain urls. I could see the map phase showing100% and it is still
> > > running. As I was logging the page that it is currently fetching, I
> > > could see the process hasn't yet finished. It might be the same issue.
> > > So, you can add logging to check whether it is actually stuck or the
> > > process is still going on.
> > >
> > > Thanks
> > > Pallavi
> > >
> > > ________________________________
> > >
> > > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> > > Sent: Monday, November 02, 2009 2:03 PM
> > > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> > > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > > Subject: too many 100% mapper does not complete / finish / commit
> > >
> > >
> > > Dear hadoop fellows,
> > >
> > > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
> > > this case, we only have mappers to crawl data and save data into HDFS
> in
> > > a distributed way. No reducers is specified in the job conf.
> > >
> > > The problem is that for every job we have about one third mappers stuck
> > > with 100% progress but never complete. If we look at the the
> tasktracker
> > > log of those mappers, the last log was the key input INFO log line and
> > > no others logs were output after that.
> > >
> > > From the stdout log of a specific attempt of one of those mappers, we
> > > can see that the map function of the mapper has been finished
> completely
> > > and the control of the execution should be somewhere in the MapReduce
> > > framework part.
> > >
> > > Does anyone have any clue about this problem? Is it because we didn't
> > > use any reducers? Since two thirds of the mappers could complete
> > > successfully and commit their output data into HDFS, I suspect the
> stuck
> > > mappers has something to do with the MapReduce framework code?
> > >
> > > Any input will be appreciated. Thanks a lot!
> > >
> > > Best regards,
> > > Zhang Bingjun (Eddy)
> > >
> > > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg,
> bingjun@comp.nus.edu.sg
> > > Tel No: +65-96188110 (M)
> > >
> > >
> >
>

Re: too many 100% mapper does not complete / finish / commit

Posted by Amandeep Khurana <am...@gmail.com>.
On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) <ed...@gmail.com> wrote:

> Hi Pallavi, Khurana, and Vasekar,
>
> Thanks a lot for your reply. To make up, the mapper we are using is the
> multithreaded mapper.
>

How are you doing this? Did you write your own MapRunnable?


>
> To answer your questions:
>
> Pallavi, Khurana: I have checked the logs. The key it got stuck on is the
> last key it reads in. Since the progress is 100% I suppose the key is the
> last key? From the stdout log of our mapper, we are confirmed that the map
> function of the mapper has completed. After that, no more key was read in
> and no other progress is made by the mapper, which means it didn't complete
> / commit being 100%. For each job, we have different number of mapper got
> stuck. But it is roughly about one third to half mappers. From the stdout
> logs of our mapper, we are also confirmed that the map function of the
> mapper has finished. That's why we started to suspect the MapReduce
> framework has something to do with the stuck problem.
>
> Here is log from the stdout:
> [entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
> Disco</artist></track>
> [0] [293419] start creating objects
> [1] [293419] start parsing xml
> [2] [293419] start updating data
> [sleep] [228312]
> [error] [228312] java.io.IOException: [error] [228312] reaches the maximum
> number of attempts whiling updating
> [3] [228312] start collecting output228312
> [3.1 done with null] [228312] done228312
> [fail] [228312] java.io.IOException: 3.1 throw null228312
> [done] [228312] done228312
> [sleep] [293419]
> [error] [293419] java.io.IOException: [error] [293419] reaches the maximum
> number of attempts whiling updating
> [3] [293419] start collecting output293419
> [3.1 done with null] [293419] done293419
> [fail] [293419] java.io.IOException: 3.1 throw null293419
> [done] [293419] done293419
>
> Here is the log from tasktracker:
> 2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
> 2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
> Soleil
> 2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> www.China.ie
> 2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
> www.China.ie
> 2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
> 2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
> 2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
> Mobile Disco
>
> From these logs, we can see that the last read in entry is "i bealive
> artist: Simian Mobile Disco" the last process entry in the mapper is the
> same as this entry and from the stdout log, we can see the map function has
> finished....
>

Put some stdout or logging code towards the end of the mapper and also check
if all threads are coming back. Do you think it could be some issue with the
threads?
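
For example, something along these lines in the wrapped mapper class (a
sketch; the class name is illustrative) would show whether every worker
thread makes it out of the map loop. As far as I know, the new-API
MultithreadedMapper gives each worker thread its own mapper instance, so each
thread should print one setup line and one cleanup line in the attempt's
stdout log:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {

      @Override
      protected void setup(Context context) {
        System.out.println("[setup]   " + Thread.currentThread().getName());
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // ... the existing crawl / context.write(...) logic goes here ...
      }

      @Override
      protected void cleanup(Context context) {
        // A thread that never returns from the map loop will be missing its
        // cleanup line, which identifies where things hang.
        System.out.println("[cleanup] " + Thread.currentThread().getName());
      }
    }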


> Vasekar: The HDFS is healthy. We didn't store too many small files in it
> yet. The return of command "hadoop fsck /" is like follows:
> Total size:    89114318394 B (Total open files size: 19845943808 B)
>  Total dirs:    430
>  Total files:   1761 (Files currently being written: 137)
>  Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
> open file blocks (not validated): 309)
>  Minimally replicated blocks:   2691 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     2.802304
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          76
>  Number of racks:               1
>
> Is this problem possibly due to the stuck communication between the actual
> task (the mapper) and the tasktracker? From the logs, we cannot see
> anything
> after the stuck.
>

The TT and JT logs would show if there is lost communication. Enable DEBUG
logging for the processes and keep tabs on them.


>
>
> From: Amandeep Khurana <am...@gmail.com>
> Date: Mon, Nov 2, 2009 at 4:36 PM
> Subject: Re: too many 100% mapper does not complete / finish / commit
> To: common-user@hadoop.apache.org
> Did you try to add any logging and see what keys are they getting stuck on
> or whats the last keys it processed? Do the same number of mappers get
> stuck
> every time?
>
> Not having reducers is not a problem. Its pretty normal to do that.
>
> From: Amogh Vasekar <am...@yahoo-inc.com>
> Date: Mon, Nov 2, 2009 at 4:50 PM
> Subject: Re: too many 100% mapper does not complete / finish / commit
> To: "common-user@hadoop.apache.org" <co...@hadoop.apache.org>
>
> Hi,
> Quick questions...
> Are you creating too many small files?
> Are there any task side files being created?
> Is the heap for NN having enough space to list metadata? Any details on its
> general health will probably be helpful to people on the list.
>
> Amogh
> Best regards,
> Zhang Bingjun (Eddy)
>
> E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> Tel No: +65-96188110 (M)
>
>
> On Mon, Nov 2, 2009 at 4:51 PM, Palleti, Pallavi <
> pallavi.palleti@corp.aol.com> wrote:
>
> > Hi Eddy,
> >
> > I faced similar issue when I used pig script for fetching webpages for
> > certain urls. I could see the map phase showing100% and it is still
> > running. As I was logging the page that it is currently fetching, I
> > could see the process hasn't yet finished. It might be the same issue.
> > So, you can add logging to check whether it is actually stuck or the
> > process is still going on.
> >
> > Thanks
> > Pallavi
> >
> > ________________________________
> >
> > From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com]
> > Sent: Monday, November 02, 2009 2:03 PM
> > To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> > mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > Subject: too many 100% mapper does not complete / finish / commit
> >
> >
> > Dear hadoop fellows,
> >
> > We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
> > this case, we only have mappers to crawl data and save data into HDFS in
> > a distributed way. No reducers is specified in the job conf.
> >
> > The problem is that for every job we have about one third mappers stuck
> > with 100% progress but never complete. If we look at the the tasktracker
> > log of those mappers, the last log was the key input INFO log line and
> > no others logs were output after that.
> >
> > From the stdout log of a specific attempt of one of those mappers, we
> > can see that the map function of the mapper has been finished completely
> > and the control of the execution should be somewhere in the MapReduce
> > framework part.
> >
> > Does anyone have any clue about this problem? Is it because we didn't
> > use any reducers? Since two thirds of the mappers could complete
> > successfully and commit their output data into HDFS, I suspect the stuck
> > mappers has something to do with the MapReduce framework code?
> >
> > Any input will be appreciated. Thanks a lot!
> >
> > Best regards,
> > Zhang Bingjun (Eddy)
> >
> > E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
> > Tel No: +65-96188110 (M)
> >
> >
>

Re: too many 100% mapper does not complete / finish / commit

Posted by "Zhang Bingjun (Eddy)" <ed...@gmail.com>.
Hi Pallavi, Khurana, and Vasekar,

Thanks a lot for your replies. To add a detail I left out earlier: the mapper we
are using is the multithreaded mapper.
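
For readers following the thread, a map-only job built around the multithreaded
mapper is typically wired up roughly as below in the 0.20 new API. This is a
sketch only; CrawlJob, CrawlMapper, the thread count, and the paths are
illustrative assumptions, not the actual job in question.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrawlJob {
  // Illustrative stand-in for the real crawling mapper.
  public static class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(value.toString()), new Text("fetched"));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "crawl");
    job.setJarByClass(CrawlJob.class);

    // The framework runs MultithreadedMapper, which fans records out to
    // several copies of the real mapper running in a thread pool.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, CrawlMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 10);

    job.setNumReduceTasks(0);            // map-only: output is committed straight to HDFS
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}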

To answer your questions:

Pallavi, Khurana: I have checked the logs. The key each stuck task is sitting
on is the last key it read in; since the progress shows 100%, I assume it
really is the last key. From the stdout log of our mapper, we have confirmed
that the map function completed. After that, no more keys were read and the
mapper made no further progress, which means it never completed / committed
even though it shows 100%. The number of stuck mappers differs from job to
job, but it is roughly one third to half of them. Because the stdout logs
confirm that the map function itself finished, we started to suspect that the
MapReduce framework has something to do with the stuck problem.

Here is the log from stdout:
[entry] [293419] <track><name>i bealive</name><artist>Simian Mobile
Disco</artist></track>
[0] [293419] start creating objects
[1] [293419] start parsing xml
[2] [293419] start updating data
[sleep] [228312]
[error] [228312] java.io.IOException: [error] [228312] reaches the maximum
number of attempts whiling updating
[3] [228312] start collecting output228312
[3.1 done with null] [228312] done228312
[fail] [228312] java.io.IOException: 3.1 throw null228312
[done] [228312] done228312
[sleep] [293419]
[error] [293419] java.io.IOException: [error] [293419] reaches the maximum
number of attempts whiling updating
[3] [293419] start collecting output293419
[3.1 done with null] [293419] done293419
[fail] [293419] java.io.IOException: 3.1 throw null293419
[done] [293419] done293419

Here is the log from the tasktracker:
2009-11-02 16:58:23,518 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: 梟 artist: Plastic Tree
2009-11-02 16:58:50,527 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: Zydeko artist: Cirque du
Soleil
2009-11-02 16:59:23,539 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
www.China.ie
2009-11-02 16:59:50,550 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: www.China.ie artist:
www.China.ie
2009-11-02 17:00:11,560 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco
2009-11-02 17:00:23,565 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco
2009-11-02 17:01:11,585 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200911021416_0001_m_000047_1 1.0% name: i bealive artist: Simian
Mobile Disco

From these logs, we can see that the last entry read in is "i bealive artist:
Simian Mobile Disco". The last entry processed in the mapper is the same one,
and from the stdout log we can see that the map function has finished.
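
One way to narrow down where the task is hanging would be a thread dump of the
stuck child JVM on its tasktracker node. A sketch, assuming the 0.20 child
process (main class org.apache.hadoop.mapred.Child) and a JDK with jps/jstack
on the PATH; <pid> is a placeholder:

jps -l | grep org.apache.hadoop.mapred.Child   # find the pid of the stuck attempt's JVM
jstack <pid>                                   # dump all thread stacks to stdout
# or: kill -QUIT <pid>   (the dump lands in the attempt's stdout log)

The dump would show whether the remaining threads are in user code, in output
close/commit, or waiting on the tasktracker umbilical.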

Vasekar: HDFS is healthy. We haven't stored too many small files in it yet. The
output of the command "hadoop fsck /" is as follows:
Total size:    89114318394 B (Total open files size: 19845943808 B)
 Total dirs:    430
 Total files:   1761 (Files currently being written: 137)
 Total blocks (validated):      2691 (avg. block size 33115688 B) (Total
open file blocks (not validated): 309)
 Minimally replicated blocks:   2691 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.802304
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          76
 Number of racks:               1

Is this problem possibly due to stuck communication between the actual task
(the mapper) and the tasktracker? From the logs, we cannot see anything after
the task gets stuck.


Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)



RE: too many 100% mapper does not complete / finish / commit

Posted by "Palleti, Pallavi" <pa...@corp.aol.com>.
Hi Eddy,
 
I faced a similar issue when I used a Pig script to fetch webpages for
certain URLs. I could see the map phase showing 100% while the job was
still running. Since I was logging the page currently being fetched, I
could see the process hadn't actually finished. It might be the same issue,
so you can add logging to check whether the task is actually stuck or
still doing work.
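
As a rough illustration of that kind of logging in a 0.20 new-API mapper (the
class name and the plain-URL fetch here are assumptions for the sketch, not the
actual crawler code):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative only: logs every URL before and after fetching, so the attempt's
// stdout and task status show whether it is stuck or still working.
public class LoggingFetchMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String url = value.toString().trim();
    context.setStatus("fetching " + url);        // visible on the task page in the web UI
    System.out.println("[fetch-start] " + url);  // goes to the attempt's stdout log

    ByteArrayOutputStream page = new ByteArrayOutputStream();
    InputStream in = new URL(url).openStream();
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        page.write(buf, 0, n);
        context.progress();                      // keep the task alive during slow fetches
      }
    } finally {
      in.close();
    }

    context.write(new Text(url), new Text(page.toString("UTF-8")));
    System.out.println("[fetch-done] " + url);
  }
}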
 
Thanks
Pallavi

________________________________

From: Zhang Bingjun (Eddy) [mailto:eddymier@gmail.com] 
Sent: Monday, November 02, 2009 2:03 PM
To: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
mapreduce-user@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
Subject: too many 100% mapper does not complete / finish / commit


Dear hadoop fellows, 

We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In
this case, we only have mappers to crawl data and save data into HDFS in
a distributed way. No reducers is specified in the job conf. 

The problem is that for every job we have about one third mappers stuck
with 100% progress but never complete. If we look at the the tasktracker
log of those mappers, the last log was the key input INFO log line and
no others logs were output after that. 

>From the stdout log of a specific attempt of one of those mappers, we
can see that the map function of the mapper has been finished completely
and the control of the execution should be somewhere in the MapReduce
framework part. 

Does anyone have any clue about this problem? Is it because we didn't
use any reducers? Since two thirds of the mappers could complete
successfully and commit their output data into HDFS, I suspect the stuck
mappers has something to do with the MapReduce framework code? 

Any input will be appreciated. Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)



Re: too many 100% mapper does not complete / finish / commit

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
Quick questions...
Are you creating too many small files?
Are there any task-side files being created?
Does the NN heap have enough space to hold the metadata? Any details on its general health will probably be helpful to people on the list.
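
A few commands that would answer these questions (a sketch only; the
/crawl/output path and <namenode-pid> are placeholders, not values known from
this thread):

hadoop fs -count /crawl/output                     # directory, file and byte counts under the output tree
hadoop fsck /crawl/output -files -blocks | tail    # per-file block layout, handy for spotting many small files
jmap -heap <namenode-pid>                          # current NameNode heap usage (or check the NN web UI on port 50070)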

Amogh



On 11/2/09 2:02 PM, "Zhang Bingjun (Eddy)" <ed...@gmail.com> wrote:

Dear hadoop fellows,

We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
case, we only have mappers to crawl data and save data into HDFS in a
distributed way. No reducers is specified in the job conf.

The problem is that for every job we have about one third mappers stuck with
100% progress but never complete. If we look at the the tasktracker log of
those mappers, the last log was the key input INFO log line and no others
logs were output after that.

>From the stdout log of a specific attempt of one of those mappers, we can
see that the map function of the mapper has been finished completely and the
control of the execution should be somewhere in the MapReduce framework
part.

Does anyone have any clue about this problem? Is it because we didn't use
any reducers? Since two thirds of the mappers could complete successfully
and commit their output data into HDFS, I suspect the stuck mappers has
something to do with the MapReduce framework code?

Any input will be appreciated. Thanks a lot!

Best regards,
Zhang Bingjun (Eddy)

E-mail: eddymier@gmail.com, bingjun@nus.edu.sg, bingjun@comp.nus.edu.sg
Tel No: +65-96188110 (M)