Posted to dev@nutch.apache.org by Mike Smith <mi...@gmail.com> on 2006/03/03 09:15:26 UTC

Re: Unable to complete a full fetch, reason Child Error

Hi Doug

I did some more testing using the latest svn. Child processes still die
without any clear log after a while.

I used two machines with Hadoop; both run a datanode and a tasktracker,
and one of them also runs the namenode and jobtracker. I started with
2,000 seed URLs and it went fine until the 4th cycle, reaching about
600,000 pages; the next round was supposed to fetch 3,000,000 pages. It
failed again with this exception in the middle of fetching:

060302 232934 task_m_7lbv7e  fetching
http://www.findarticles.com/p/articles/mi_m0KJI/is_9_115/ai_107836357
060302 232934 task_m_7lbv7e  fetching
http://www.wholehealthmd.com/hc/resourceareas_supp/1,1442,544,00.html
060302 232934 task_m_7lbv7e  fetching
http://www.dow.com/haltermann/products/d-petro.htm
060302 232934 task_m_7lbv7e 0.7877368% 700644 pages, 24594 errors,
14.0pages/s, 2254 kb/s,
060302 232934 task_m_7lbv7e  fetching
http://www.findarticles.com/p/articles/mi_hb3594/is_199510/ai_n8541042
060302 232934 task_m_7lbv7e Error reading child output
java.io.IOException: Bad file descriptor
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:194)
        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.BufferedReader.fill(BufferedReader.java:136)
        at java.io.BufferedReader.readLine(BufferedReader.java:299)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:299)
        at org.apache.hadoop.mapred.TaskRunner.access$100(TaskRunner.java:32)
        at org.apache.hadoop.mapred.TaskRunner$1.run(TaskRunner.java:266)
060302 232934 task_m_7lbv7e 0.7877451% 700644 pages, 24594 errors,
14.0pages/s, 2254 kb/s,
060302 232934 task_m_7lbv7e 0.7877451% 700644 pages, 24594 errors,
14.0pages/s, 2254 kb/s,
060302 232934 Server connection on port 50050 from 164.67.195.27: exiting
060302 232934 Server connection on port 50050 from 164.67.195.27: exiting
060302 232934 task_m_7lbv7e Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)
060302 232937 task_m_7lbv7e done; removing files.
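
If I read the stack trace right, the "Bad file descriptor" comes from the
tasktracker-side thread that drains the child's stdout/stderr: once the
child process dies, the pipe gets closed underneath the reader and
readLine() blows up. The loop in TaskRunner.logStream presumably looks
something like the sketch below (my reconstruction from the stack trace,
not the actual Hadoop source; the class name is made up):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Sketch of a log-draining loop like the one the stack trace points at.
class ChildOutputDrainer {
  static void logStream(InputStream childOutput) {
    BufferedReader in = new BufferedReader(new InputStreamReader(childOutput));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.err.println(line); // the real code logs via the tasktracker's logger
      }
    } catch (IOException e) {
      // If the child exits and its pipe is torn down while we are still
      // reading, readLine() surfaces "Bad file descriptor" here -- the
      // "Error reading child output" entry in the log above.
      System.err.println("Error reading child output");
      e.printStackTrace();
    }
  }
}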


And this is the console output:



060303 010945  map 86%  reduce 0%
060303 012033  map 86%  reduce 6%
060303 012223  map 87%  reduce 6%
060303 014623  map 88%  reduce 6%
060303 021304  map 89%  reduce 6%
060303 022921  map 50%  reduce 0%
060303 022921 SEVERE error, caught Exception in main()
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:366)
        at org.apache.nutch.fetcher.Fetcher.doMain(Fetcher.java:400)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:411)
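
As far as I can tell, the "Job failed!" at the end is just
JobClient.runJob giving up once the fetch tasks have failed too many
times, and Fetcher lets the exception propagate up to main(). Roughly
the call path is something like this (only a sketch based on the stack
trace; the job setup here is illustrative, not the real Fetcher code):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FetchJobSketch {
  public static void main(String[] args) {
    JobConf job = new JobConf(); // the real Fetcher builds this from the Nutch config
    job.setJobName("fetch");
    job.setBoolean("mapred.speculative.execution", false);
    try {
      // Blocks until the job completes; throws IOException("Job failed!")
      // when the job ends unsuccessfully -- the line in the console output.
      JobClient.runJob(job);
    } catch (IOException e) {
      // This is the "SEVERE error, caught Exception in main()" path above.
      e.printStackTrace();
    }
  }
}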


This error has been showing up on large-scale crawls for a couple of
months now. I was wondering if anybody else has had the same issue with
large-scale crawls.

Thanks, Mike.






On 2/26/06, Gal Nitzan <gn...@usa.net> wrote:
>
> Still got the same...
>
> I'm not sure if it is relevant to this issue but the call you added to
> Fetcher.java:
>
>     job.setBoolean("mapred.speculative.execution", false);
>
> doesn't work. All task trackers still fetch together even though I have
> only 3 sites in the fetchlist.
>
> The task trackers fetch the same pages...
>
> I have used the latest build from the hadoop trunk.
>
> Gal.
>
>
> On Fri, 2006-02-24 at 14:15 -0800, Doug Cutting wrote:
> > Mike Smith wrote:
> > > 060219 142408 task_m_grycae  Parent died.  Exiting task_m_grycae
> >
> > This means the child process, executing the task, was unable to ping its
> > parent process (the task tracker).
> >
> > > 060219 142408 task_m_grycae Child Error
> > > java.io.IOException: Task process exit with nonzero status.
> > >         at org.apache.hadoop.mapred.TaskRunner.runChild(
> TaskRunner.java:144)
> > >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
> >
> > And this means that the parent was really still alive, and has noticed
> > that the child killed itself.
> >
> > It would be good to know how the child failed to contact its parent.  We
> > should probably log a stack trace when this happens.  I just made that
> > change in Hadoop and will propagate it to Nutch.
> >
> > Doug
> >
>
>
>