Posted to user@nutch.apache.org by Matt Zytaruk <ma...@wavefire.com> on 2005/12/13 19:46:33 UTC

Map Reduce Errors

Hello all, I've been trying to parse a segment of data (probably around 
500k pages) I previously fetched, and every time I try, I get an error. 
Below is the error given by the slaves; the master gives a similar 
error. This usually happens late in the reduce phase, but has also 
happened during the map phase once. Any ideas what might be going on 
here? Network issues? Bugs in the tracker?

Thanks for any help you might be able to give.
-matt zytaruk

Slaves:

060102 200647 task_m_bvkze5 Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
060102 200833 task_m_bvkze5 done; removing files.
060102 200855 Client connection to 64.141.15.126:8050: closing
java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.pollForClosedTask(Unknown Source)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:241)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:633)
Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        ... 4 more
060102 201229 Lost connection to JobTracker [crawler-d-01.internal.wavefire.ca/64.141.15.126:8050].  Retrying...
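For readers hitting the same log line: "Task process exit with nonzero status" means the TaskTracker forked a child JVM for the task and that child died with a non-zero exit code; the IOException is just the parent reporting that fact, not the underlying cause. A minimal sketch of that check (using the modern ProcessBuilder API for illustration, not the actual TaskRunner code):

```java
import java.io.IOException;

public class ChildExitCheck {
    // Run a child process and fail if it exits non-zero, mirroring the
    // check TaskRunner.runChild performs on the forked task JVM.
    static void runChild(String... cmd) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        int status = child.waitFor();
        if (status != 0) {
            // This is the message seen in the slave logs above.
            throw new IOException("Task process exit with nonzero status.");
        }
    }

    public static void main(String[] args) throws Exception {
        runChild("sh", "-c", "exit 0"); // succeeds silently
        try {
            runChild("sh", "-c", "exit 1");
        } catch (IOException expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```

The real diagnostic information is therefore in the child task's own log, not in this parent-side exception.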

Master:
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.getJobStatus(Unknown Source)
        at org.apache.nutch.mapred.JobClient.getJob(JobClient.java:272)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:295)
        at org.apache.nutch.crawl.ParseSegment.parse(ParseSegment.java:91)
        at org.apache.nutch.crawl.ParseSegment.main(ParseSegment.java:110)
Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
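One note on the UndeclaredThrowableException seen on both slave and master: it is an artifact of java.lang.reflect.Proxy. Nutch's IPC stubs ($Proxy0 in the traces) are dynamic proxies, and when the invocation handler throws a checked exception that the proxied interface method does not declare, the proxy wraps it. So the real failure is the "Caused by" IOException (the RPC timeout); the wrapper is noise. A standalone demonstration (the interface name here is illustrative, not Nutch's actual stub interface):

```java
import java.io.IOException;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

public class ProxyWrapDemo {
    // The method declares no checked exceptions, like an RPC stub method.
    interface JobSubmissionProtocol {
        String getJobStatus(String jobId);
    }

    // Call through a proxy whose handler simulates the IPC layer timing
    // out; the checked IOException surfaces wrapped, as in the logs.
    static String causeMessageOfWrappedCall() {
        JobSubmissionProtocol stub = (JobSubmissionProtocol) Proxy.newProxyInstance(
                ProxyWrapDemo.class.getClassLoader(),
                new Class<?>[] { JobSubmissionProtocol.class },
                (proxy, method, args) -> {
                    throw new IOException("timed out waiting for response");
                });
        try {
            return stub.getJobStatus("job_0001");
        } catch (UndeclaredThrowableException e) {
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(causeMessageOfWrappedCall());
    }
}
```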


Re: Map Reduce Errors

Posted by Stefan Groschupf <sg...@media-style.com>.
Matt,
I can reproduce the error, at least on my servers:

Timed out. java.io.IOException: Task process exit with nonzero status.
        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)

I will try to reproduce the problem in a debugging environment today.

Stefan

On 14.12.2005, at 00:24, Matt Zytaruk wrote:

> Well, I tried it with the two machines that should be working fine,  
> and it still crapped out, though this time with a different exception.
>
> 060104 001346 task_m_y6k6jq Child Error
> java.lang.NullPointerException
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:279)
>        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:408)
>        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:450)
>        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
>        at java.io.InputStreamReader.read(InputStreamReader.java:167)
>        at java.io.BufferedReader.fill(BufferedReader.java:136)
>        at java.io.BufferedReader.readLine(BufferedReader.java:299)
>        at java.io.BufferedReader.readLine(BufferedReader.java:362)
>        at org.apache.nutch.mapred.TaskRunner.logStream(TaskRunner.java:164)
>        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:136)
>        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
> 060104 001407 Server connection on port 49627 from 127.0.0.2: exiting
>
> -Matt Zytaruk
>
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net



Re: Map Reduce Errors

Posted by Matt Zytaruk <ma...@wavefire.com>.
Well, I tried it with the two machines that should be working fine, and 
it still crapped out, though this time with a different exception.

060104 001346 task_m_y6k6jq Child Error
java.lang.NullPointerException
        at java.io.BufferedInputStream.read(BufferedInputStream.java:279)
        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:408)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:450)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:182)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.BufferedReader.fill(BufferedReader.java:136)
        at java.io.BufferedReader.readLine(BufferedReader.java:299)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at org.apache.nutch.mapred.TaskRunner.logStream(TaskRunner.java:164)
        at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:136)
        at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
060104 001407 Server connection on port 49627 from 127.0.0.2: exiting
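One plausible reading of this NullPointerException: TaskRunner.logStream is draining the child JVM's stdout/stderr, the child exits and its stream gets closed out from under the reader, and the JDK of that era could throw NPE rather than IOException from a concurrently closed BufferedInputStream. A defensive drain loop (a sketch of the idea, not the Nutch code) would tolerate that instead of letting the thread die:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class SafeLogDrain {
    // Copy lines from a child process's output stream, tolerating the
    // stream being closed concurrently when the child exits.
    static List<String> drain(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (Exception e) {
            // IOException, or on old JDKs a NullPointerException, when the
            // stream is closed mid-read: stop logging, keep the thread alive.
        }
        return lines;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("line one\nline two\n".getBytes());
        System.out.println(drain(in));
    }
}
```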

-Matt Zytaruk




Re: Map Reduce Errors

Posted by Stefan Groschupf <sg...@media-style.com>.
> I do see stuff about CRCs being ignored sometimes at the end of
> an operation. Is there a setting for this?
>

Isn't it in nutch-default.xml?

> I have also just learned that the box I've been using as the job
> tracker and NDFS name node has a wonky system timer. So maybe that
> is the problem. I'm currently getting a test running using only the
> other two machines.
Well, this could be the problem. I'm not sure how Java handles the
system current millis behind the scenes. Since you got a timeout
exception, this could be related.
Stefan
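A wonky system timer can plausibly matter here: code built on System.currentTimeMillis() measures elapsed time as wall-clock deltas, so if the clock is stepped backwards a pending wait looks like it has barely started (and a timeout may never fire), while a forward step fires timeouts spuriously early. A small illustration with an injectable clock (the LongSupplier is purely for the demo):

```java
import java.util.function.LongSupplier;

public class WallClockTimeout {
    // Decide whether a wait has expired the way code built on
    // System.currentTimeMillis() does: elapsed = now - start.
    static boolean timedOut(long startMillis, long timeoutMillis, LongSupplier clock) {
        return clock.getAsLong() - startMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        long start = 1_000_000L;
        // Healthy clock: 30s later, a 10s timeout has expired.
        System.out.println(timedOut(start, 10_000, () -> start + 30_000)); // true
        // Clock stepped backwards by a minute: the same wait looks like
        // it hasn't even begun, so the timeout never fires.
        System.out.println(timedOut(start, 10_000, () -> start - 60_000)); // false
    }
}
```

This is why flaky behavior on only the box with the bad timer would be consistent with the timeout exceptions seen here.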





Re: Map Reduce Errors

Posted by Matt Zytaruk <ma...@wavefire.com>.
The source I'm using is about 2 weeks or so old.

I do see stuff about CRCs being ignored sometimes at the end of an 
operation. Is there a setting for this?
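For reference, the switch usually pointed to for this in the Nutch of that era is the io.skip.checksum.errors property (treat the exact name as an assumption and verify it against your own nutch-default.xml): when true, sequence-file readers skip entries whose CRC check fails instead of aborting. An override in nutch-site.xml would look like:

```xml
<!-- nutch-site.xml override; verify the property name exists in the
     nutch-default.xml of your checkout before relying on it. -->
<property>
  <name>io.skip.checksum.errors</name>
  <value>true</value>
  <description>Skip records whose CRC check fails instead of
  aborting the read.</description>
</property>
```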

I have also just learned that the box I've been using as the job tracker 
and NDFS name node has a wonky system timer. So maybe that is the 
problem. I'm currently getting a test running using only the other two 
machines.

-Matt





Re: Map Reduce Errors

Posted by Stefan Groschupf <sg...@media-style.com>.
Hmm, sounds strange, but I'm interested in digging in to find the 
problem's source, since I'm very keen to get 0.8 stable ASAP.
However, finding such a problem's source is a pain in the neck.
Do you use the latest sources from SVN?
Do you ignore CRC errors? Doug mentioned that he often noticed 
problems with this.


Stefan





Re: Map Reduce Errors

Posted by Matt Zytaruk <ma...@wavefire.com>.
I don't think the network settings are the problem, as I have been able 
to parse other segments using MapReduce with no problem. If it was the 
network configuration, wouldn't it never work? However, things do not 
seem to be stable: some operations in NDFS will error, and then I do 
the same thing 5 minutes later and it works fine. Same with other 
things: some crawls work fine, others throw exceptions and crash (I 
actually had a crawl crash with the same problem as in my first mail). 
This is using 3 Opteron boxes running SuSE Linux.

-Matt Zytaruk
>


Re: Map Reduce Errors

Posted by Stefan Groschupf <sg...@media-style.com>.
Looks like a problem with the TCP/IP communication.
Are any firewalls running on the boxes? Maybe some ports are closed?
Are the DNS names configured correctly?

Is your job tracker running stable?

Stefan

