Posted to dev@nutch.apache.org by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/10/09 12:45:49 UTC

[jira] Created: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.

tasktracker crashs when reconnecting to a new jobtracker.
---------------------------------------------------------

         Key: NUTCH-108
         URL: http://issues.apache.org/jira/browse/NUTCH-108
     Project: Nutch
        Type: Bug
    Versions: 0.8-dev    
    Reporter: Stefan Groschupf
    Priority: Critical


051008 213532 Lost connection to JobTracker [/192.168.200.100:7020].  Retrying...
051008 213537 Client connection to 192.168.200.100:7020: starting
051008 213537 Client connection to 192.168.200.105:7030: closing
051008 213537 Server connection on port 7030 from 192.168.200.105: exiting
051008 213537 Server connection on port 7030 from 192.168.200.102: exiting
051008 213537 Client connection to 192.168.200.102:7030: closing
051008 213537 task_m_1iswra done; removing files.
051008 213537 Server connection on port 7030 from 192.168.200.101: exiting
051008 213537 Client connection to 192.168.200.101:7030: closing
Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.TreeMap$EntryIterator.nextEntry(TreeMap.java:1026)
        at java.util.TreeMap$ValueIterator.next(TreeMap.java:1057)
        at org.apache.nutch.mapred.TaskTracker.close(TaskTracker.java:134)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:285)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:629)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Created: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.

Posted by Stefan Groschupf <sg...@media-style.com>.
... looks like we only need to make the initialize method synchronized
as well. Right?
Can someone just add this keyword to the initialize method, or should I
create a patch just for this one word?

Stefan


---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net
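Whether adding `synchronized` is enough depends on which thread changes `tasks`. The keyword keeps other threads out while `close()` loops, but a `TreeMap` iterator is fail-fast regardless of locking. If the loop body itself (for example, per-task cleanup) removes entries, the next `next()` call throws on the same thread. A minimal sketch; `SameThreadCme`, `tasks`, and `finishTask` are hypothetical stand-ins, not Nutch code:

```java
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

class SameThreadCme {
    // Hypothetical stand-in for TaskTracker.tasks (task id -> task id).
    private final TreeMap<String, String> tasks = new TreeMap<>();

    // Hypothetical per-task cleanup that removes the task from the map.
    // Re-entering the lock from closeAll() is allowed (locks are reentrant).
    private synchronized void finishTask(String id) {
        tasks.remove(id);
    }

    // Holds the object's lock for the whole loop, yet the values() iterator
    // still fails fast: the map is modified structurally on this same thread
    // while the iterator is live. Returns true if the CME was thrown.
    synchronized boolean closeAll() {
        tasks.put("task_a", "task_a");
        tasks.put("task_b", "task_b");
        try {
            for (String id : tasks.values()) {
                finishTask(id); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(new SameThreadCme().closeAll()); // prints true
    }
}
```

If the modification instead came from another thread, synchronizing every method that touches `tasks` (including initialize) would help; if it comes from the loop body, the loop itself has to change.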



[jira] Resolved: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-108?page=all ]
     
Doug Cutting resolved NUTCH-108:
--------------------------------

    Fix Version: 0.8-dev
     Resolution: Fixed

I just committed this patch.  Thanks, Paul!



[jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-108?page=comments#action_12332265 ] 

Doug Cutting commented on NUTCH-108:
------------------------------------

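I think the fix is to replace the loop at the start of TaskTracker.close() with something like:

  while (tasks.size() != 0) {
    // tasks is a TreeMap (see the stack trace), so take its first value.
    // Assumes jobHasFinished() removes tip from tasks; otherwise this never ends.
    TaskInProgress tip = (TaskInProgress)tasks.get(tasks.firstKey());
    tip.jobHasFinished();
  }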

I have not yet had time to test this.
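The idea behind the loop above is that no iterator stays open across the call that mutates `tasks`. The sketch below shows that drain loop next to a common alternative, iterating over a snapshot copy, which also terminates if `jobHasFinished()` does not remove the task. It assumes `tasks` is a `TreeMap` (as the stack trace indicates) and that `jobHasFinished()` removes the task from it; `DrainTasks` and this `TaskInProgress` are hypothetical stand-ins, not the Nutch classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class DrainTasks {
    // Hypothetical stand-in for TaskInProgress. Its jobHasFinished() removes
    // the task from the shared map, the side effect that breaks a plain
    // iterator over tasks.values().
    static class TaskInProgress {
        final String id;
        final TreeMap<String, TaskInProgress> tasks;
        TaskInProgress(String id, TreeMap<String, TaskInProgress> tasks) {
            this.id = id;
            this.tasks = tasks;
        }
        void jobHasFinished() {
            tasks.remove(id);
        }
    }

    static TreeMap<String, TaskInProgress> newTasks(String... ids) {
        TreeMap<String, TaskInProgress> tasks = new TreeMap<>();
        for (String id : ids) {
            tasks.put(id, new TaskInProgress(id, tasks));
        }
        return tasks;
    }

    // Variant 1: the drain loop. No iterator is held across the mutation;
    // it only terminates because jobHasFinished() removes the task.
    static int drain(TreeMap<String, TaskInProgress> tasks) {
        int closed = 0;
        while (!tasks.isEmpty()) {
            TaskInProgress tip = tasks.get(tasks.firstKey());
            tip.jobHasFinished();
            closed++;
        }
        return closed;
    }

    // Variant 2: iterate over a snapshot, so edits to the live map cannot
    // invalidate the iterator and termination does not depend on removal.
    static int closeSnapshot(TreeMap<String, TaskInProgress> tasks) {
        List<TaskInProgress> snapshot = new ArrayList<>(tasks.values());
        for (TaskInProgress tip : snapshot) {
            tip.jobHasFinished();
        }
        return snapshot.size();
    }

    public static void main(String[] args) {
        TreeMap<String, TaskInProgress> a = newTasks("task_1", "task_2", "task_3");
        System.out.println(drain(a) + " " + a.isEmpty());         // prints "3 true"
        TreeMap<String, TaskInProgress> b = newTasks("task_1", "task_2", "task_3");
        System.out.println(closeSnapshot(b) + " " + b.isEmpty()); // prints "3 true"
    }
}
```

The snapshot variant costs one extra list allocation but avoids the infinite-loop risk if a task ever fails to remove itself.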




[jira] Updated: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.

Posted by "Paul Baclace (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-108?page=all ]

Paul Baclace updated NUTCH-108:
-------------------------------

    Attachment: TaskTracker.java.patch

Here is a patch for reducing redundant, voluminous output while retrying to connect.  




[jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.

Posted by "Paul Baclace (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-108?page=comments#action_12361339 ] 

Paul Baclace commented on NUTCH-108:
------------------------------------

I just had the opportunity to test this with 33 tasktrackers.  

One thing I noticed: TaskTracker.java should be patched to reduce the redundant, voluminous output from the retry loop (an unnecessary stack trace every 5 seconds).

All of the tasktrackers are now able to successfully reconnect.
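The retry-loop change described above can be sketched as: record the full stack trace only for the first failed attempt and a single short line for each later one. `QuietRetry`, `Attempt`, and `retry` are hypothetical names, not the actual TaskTracker code; the 5-second interval from the comment would be passed as `sleepMillis`:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

class QuietRetry {
    // Hypothetical stand-in for one attempt to reach the JobTracker.
    interface Attempt {
        void run() throws IOException;
    }

    // Retries until the attempt succeeds or maxAttempts is reached. Only the
    // first failure carries a stack trace; later failures get one line each.
    // Returns the log lines so the behaviour is easy to inspect.
    static List<String> retry(Attempt attempt, int maxAttempts, long sleepMillis) {
        List<String> log = new ArrayList<>();
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                log.add("connected after " + i + " attempt(s)");
                return log;
            } catch (IOException e) {
                if (i == 1) {
                    StringWriter sw = new StringWriter();
                    e.printStackTrace(new PrintWriter(sw));
                    log.add("Lost connection to JobTracker. Retrying...\n" + sw);
                } else {
                    log.add("Retrying (attempt " + i + "): " + e.getMessage());
                }
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    log.add("interrupted; giving up");
                    return log;
                }
            }
        }
        log.add("giving up after " + maxAttempts + " attempts");
        return log;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<String> log = retry(() -> {
            if (++calls[0] < 4) throw new IOException("timed out waiting for response");
        }, 10, 0);
        for (String line : log) {
            System.out.println(line.split("\n")[0]); // first line of each entry
        }
    }
}
```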





[jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.

Posted by "Rod Taylor (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-108?page=comments#action_12332161 ] 

Rod Taylor commented on NUTCH-108:
----------------------------------

I have seen this as well.

When I took a look, the JobTracker had knowledge of all of the events (via localhost:7845) but did not have any trackers connected to it. The trackers on all 5 machines had stopped running. After I restarted the trackers, the system continued from where it left off.

Snippet from one tracker log; all tracker logs looked similar.

051015 070222 task_m_abaf21 0.99999994% 30093 pages, 4546 errors, 14.9 pages/s, 1609 kb/s,
051015 070222 Task task_m_abaf21 is done.
051015 070222 Task task_m_abaf21 is done.
051015 070222 Server connection on port 52226 from 192.168.100.14: exiting
java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.emitHeartbeat(Unknown Source)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        ... 4 more
051015 071940 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464].  Retrying...
java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.emitHeartbeat(Unknown Source)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        ... 4 more
<-- SNIP -->
051015 081350 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464].  Retrying...
java.lang.reflect.UndeclaredThrowableException
        at $Proxy0.emitHeartbeat(Unknown Source)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
Caused by: java.io.IOException: timed out waiting for response
        at org.apache.nutch.ipc.Client.call(Client.java:296)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        ... 4 more
051015 081455 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464].  Retrying...
051015 081510 task_m_2j2jh0 done; removing files.
051015 081510 Server connection on port 41894 from 192.168.100.10: exiting
051015 081510 Client connection to 192.168.100.10:61734: closing
051015 081510 Client connection to 192.168.100.12:63227: closing
051015 081510 Server connection on port 41894 from 192.168.100.12: exiting
Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1031)
        at java.util.TreeMap$ValueIterator.next(TreeMap.java:1064)
        at org.apache.nutch.mapred.TaskTracker.close(TaskTracker.java:130)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:281)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
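The `UndeclaredThrowableException` in this log appears to come from the RPC client being a dynamic proxy (`$Proxy0`, with `RPC$Invoker.invoke` as its handler). When an `InvocationHandler` throws a checked exception that the interface method does not declare, `java.lang.reflect.Proxy` wraps it, so the real cause, here the `IOException` timeout, only shows up under "Caused by". A minimal sketch, assuming a hypothetical `Protocol.heartbeat` method that declares no `IOException`:

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

class ProxyWrapDemo {
    // Hypothetical protocol interface: heartbeat() does NOT declare
    // IOException, which is what the wrapped exception in the log suggests.
    interface Protocol {
        int heartbeat(String tracker);
    }

    // Returns what a proxy call surfaces when its handler throws a checked
    // IOException, as an RPC client does on a timeout.
    static Throwable callThroughProxy() {
        InvocationHandler timesOut = (proxy, method, args) -> {
            throw new IOException("timed out waiting for response");
        };
        Protocol p = (Protocol) Proxy.newProxyInstance(
                Protocol.class.getClassLoader(),
                new Class<?>[] {Protocol.class},
                timesOut);
        try {
            p.heartbeat("tracker_1");
            return null;
        } catch (UndeclaredThrowableException e) {
            return e; // the IOException is available via getCause()
        }
    }

    public static void main(String[] args) {
        Throwable t = callThroughProxy();
        System.out.println(t.getClass().getSimpleName()); // UndeclaredThrowableException
        System.out.println(t.getCause().getMessage());    // timed out waiting for response
    }
}
```

Declaring `throws IOException` on the interface method would let the proxy rethrow the timeout directly instead of wrapping it.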



