Posted to dev@nutch.apache.org by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2005/10/09 12:45:49 UTC
[jira] Created: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.
tasktracker crashs when reconnecting to a new jobtracker.
---------------------------------------------------------
Key: NUTCH-108
URL: http://issues.apache.org/jira/browse/NUTCH-108
Project: Nutch
Type: Bug
Versions: 0.8-dev
Reporter: Stefan Groschupf
Priority: Critical
051008 213532 Lost connection to JobTracker [/192.168.200.100:7020]. Retrying...
051008 213537 Client connection to 192.168.200.100:7020: starting
051008 213537 Client connection to 192.168.200.105:7030: closing
051008 213537 Server connection on port 7030 from 192.168.200.105: exiting
051008 213537 Server connection on port 7030 from 192.168.200.102: exiting
051008 213537 Client connection to 192.168.200.102:7030: closing
051008 213537 task_m_1iswra done; removing files.
051008 213537 Server connection on port 7030 from 192.168.200.101: exiting
051008 213537 Client connection to 192.168.200.101:7030: closing
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.TreeMap$EntryIterator.nextEntry(TreeMap.java:1026)
at java.util.TreeMap$ValueIterator.next(TreeMap.java:1057)
at org.apache.nutch.mapred.TaskTracker.close(TaskTracker.java:134)
at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:285)
at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:629)
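The trace shows close() failing inside TreeMap's value iterator. A ConcurrentModificationException does not require a second thread: TreeMap iterators are fail-fast, so removing an entry through the map while an iteration over values() is in progress makes the iterator's next call to next() throw. The sketch below reproduces that in isolation. It assumes that finishing a task removes it from the same map close() is iterating. The class and method names are hypothetical and are not Nutch code.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeMap;

// Hypothetical reproduction; the map stands in for TaskTracker's task table.
class CmeRepro {
    // Returns true if removing entries from a TreeMap while iterating over its
    // values() throws ConcurrentModificationException, all on a single thread.
    static boolean removeWhileIterating() {
        TreeMap<String, String> tasks = new TreeMap<>();
        tasks.put("task_a", "task_a");
        tasks.put("task_b", "task_b");
        tasks.put("task_c", "task_c");
        try {
            for (Iterator<String> it = tasks.values().iterator(); it.hasNext(); ) {
                String tip = it.next();
                // Stands in for jobHasFinished() removing the task from the map
                // directly, rather than through it.remove().
                tasks.remove(tip);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentModificationException thrown: " + removeWhileIterating());
    }
}
```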
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
Re: [jira] Created: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.
Posted by Stefan Groschupf <sg...@media-style.com>.
... looks like we only need to synchronize the initialize method as
well, right?
Can someone just add this one word (synchronized) to the initialize method, or should I
create a patch for this single word?
Stefan
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net
[jira] Resolved: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-108?page=all ]
Doug Cutting resolved NUTCH-108:
--------------------------------
Fix Version: 0.8-dev
Resolution: Fixed
I just committed this patch. Thanks, Paul!
[jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-108?page=comments#action_12332265 ]
Doug Cutting commented on NUTCH-108:
------------------------------------
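I think the fix is to replace the loop at the start of TaskTracker.close() with something like:
  while (tasks.size() != 0) {
    // Assuming tasks is the TreeMap from the stack trace: TreeMap has no first(),
    // so fetch the value stored under the first key.
    TaskInProgress tip = (TaskInProgress)tasks.get(tasks.firstKey());
    // Relies on jobHasFinished() removing tip from tasks; otherwise this never ends.
    tip.jobHasFinished();
  }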
I have not yet had time to test this.
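A minimal, self-contained sketch of two common ways to avoid the exception: draining the map by repeatedly taking its first entry (the approach suggested above) and iterating over a snapshot copy of the values. FakeTip is a hypothetical stand-in for TaskInProgress, and its jobHasFinished() is assumed to remove the task from the shared map. Because the stack trace shows a TreeMap, the drain loop uses firstKey(), since TreeMap has no first() method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for TaskInProgress; not the Nutch class.
class FakeTip {
    private final String id;
    private final TreeMap<String, FakeTip> tasks;

    FakeTip(String id, TreeMap<String, FakeTip> tasks) {
        this.id = id;
        this.tasks = tasks;
    }

    // Assumed behaviour: finishing a task removes it from the shared map.
    void jobHasFinished() {
        tasks.remove(id);
    }
}

class CloseSketch {
    // Drain by repeatedly taking the first entry. Terminates only because
    // jobHasFinished() removes that entry on every pass.
    static int drainByFirstKey(TreeMap<String, FakeTip> tasks) {
        int finished = 0;
        while (!tasks.isEmpty()) {
            FakeTip tip = tasks.get(tasks.firstKey());
            tip.jobHasFinished();
            finished++;
        }
        return finished;
    }

    // Iterate over a snapshot, so removals never touch a live map iterator.
    static int finishFromSnapshot(TreeMap<String, FakeTip> tasks) {
        List<FakeTip> snapshot = new ArrayList<>(tasks.values());
        for (FakeTip tip : snapshot) {
            tip.jobHasFinished();
        }
        return snapshot.size();
    }

    static TreeMap<String, FakeTip> sample() {
        TreeMap<String, FakeTip> tasks = new TreeMap<>();
        for (String id : new String[] {"task_a", "task_b", "task_c"}) {
            tasks.put(id, new FakeTip(id, tasks));
        }
        return tasks;
    }

    public static void main(String[] args) {
        TreeMap<String, FakeTip> a = sample();
        System.out.println(drainByFirstKey(a) + " finished, " + a.size() + " left");
        TreeMap<String, FakeTip> b = sample();
        System.out.println(finishFromSnapshot(b) + " finished, " + b.size() + " left");
    }
}
```

The snapshot version terminates even if jobHasFinished() turns out not to remove the task, whereas the drain loop depends on that removal.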
[jira] Updated: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.
Posted by "Paul Baclace (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-108?page=all ]
Paul Baclace updated NUTCH-108:
-------------------------------
Attachment: TaskTracker.java.patch
Here is a patch for reducing redundant, voluminous output while retrying to connect.
[jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.
Posted by "Paul Baclace (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-108?page=comments#action_12361339 ]
Paul Baclace commented on NUTCH-108:
------------------------------------
I just had the opportunity to test this with 33 tasktrackers.
All of the tasktrackers are now able to reconnect successfully.
One thing I noticed: TaskTracker.java should be patched to reduce the redundant, voluminous output (an unnecessary stack trace every 5 sec.) from the retry loop.
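The patch itself is not reproduced in this archive. A minimal sketch of the idea, using hypothetical names (RetryLogSketch, Connector, connectWithRetry): keep retrying at a fixed interval, but report the exception only on the first failure and emit a single short line for each later attempt.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Nutch's TaskTracker.
class RetryLogSketch {
    interface Connector {
        void connect() throws IOException;
    }

    // Collected log lines, so the behaviour can be inspected without a logger.
    final List<String> log = new ArrayList<>();

    // Retries until connect() succeeds or maxAttempts is reached.
    boolean connectWithRetry(Connector c, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                c.connect();
                if (attempt > 1) {
                    log.add("Reconnected after " + attempt + " attempts.");
                }
                return true;
            } catch (IOException e) {
                if (attempt == 1) {
                    // Report the exception once; a real logger could attach the stack trace here.
                    log.add("Lost connection: " + e);
                } else {
                    // Later failures only get a terse line.
                    log.add("Retrying (attempt " + attempt + ")...");
                }
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails three times, then succeeds.
        Connector flaky = () -> {
            if (++calls[0] <= 3) throw new IOException("timed out waiting for response");
        };
        RetryLogSketch s = new RetryLogSketch();
        boolean ok = s.connectWithRetry(flaky, 10, 0);
        s.log.forEach(System.out::println);
        System.out.println("connected: " + ok);
    }
}
```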
[jira] Commented: (NUTCH-108) tasktracker crashs when reconnecting to a new jobtracker.
Posted by "Rod Taylor (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-108?page=comments#action_12332161 ]
Rod Taylor commented on NUTCH-108:
----------------------------------
I have seen this as well.
When I took a look, the JobTracker had knowledge of all of the events (via localhost:7845) but did not have any trackers connected to it. The trackers on all 5 machines had stopped running. After restarting the trackers, the system continued from where it left off.
Snippet from one tracker log (all tracker logs looked similar):
051015 070222 task_m_abaf21 0.99999994% 30093 pages, 4546 errors, 14.9 pages/s, 1609 kb/s,
051015 070222 Task task_m_abaf21 is done.
051015 070222 Task task_m_abaf21 is done.
051015 070222 Server connection on port 52226 from 192.168.100.14: exiting
java.lang.reflect.UndeclaredThrowableException
at $Proxy0.emitHeartbeat(Unknown Source)
at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
Caused by: java.io.IOException: timed out waiting for response
at org.apache.nutch.ipc.Client.call(Client.java:296)
at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
... 4 more
051015 071940 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464]. Retrying...
java.lang.reflect.UndeclaredThrowableException
at $Proxy0.emitHeartbeat(Unknown Source)
at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
Caused by: java.io.IOException: timed out waiting for response
at org.apache.nutch.ipc.Client.call(Client.java:296)
at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
... 4 more
<-- SNIP -->
051015 081350 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464]. Retrying...
java.lang.reflect.UndeclaredThrowableException
at $Proxy0.emitHeartbeat(Unknown Source)
at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:203)
at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
Caused by: java.io.IOException: timed out waiting for response
at org.apache.nutch.ipc.Client.call(Client.java:296)
at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
... 4 more
051015 081455 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464]. Retrying...
051015 081510 task_m_2j2jh0 done; removing files.
051015 081510 Server connection on port 41894 from 192.168.100.10: exiting
051015 081510 Client connection to 192.168.100.10:61734: closing
051015 081510 Client connection to 192.168.100.12:63227: closing
051015 081510 Server connection on port 41894 from 192.168.100.12: exiting
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1031)
at java.util.TreeMap$ValueIterator.next(TreeMap.java:1064)
at org.apache.nutch.mapred.TaskTracker.close(TaskTracker.java:130)
at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:281)
at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:625)
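In the log above, the heartbeat timeout surfaces as java.lang.reflect.UndeclaredThrowableException with an IOException as its cause. That is how java.lang.reflect.Proxy behaves when the invocation handler throws a checked exception that the interface method does not declare. The sketch below reproduces this. HeartbeatProtocol is a hypothetical interface that only borrows the emitHeartbeat name from the log, and it is an assumption that Nutch's real protocol method did not declare IOException.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

// Hypothetical protocol: the method deliberately does NOT declare IOException.
interface HeartbeatProtocol {
    int emitHeartbeat(String trackerName);
}

class ProxyWrapDemo {
    // Returns what the caller actually catches when the handler fails with an
    // IOException that the interface method does not declare.
    static Throwable callThroughProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            // Stand-in for an RPC client call that times out.
            throw new IOException("timed out waiting for response");
        };
        HeartbeatProtocol p = (HeartbeatProtocol) Proxy.newProxyInstance(
                HeartbeatProtocol.class.getClassLoader(),
                new Class<?>[] {HeartbeatProtocol.class},
                handler);
        try {
            p.emitHeartbeat("tracker_1");
            return null;
        } catch (UndeclaredThrowableException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable t = callThroughProxy();
        System.out.println(t.getClass().getName());
        System.out.println("cause: " + t.getCause());
    }
}
```

A caller that wants to treat this as a lost connection can catch UndeclaredThrowableException and inspect getCause(). Alternatively, if the protocol method declares IOException, the proxy rethrows the IOException unwrapped.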