Posted to common-dev@hadoop.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2008/03/17 05:53:24 UTC

[jira] Created: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

JobTracker shuts down during initialization if the NameNode is down
-------------------------------------------------------------------

                 Key: HADOOP-3027
                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
            Reporter: Amareshwari Sriramadasu
            Priority: Blocker
             Fix For: 0.17.0


When the JobTracker is initializing and trying to connect to the NameNode, it shuts itself down if the NameNode is unreachable for more than one iteration of the connect loop. It can be easily reproduced if the JobTracker is started before the NameNode is started. The JobTracker will shut itself down in a few seconds. The problem seems to be with adding a shutdown hook in the FileSystem in the case where the same hook has been added before.

2008-03-17 09:45:20,979 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 9101
2008-03-17 09:45:20,979 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2008-03-17 09:45:21,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
2008-03-17 09:45:22,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
2008-03-17 09:45:23,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
2008-03-17 09:45:24,383 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
2008-03-17 09:45:25,385 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
2008-03-17 09:45:26,388 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
2008-03-17 09:45:27,391 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
2008-03-17 09:45:28,394 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
2008-03-17 09:45:29,397 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
2008-03-17 09:45:30,402 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 10 time(s).
2008-03-17 09:45:31,406 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: /tmp/hadoop/mapred/system
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
	at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
	at org.apache.hadoop.ipc.Client.call(Client.java:546)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:211)
	at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:312)
	at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:94)
	at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:158)
	at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:69)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1255)
	at org.apache.hadoop.fs.FileSystem.access$400(FileSystem.java:53)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1272)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:96)
	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:702)
	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:135)
	at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2266)
2008-03-17 09:45:41,410 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.IllegalArgumentException: Hook previously registered
	at java.lang.ApplicationShutdownHooks.add(Unknown Source)
	at java.lang.Runtime.addShutdownHook(Unknown Source)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1269)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:96)
	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:702)
	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:135)
	at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2266)

2008-03-17 09:45:41,412 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: 
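
The FATAL entry above comes from the JDK rather than from Hadoop: Runtime.addShutdownHook rejects a hook that is already registered by throwing IllegalArgumentException. The following is a minimal standalone sketch of that behavior, not Hadoop code. The class and method names are hypothetical, chosen only for illustration.

```java
// Hypothetical demo class; not part of Hadoop.
final class DuplicateHookDemo {

    /**
     * Registers the same hook thread twice and returns the message of the
     * IllegalArgumentException raised by the second registration, or null
     * if no exception is raised.
     */
    static String registerTwice() {
        Thread hook = new Thread(() -> { /* nothing to do at shutdown */ });
        Runtime runtime = Runtime.getRuntime();
        runtime.addShutdownHook(hook);          // first registration succeeds
        try {
            runtime.addShutdownHook(hook);      // same Thread object again
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        } finally {
            runtime.removeShutdownHook(hook);   // leave the JVM as we found it
        }
    }

    public static void main(String[] args) {
        System.out.println(registerTwice());
    }
}
```

Any code path that can run more than once and registers the same hook object on each pass will hit this exception on the second pass.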

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581922#action_12581922 ] 

Hudson commented on HADOOP-3027:
--------------------------------

Integrated in Hadoop-trunk #441 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/441/])

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.16.2, 0.17.0
>
>         Attachments: patch-3027.txt

[jira] Updated: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-3027:
--------------------------------------------

    Fix Version/s: 0.16.2
           Status: Patch Available  (was: Open)

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.16.2, 0.17.0
>
>         Attachments: patch-3027.txt

[jira] Updated: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-3027:
--------------------------------------------

    Attachment: patch-3027.txt

In FileSystem.Cache.get, if fs is null and the cache is empty, a shutdown hook is added to close all filesystems.

When the NameNode is down and the JobTracker tries to connect, the FileSystem cache is empty, so the shutdown hook is added on the first attempt. Since the NameNode is down, createFileSystem fails. When the JobTracker tries again, fs is null and the cache is still empty, so it tries to add the shutdown hook again. This raises an IllegalArgumentException with the message "Hook previously registered".
One solution is to add the shutdown hook only after createFileSystem succeeds. Here is a patch that does this.
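
The ordering described above can be sketched as follows. This is an illustrative sketch only, not the contents of patch-3027.txt and not the real FileSystem.Cache: the class FsCacheSketch, the Factory interface, and the hookRegistered flag are assumptions introduced here, and AutoCloseable stands in for FileSystem.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FileSystem.Cache; names are illustrative only.
final class FsCacheSketch {

    /** Hypothetical stand-in for createFileSystem; may fail if the server is down. */
    interface Factory {
        AutoCloseable create(String uri) throws IOException;
    }

    private final Map<String, AutoCloseable> cache = new HashMap<>();
    private final Thread closeAllHook = new Thread(this::closeAll);
    // Extra guard (an assumption, not described in the thread): never register twice.
    private boolean hookRegistered = false;

    synchronized AutoCloseable get(String uri, Factory factory) throws IOException {
        AutoCloseable fs = cache.get(uri);
        if (fs == null) {
            // Create first. If this throws, no hook has been registered,
            // so a later retry starts from a clean state.
            fs = factory.create(uri);
            if (cache.isEmpty() && !hookRegistered) {
                Runtime.getRuntime().addShutdownHook(closeAllHook);
                hookRegistered = true;
            }
            cache.put(uri, fs);
        }
        return fs;
    }

    /** Closes every cached instance; runs from the shutdown hook. */
    private synchronized void closeAll() {
        for (AutoCloseable fs : cache.values()) {
            try {
                fs.close();
            } catch (Exception e) {
                // Best effort during shutdown; nothing useful to do here.
            }
        }
        cache.clear();
    }
}
```

With this ordering, a first call whose factory throws IOException leaves no hook behind, so a second call that succeeds registers the hook exactly once and caches the instance.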

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.17.0
>
>         Attachments: patch-3027.txt

[jira] Commented: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581626#action_12581626 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3027:
------------------------------------------------

+1 thanks, Amareshwari!

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.16.2, 0.17.0
>
>         Attachments: patch-3027.txt

[jira] Assigned: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-3027:
-------------------------------------

    Assignee: Amareshwari Sriramadasu

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.16.2
>
>         Attachments: patch-3027.txt

[jira] Updated: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-3027:
--------------------------------------------

    Affects Version/s: 0.16.0

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.17.0
>
>
> When the JobTracker is initializing and trying to connect to the NameNode, it shuts itself down if the NameNode is unreachable for more than one iteration of the connect loop. It can be easily reproduced if the JobTracker is started before the NameNode is started. The JobTracker will shut itself down in a few seconds. The problem seems to be with adding a shutdown hook in the FileSystem in the case where the same hook has been added before.
> 2008-03-17 09:45:20,979 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 9101
> 2008-03-17 09:45:20,979 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
> 2008-03-17 09:45:21,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
> 2008-03-17 09:45:22,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
> 2008-03-17 09:45:23,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
> 2008-03-17 09:45:24,383 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
> 2008-03-17 09:45:25,385 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
> 2008-03-17 09:45:26,388 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
> 2008-03-17 09:45:27,391 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
> 2008-03-17 09:45:28,394 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
> 2008-03-17 09:45:29,397 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
> 2008-03-17 09:45:30,402 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 10 time(s).
> 2008-03-17 09:45:31,406 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: /tmp/hadoop/mapred/system
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
> 	at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:546)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:211)
> 	at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
> 	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:312)
> 	at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:94)
> 	at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:158)
> 	at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:69)
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1255)
> 	at org.apache.hadoop.fs.FileSystem.access$400(FileSystem.java:53)
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1272)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:96)
> 	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:702)
> 	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:135)
> 	at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2266)
> 2008-03-17 09:45:41,410 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.IllegalArgumentException: Hook previously registered
> 	at java.lang.ApplicationShutdownHooks.add(Unknown Source)
> 	at java.lang.Runtime.addShutdownHook(Unknown Source)
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1269)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:96)
> 	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:702)
> 	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:135)
> 	at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2266)
> 2008-03-17 09:45:41,412 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: 
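
The FATAL entry above reflects standard JVM behavior: Runtime.addShutdownHook throws IllegalArgumentException when the same Thread object is registered a second time. The trace suggests that FileSystem$Cache.get registers its hook (FileSystem.java:1269) before createFileSystem (FileSystem.java:1272) fails, so a retry reaches the registration again. Below is a minimal sketch of that failure mode and of one possible guard. The class, field, and method names are hypothetical, and this is not the committed patch-3027.txt, whose contents are not shown in this thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookDemo {
    // Hypothetical stand-in for the cleanup thread a file system cache might register.
    private static final Thread CLEANUP_HOOK = new Thread(() -> { });

    // Remembers whether CLEANUP_HOOK has already been handed to the JVM.
    private static final AtomicBoolean hookRegistered = new AtomicBoolean(false);

    /** Registers one Thread twice; returns true if the JVM rejects the second call. */
    static boolean secondRegistrationFails() {
        Thread hook = new Thread(() -> { });
        Runtime.getRuntime().addShutdownHook(hook);
        try {
            Runtime.getRuntime().addShutdownHook(hook);
            return false;
        } catch (IllegalArgumentException e) {
            // Same exception type as in the FATAL log entry above.
            return true;
        } finally {
            // Leave no demo hook behind.
            Runtime.getRuntime().removeShutdownHook(hook);
        }
    }

    /** Guarded registration: returns true only on the call that actually registers. */
    static boolean registerCleanupHookOnce() {
        if (hookRegistered.compareAndSet(false, true)) {
            Runtime.getRuntime().addShutdownHook(CLEANUP_HOOK);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("second registration rejected: " + secondRegistrationFails());
        System.out.println("first guarded call registered: " + registerCleanupHookOnce());
        System.out.println("second guarded call registered: " + registerCleanupHookOnce());
    }
}
```

With a guard of this kind, a caller that retries after a failed connection attempt reaches the registration code again without triggering the exception. Whether the real fix tracks registration with a flag, reorders the calls, or does something else is an open assumption here.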


[jira] Updated: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3027:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

This shouldn't require a testcase. 
I just committed this. Thanks, Amareshwari!

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.16.2, 0.17.0
>
>         Attachments: patch-3027.txt
>
>


[jira] Commented: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581554#action_12581554 ] 

Hadoop QA commented on HADOOP-3027:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12378484/patch-3027.txt
against trunk revision 619744.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2036/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2036/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2036/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2036/console

This message is automatically generated.

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.16.2, 0.17.0
>
>         Attachments: patch-3027.txt
>
>


[jira] Updated: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3027:
------------------------------------

    Component/s: mapred

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.17.0
>
>
