Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2006/06/08 04:34:29 UTC

[jira] Created: (HADOOP-289) Datanodes need to catch SocketTimeoutException and UnregisteredDatanodeException

Datanodes need to catch SocketTimeoutException and UnregisteredDatanodeException
--------------------------------------------------------------------------------

         Key: HADOOP-289
         URL: http://issues.apache.org/jira/browse/HADOOP-289
     Project: Hadoop
        Type: Bug

  Components: dfs  
    Versions: 0.3.1    
    Reporter: Konstantin Shvachko
 Assigned to: Konstantin Shvachko 
     Fix For: 0.3.2


- The datanode needs to catch SocketTimeoutException when registering; otherwise it goes down
the same way as when the namenode is not available (HADOOP-282).
- UnregisteredDatanodeException needs to be caught for all non-registering requests. The data
node should be shut down in this case; otherwise it will loop infinitely and consume namenode resources.
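The retry-on-timeout behavior described above can be sketched in a minimal, self-contained way. Note this is an illustration only: the Namenode interface, the class names, and the retry delay below are hypothetical stand-ins, not Hadoop's actual API.

```java
import java.net.SocketTimeoutException;

public class RegisterRetrySketch {
    // Hypothetical stand-in for the namenode RPC interface.
    interface Namenode {
        void register(String datanodeId) throws SocketTimeoutException;
    }

    static int attempts = 0;

    // Sketch: keep retrying registration on SocketTimeoutException
    // instead of letting the datanode process exit, mirroring the
    // handling for an unavailable namenode (HADOOP-282).
    static void register(Namenode namenode, String id) throws InterruptedException {
        while (true) {
            try {
                namenode.register(id);
                return; // registered successfully
            } catch (SocketTimeoutException e) {
                attempts++;
                Thread.sleep(10); // illustrative delay; real code would back off longer
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake namenode that times out twice, then succeeds.
        Namenode flaky = new Namenode() {
            int calls = 0;
            public void register(String id) throws SocketTimeoutException {
                if (++calls < 3) throw new SocketTimeoutException("timed out");
            }
        };
        register(flaky, "dn-1");
        System.out.println("registered after " + attempts + " retries");
    }
}
```

The second bullet is the inverse case: UnregisteredDatanodeException on a non-registration request is not retryable, so the corresponding catch block would shut the datanode down rather than loop.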


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-289) Datanodes need to catch SocketTimeoutException and UnregisteredDatanodeException

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-289?page=all ]

Konstantin Shvachko updated HADOOP-289:
---------------------------------------

    Attachment: DatanodeExceptions.patch

This patch fixes the two problems described.
I placed all of the registration logic inside DataNode.register(); that seems more logical to me.
There is also a simple null-value check included for FSNamesystem;
I didn't want to create a separate issue for that.





[jira] Commented: (HADOOP-289) Datanodes need to catch SocketTimeoutException and UnregisteredDatanodeException

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-289?page=comments#action_12415420 ] 

Doug Cutting commented on HADOOP-289:
-------------------------------------

This patch causes unit tests to fail for me.  For example, TestLocalDFS fails with:

2006-06-08 12:56:54,423 INFO  ipc.Client (Client.java:run(142)) - Client connection to 127.0.0.1:65312: starting
2006-06-08 12:56:54,432 INFO  ipc.Server (Server.java:run(233)) - Server handler 0 on 65312 call error: org.apache.hadoop.dfs.IncorrectVersionException: Unexpected version of data node reported: 0. Expecting = -2.
org.apache.hadoop.dfs.IncorrectVersionException: Unexpected version of data node reported: 0. Expecting = -2.
	at org.apache.hadoop.dfs.NameNode.verifyVersion(NameNode.java:474)
	at org.apache.hadoop.dfs.NameNode.register(NameNode.java:362)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:243)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:231)
2006-06-08 12:56:55,370 INFO  conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/home/cutting/src/hadoop/test/conf/hadoop-default.xml
2006-06-08 12:56:55,390 INFO  conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/home/cutting/src/hadoop/test/src/test/hadoop-site.xml
2006-06-08 12:56:55,395 WARN  fs.FSNamesystem (FSNamesystem.java:chooseTargets(1646)) - Replication requested of 1 is larger than cluster size (0). Using cluster size.
2006-06-08 12:56:55,395 WARN  dfs.StateChange (FSNamesystem.java:startFile(388)) - DIR* NameSystem.startFile: failed to create file /user/cutting/somewhat/.random.txt.crc on client hadoop because target-length is 0, below MIN_REPLICATION (1)
2006-06-08 12:56:55,396 INFO  ipc.Server (Server.java:run(233)) - Server handler 1 on 65312 call error: java.io.IOException: failed to create file /user/cutting/somewhat/.random.txt.crc on client hadoop because target-length is 0, below MIN_REPLICATION (1)
java.io.IOException: failed to create file /user/cutting/somewhat/.random.txt.crc on client hadoop because target-length is 0, below MIN_REPLICATION (1)
	at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:354)
	at org.apache.hadoop.dfs.NameNode.create(NameNode.java:165)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:243)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:231)





[jira] Updated: (HADOOP-289) Datanodes need to catch SocketTimeoutException and UnregisteredDatanodeException

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-289?page=all ]

Konstantin Shvachko updated HADOOP-289:
---------------------------------------

    Attachment: DatanodeExceptions-2.patch

Resubmitted the patch as DatanodeExceptions-2.patch.
The unit tests do not fail this time. Sorry.




[jira] Resolved: (HADOOP-289) Datanodes need to catch SocketTimeoutException and UnregisteredDatanodeException

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-289?page=all ]
     
Doug Cutting resolved HADOOP-289:
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Konstantin.




[jira] Commented: (HADOOP-289) Datanodes need to catch SocketTimeoutException and UnregisteredDatanodeException

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-289?page=comments#action_12415467 ] 

Owen O'Malley commented on HADOOP-289:
--------------------------------------

Please replace the getLocalizedMessage and implicit toString with calls to StringUtils.stringifyException, which includes both the message and the call stack. The call stack helps a lot in finding and debugging the problem.

