Posted to mapreduce-user@hadoop.apache.org by Chen Song <ch...@gmail.com> on 2015/03/02 17:44:46 UTC

how to catch exception when data cannot be replicated to any datanode

Hey

I got the following error in the application logs when trying to put a file
to DFS.

015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb
could only be replicated to 0 nodes instead of minReplication (=1).
There are 317 datanode(s) running and no node(s) are excluded in this
operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

        at org.apache.hadoop.ipc.Client.call(Client.java:1409)
        at org.apache.hadoop.ipc.Client.call(Client.java:1362)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
        at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)


This results in an empty file in HDFS. I searched this mailing list and found
that this could be caused by a full disk or an unreachable datanode.

However, this exception is only logged at WARN level when FileSystem.close
is called, and is never thrown where the client can see it. My question is:
at the client level, how can I catch this exception and handle it?
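
For context, the write path looks roughly like this (a simplified sketch; the
path, payload, and surrounding job code are placeholders, not the real code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path dst = new Path("/tmp/example.pb");      // placeholder destination
    FSDataOutputStream out = fs.create(dst);
    out.write(payload);                          // payload: byte[] to upload
    // ... later only the FileSystem is closed; the replication failure from the
    // background DataStreamer shows up as a WARN in the logs here instead of an
    // exception the caller can catch:
    fs.close();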

Chen

-- 
Chen Song

Re: how to catch exception when data cannot be replicated to any datanode

Posted by Chen Song <ch...@gmail.com>.
Also, the exception is thrown in BlockManager, but on the DFSClient side it is
just caught and logged as a warning.

The problem is that the caller has no way to detect this error and only sees
an empty file (0 bytes) after the fact.
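
As a stopgap I am considering closing the output stream explicitly and then
verifying the result, along these lines (untested sketch; dst and out are the
Path and FSDataOutputStream from the write above):

    // Workaround sketch (untested): close the stream explicitly so an IOException
    // from the DataStreamer has a chance to surface, then double-check the length.
    try {
        out.close();                              // instead of relying on fs.close()
    } catch (IOException e) {
        // report the failed upload and retry / clean up here
    }
    if (fs.getFileStatus(dst).getLen() == 0) {
        // treat a 0-byte file as a failed upload as well
    }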

Chen

On Mon, Mar 2, 2015 at 2:41 PM, Chen Song <ch...@gmail.com> wrote:

> I am using CDH5.1.0, which is hadoop 2.3.0.
>
> On Mon, Mar 2, 2015 at 12:23 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Which hadoop release are you using ?
>>
>> In branch-2, I see this IOE in BlockManager :
>>
>>     if (targets.length < minReplication) {
>>       throw new IOException("File " + src + " could only be replicated to
>> "
>>           + targets.length + " nodes instead of minReplication (="
>>           + minReplication + ").  There are "
>>
>> Cheers
>>
>> On Mon, Mar 2, 2015 at 8:44 AM, Chen Song <ch...@gmail.com> wrote:
>>
>>> Hey
>>>
>>> I got the following error in the application logs when trying to put a
>>> file to DFS.
>>>
>>> 015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb could only be replicated to 0 nodes instead of minReplication (=1).  There are 317 datanode(s) running and no node(s) are excluded in this operation.
>>>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>>>
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>         at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
>>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
>>>         at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>         at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>>>
>>>
>>> This results in empty file in HDFS. I did some search through this email
>>> thread and found that this could be caused by disk full, or data node
>>> unreachable.
>>>
>>> However, this exception was only logged as WARN level when
>>> FileSystem.close is called, and never thrown visible to client. My question
>>> is, on the client level, How can I catch this exception and handle it?
>>>
>>> Chen
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>
>
> --
> Chen Song
>
>


-- 
Chen Song

Re: how to catch exception when data cannot be replicated to any datanode

Posted by Chen Song <ch...@gmail.com>.
I am using CDH5.1.0, which is Hadoop 2.3.0.

On Mon, Mar 2, 2015 at 12:23 PM, Ted Yu <yu...@gmail.com> wrote:

> Which hadoop release are you using ?
>
> In branch-2, I see this IOE in BlockManager :
>
>     if (targets.length < minReplication) {
>       throw new IOException("File " + src + " could only be replicated to "
>           + targets.length + " nodes instead of minReplication (="
>           + minReplication + ").  There are "
>
> Cheers
>
> On Mon, Mar 2, 2015 at 8:44 AM, Chen Song <ch...@gmail.com> wrote:
>
>> Hey
>>
>> I got the following error in the application logs when trying to put a
>> file to DFS.
>>
>> 015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb could only be replicated to 0 nodes instead of minReplication (=1).  There are 317 datanode(s) running and no node(s) are excluded in this operation.
>>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
>>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>>
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>         at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
>>         at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>         at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>>
>>
>> This results in empty file in HDFS. I did some search through this email
>> thread and found that this could be caused by disk full, or data node
>> unreachable.
>>
>> However, this exception was only logged as WARN level when
>> FileSystem.close is called, and never thrown visible to client. My question
>> is, on the client level, How can I catch this exception and handle it?
>>
>> Chen
>>
>> --
>> Chen Song
>>
>>
>


-- 
Chen Song

Re: how to catch exception when data cannot be replicated to any datanode

Posted by Ted Yu <yu...@gmail.com>.
Which Hadoop release are you using?

In branch-2, I see this IOE in BlockManager:

    if (targets.length < minReplication) {
      throw new IOException("File " + src + " could only be replicated to "
          + targets.length + " nodes instead of minReplication (="
          + minReplication + ").  There are "
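
On the client side that should arrive wrapped in a RemoteException; if it does
propagate out of the stream's close(), one way to get at the original
IOException is roughly this (sketch only, not verified against 2.3.0; "out" is
the client's FSDataOutputStream):

    try {
        out.close();
    } catch (org.apache.hadoop.ipc.RemoteException re) {
        // unwrap the NameNode-side IOException for inspection/logging
        IOException cause = re.unwrapRemoteException();
    } catch (IOException e) {
        // other client-side failures
    }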

Cheers

On Mon, Mar 2, 2015 at 8:44 AM, Chen Song <ch...@gmail.com> wrote:

> Hey
>
> I got the following error in the application logs when trying to put a
> file to DFS.
>
> 015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb could only be replicated to 0 nodes instead of minReplication (=1).  There are 317 datanode(s) running and no node(s) are excluded in this operation.
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
>         at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>
>
> This results in empty file in HDFS. I did some search through this email
> thread and found that this could be caused by disk full, or data node
> unreachable.
>
> However, this exception was only logged as WARN level when
> FileSystem.close is called, and never thrown visible to client. My question
> is, on the client level, How can I catch this exception and handle it?
>
> Chen
>
> --
> Chen Song
>
>
