Posted to common-user@hadoop.apache.org by Stas Oskin <st...@gmail.com> on 2009/05/21 11:11:10 UTC

Could only be replicated to 0 nodes, instead of 1

Hi.

I'm testing Hadoop in our lab, and started getting the following message
when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1

I have the following setup:

* 3 machines, 2 of them with only 80GB of space each, and 1 with 1.5TB
* Two clients are copying files all the time (one of them runs on the 1.5TB
machine)
* Replication is set to 2
* I let the 2 smaller machines run out of space, to test the behavior

Now, the client located on the 1.5TB machine works fine, but the other one -
the external client - is unable to copy, and displays the error and the
exception below.

Any idea if this is expected in my scenario, or how it can be solved?

Thanks in advance.



09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping
/test/test.bin retries left 1

09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/test/test.bin could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

        at org.apache.hadoop.ipc.Client.call(Client.java:716)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)

09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad
datanode[0]

java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
Hi.

I wonder if there was any progress with this issue?

Regards.

On Thu, May 21, 2009 at 9:01 PM, Raghu Angadi <ra...@yahoo-inc.com> wrote:

>
> I think you should file a jira on this. Most likely this is what is
> happening:
>
>  * Two out of 3 dns cannot take any more blocks.
>  * While picking nodes for a new block, the NN mostly skips the third dn as
> well, since '# active writes' on it is larger than '2 * avg'.
>  * Even if just one other block is being written on the 3rd, that is
> still greater than 2 * (1/3).
>
> To test this, if you write just one block to an idle cluster it should
> succeed.
>
> Writing from the client on the 3rd dn succeeds since local node is always
> favored.
>
> This particular problem is not that severe on a large cluster but HDFS
> should do the sensible thing.
>
> Raghu.

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> I think you should file a jira on this. Most likely this is what is
> happening:
>

Will do - this goes to the DFS section, correct?


>
>  * Two out of 3 dns cannot take any more blocks.
>  * While picking nodes for a new block, the NN mostly skips the third dn as
> well, since '# active writes' on it is larger than '2 * avg'.
>  * Even if just one other block is being written on the 3rd, that is
> still greater than 2 * (1/3).
>

Frankly, I'm not familiar enough with Hadoop's inner workings to understand
this completely, but from what I gather, the NN doesn't like the 3rd DN
because there are too many active writes on it, compared to the other
servers?


>
> To test this, if you write just one block to an idle cluster it should
> succeed.
>

What exactly is an "idle cluster"? One that nothing is being written to
(including the 3rd DN)?


>
> Writing from the client on the 3rd dn succeeds since local node is always
> favored.


Makes sense.


>
> This particular problem is not that severe on a large cluster but HDFS
> should do the sensible thing.
>

Yes, I agree that this is a non-standard situation, but IMHO the best course
of action would be to write anyway, but throw a warning. There is already a
warning that appears when there is not enough space for replication, and it
explains the matter quite well, so a similar one here would be great.

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
>
>
> Next time, it would be better to include large stack traces, logs, etc. in
> subsequent comments rather than in the description.
>

Will do, thanks for the tip.

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Stas Oskin wrote:
>> I think you should file a jira on this. Most likely this is what is
>> happening:
>>
> 
> Here it is - hope it's ok:
> 
> https://issues.apache.org/jira/browse/HADOOP-5886

Looks good. I will add my earlier post as a comment. You can update the
jira with any further tests.

Next time, it would be better to include large stack traces, logs, etc. in
subsequent comments rather than in the description.

Thanks,
Raghu.

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
>
> I think you should file a jira on this. Most likely this is what is
> happening:
>

Here it is - hope it's ok:

https://issues.apache.org/jira/browse/HADOOP-5886

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
>
> The real trick has been to make sure the balancer doesn't get stuck -- a
> Nagios plugin makes sure that the stdout has been printed to in the last
> hour or so; otherwise it kills the running balancer.  Stuck balancers have
> been an issue in the past.
>


Thanks for the advice.

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On May 21, 2009, at 3:10 PM, Stas Oskin wrote:

> Hi.
>
> If this analysis is right, I would add that it can happen even on large
> clusters!
>
>> I've seen this error on our cluster when we're very full (>97%) and very
>> few nodes have any empty space.  This usually happens because we have two
>> very large nodes (10x bigger than the rest of the cluster), and HDFS
>> tends to distribute writes randomly -- meaning the smaller nodes fill up
>> quickly, until the balancer can catch up.
>
> A bit off topic: do you run the balancer manually, or do you have some
> scheduler that does it?

crontab does it for us, once an hour.  We're always importing data, so the
cluster is always out-of-balance.

If the previous balancer didn't exit, the new one will simply exit.

The real trick has been to make sure the balancer doesn't get stuck -- a
Nagios plugin makes sure that the stdout has been printed to in the last
hour or so; otherwise it kills the running balancer.  Stuck balancers have
been an issue in the past.
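In Java terms, the watchdog logic is roughly the sketch below. The file
names are hypothetical (ours differ, and the real check lives in a Nagios
plugin); the assumption is that cron redirects the balancer's stdout to a
log file and records its pid:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

// Watchdog sketch: if the balancer's stdout log has not been written to
// for over an hour, assume the balancer is stuck and kill it. The next
// cron run will then start a fresh balancer.
public class BalancerWatchdog {
    public static void main(String[] args) throws Exception {
        File log = new File("/var/log/hadoop/balancer.out");      // hypothetical
        File pidfile = new File("/var/run/hadoop/balancer.pid");  // hypothetical

        long idleMillis = System.currentTimeMillis() - log.lastModified();
        if (idleMillis > 60L * 60L * 1000L) {  // nothing printed for an hour
            BufferedReader r = new BufferedReader(new FileReader(pidfile));
            String pid = r.readLine().trim();
            r.close();
            // let the OS do the killing; cron restarts the balancer later
            Runtime.getRuntime().exec(new String[] {"kill", pid}).waitFor();
        }
    }
}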

Brian

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> If this analysis is right, I would add that it can happen even on large
> clusters!  I've seen this error on our cluster when we're very full (>97%)
> and very few nodes have any empty space.  This usually happens because we
> have two very large nodes (10x bigger than the rest of the cluster), and
> HDFS tends to distribute writes randomly -- meaning the smaller nodes fill
> up quickly, until the balancer can catch up.
>


A bit off topic: do you run the balancer manually, or do you have some
scheduler that does it?

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Brian Bockelman wrote:
> 
> On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:
> 
> Hey Raghu,
> 
> If this analysis is right, I would add that it can happen even on large
> clusters!  I've seen this error on our cluster when we're very full
> (>97%) and very few nodes have any empty space.  This usually happens
> because we have two very large nodes (10x bigger than the rest of the
> cluster), and HDFS tends to distribute writes randomly -- meaning the
> smaller nodes fill up quickly, until the balancer can catch up.

Yes. This would bite whenever a large portion of the nodes cannot accept
blocks. In general, it can happen whenever fewer than half the nodes have
any space left.

Raghu.


Re: Could only be replicated to 0 nodes, instead of 1

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:

>
> I think you should file a jira on this. Most likely this is what is
> happening:
>
> * Two out of 3 dns cannot take any more blocks.
> * While picking nodes for a new block, the NN mostly skips the third dn
> as well, since '# active writes' on it is larger than '2 * avg'.
> * Even if just one other block is being written on the 3rd, that is
> still greater than 2 * (1/3).
>
> To test this, if you write just one block to an idle cluster it should
> succeed.
>
> Writing from the client on the 3rd dn succeeds since local node is
> always favored.
>
> This particular problem is not that severe on a large cluster but HDFS
> should do the sensible thing.
>

Hey Raghu,

If this analysis is right, I would add that it can happen even on large
clusters!  I've seen this error on our cluster when we're very full (>97%)
and very few nodes have any empty space.  This usually happens because we
have two very large nodes (10x bigger than the rest of the cluster), and
HDFS tends to distribute writes randomly -- meaning the smaller nodes fill
up quickly, until the balancer can catch up.

Brian



Re: Could only be replicated to 0 nodes, instead of 1

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
I think you should file a jira on this. Most likely this is what is
happening:

  * Two out of 3 dns cannot take any more blocks.
  * While picking nodes for a new block, the NN mostly skips the third dn as
well, since '# active writes' on it is larger than '2 * avg' (see the
sketch below).
  * Even if just one other block is being written on the 3rd, that is
still greater than 2 * (1/3).
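In code terms, the check amounts to roughly this (my paraphrase with
made-up names, not the actual namenode source):

// Paraphrased sketch of the namenode's "is this datanode a good
// target?" load check. Names are illustrative, not actual 0.18 classes.
class TargetCheckSketch {
    static boolean isGoodTarget(int nodeActiveWrites,
                                int totalActiveWrites,
                                int numDatanodes) {
        // average number of active writes per datanode across the cluster
        double avg = (double) totalActiveWrites / numDatanodes;
        // a node busier than twice the cluster average is skipped
        return nodeActiveWrites <= 2.0 * avg;
    }

    public static void main(String[] args) {
        // the 3-node case above: one active write, and it is on the 3rd dn
        System.out.println(isGoodTarget(1, 1, 3));  // false -> dn skipped
    }
}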

To test this, if you write just one block to an idle cluster it should 
succeed.
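For example, something like this minimal sketch (standard FileSystem API;
the path and size are just illustrative, and I'm assuming the default 64MB
block size):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write one small file (well under one block) while nothing else is
// writing. If this succeeds, the failures above come from the load
// heuristic rather than from a real lack of space.
public class OneBlockTest {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/test/oneblock.bin"));
        out.write(new byte[1024 * 1024]);  // 1MB -> a single block
        out.close();
        System.out.println("single-block write succeeded");
    }
}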

Writing from the client on the 3rd dn succeeds since local node is 
always favored.

This particular problem is not that severe on a large cluster but HDFS 
should do the sensible thing.

Raghu.


Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> i) Choose the right version (Hadoop 0.18 is good)


I'm using 0.18.3.


>
> ii) Replication should be 3, as you have 3 nodes. (Also double-check that
> your configuration is correct!)
>

Actually, I'm testing 2x replication on any number of DNs, to see how
reliable it is.
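For reference, I set the factor via dfs.replication. Roughly (an
illustrative sketch, not my exact code -- the per-file call just shows the
alternative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Two ways to get 2x replication.
public class ReplicationSetup {
    public static void main(String[] args) throws Exception {
        // 1) Client-side default, same effect as dfs.replication in
        //    hadoop-site.xml: new files are created with 2 replicas.
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 2);
        FileSystem fs = FileSystem.get(conf);

        // 2) Per-file override on an existing file.
        fs.setReplication(new Path("/test/test.bin"), (short) 2);
    }
}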


>
> I'm only suggesting this, as I'm also new to Hadoop.
>
> Ashish Pareek

Re: Could only be replicated to 0 nodes, instead of 1

Posted by Stas Oskin <st...@gmail.com>.
Hi.

2009/5/21 jason hadoop <ja...@gmail.com>

> It does not appear that any datanodes have connected to your namenode.
> On the datanode machines, look in the Hadoop logs directory at the
> datanode log files. There should be some information there that helps you
> diagnose the problem.
>
> Chapter 4 of my book provides some detail on working with this problem.
>

The NameNode web panel shows that all DataNodes are connected.

Also, as I said above, one client (the one located on the 1.5TB DataNode) is
working OK.

Anything else that I can check?

Regards.

Re: Could only be replicated to 0 nodes, instead of 1

Posted by jason hadoop <ja...@gmail.com>.
It does not appear that any datanodes have connected to your namenode.
On the datanode machines, look in the Hadoop logs directory at the datanode
log files. There should be some information there that helps you diagnose
the problem.

Chapter 4 of my book provides some detail on working with this problem.
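Besides the logs, you can ask the namenode directly which datanodes it
sees -- roughly what 'hadoop dfsadmin -report' prints. A sketch against
the 0.18-era API (where DistributedFileSystem still lives in
org.apache.hadoop.dfs):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.DatanodeInfo;
import org.apache.hadoop.dfs.DistributedFileSystem;
import org.apache.hadoop.fs.FileSystem;

// Print the datanodes the namenode currently knows about, with their
// remaining and total capacity in bytes.
public class DatanodeReport {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            System.out.println(dn.getName()
                + "  remaining=" + dn.getRemaining()
                + "  capacity=" + dn.getCapacity());
        }
    }
}

If no datanodes show up here (or on the namenode web page), the datanode
logs are the place to look.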

On Thu, May 21, 2009 at 4:29 AM, ashish pareek <pa...@gmail.com> wrote:

> Hi,
>
>    I have two suggestions:
>
> i) Choose the right version (Hadoop 0.18 is good)
> ii) Replication should be 3, as you have 3 nodes. (Also double-check that
> your configuration is correct!)
>
> I'm only suggesting this, as I'm also new to Hadoop.
>
> Ashish Pareek



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com - a community for Hadoop Professionals

Re: Could only be replicated to 0 nodes, instead of 1

Posted by ashish pareek <pa...@gmail.com>.
Hi,

    I have two suggestions:

i) Choose the right version (Hadoop 0.18 is good)
ii) Replication should be 3, as you have 3 nodes. (Also double-check that
your configuration is correct!)

I'm only suggesting this, as I'm also new to Hadoop.

Ashish Pareek

