Posted to common-user@hadoop.apache.org by bourne1900 <bo...@yahoo.cn> on 2011/10/18 11:50:16 UTC

could not complete file...

Hi,

There are 20 threads that put files into HDFS ceaselessly; every file is 2k.
After 1 million files have been written, the client begins to throw "could not complete file" exceptions ceaselessly.
At that point, the datanode is hung.
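
For reference, the writer looks roughly like this (a simplified sketch; the path, class name, and per-thread count are made up to match the numbers above):
--------------------------------
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallFileWriter {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final byte[] payload = new byte[2 * 1024];   // every file is 2k
    ExecutorService pool = Executors.newFixedThreadPool(20);
    for (int t = 0; t < 20; t++) {
      final int thread = t;
      pool.submit(new Runnable() {
        public void run() {
          try {
            // 20 threads x 50000 files = 1 million files
            for (int i = 0; i < 50000; i++) {
              Path p = new Path("/test/smallfiles/" + thread + "/" + i);
              FSDataOutputStream out = fs.create(p);
              out.write(payload);
              out.close();  // the "Could not complete file" retries happen in here
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
  }
}
--------------------------------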

I think maybe heartbeats are being lost, so the namenode does not know the state of the datanode. But I do not know why the heartbeats were lost. Is there any info in the logs that would show why a datanode could not send heartbeats?

Thanks and regards!
bourne

Re: Re: could not complete file...

Posted by bourne1900 <bo...@yahoo.cn>.
Thank you for your reply.

There is "PIPE ERROR" in datanode log, and nothing else. 
Client only shows "Could not complete file" ceaselessly.

From "namonodeIP:50070/dfshealth.jsp ", I found the datanode is hang-up, and there is only a datanode in my cluster :)

BTW, I think the number of retries is unlimited. My Hadoop version is 0.20.2, and the loop (in DFSClient.java) is:
--------------------------------
// Ask the namenode to complete the file, retrying forever.
while (!fileComplete) {
  fileComplete = namenode.complete(src, clientName);
  if (!fileComplete) {
    try {
      Thread.sleep(400);
      // After 5 seconds, start logging each retry -- but never give up.
      if (System.currentTimeMillis() - localstart > 5000) {
        LOG.info("Could not complete file " + src + " retrying...");
      }
    } catch (InterruptedException ie) {
      // Interrupts are swallowed; the loop keeps going.
    }
  }
}
--------------------------------
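
Since the loop never gives up on its own, the only way I can see to bound it is from the caller's side, e.g. wrapping close() in a Future with a timeout (a rough sketch; the helper class and its timeout are my own, not anything in Hadoop):
--------------------------------
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.fs.FSDataOutputStream;

public class BoundedClose {
  // Close the stream, but give up waiting after the given number of seconds.
  public static void closeWithTimeout(final FSDataOutputStream out,
                                      long seconds) throws Exception {
    ExecutorService ex = Executors.newSingleThreadExecutor();
    Future<Void> f = ex.submit(new Callable<Void>() {
      public Void call() throws Exception {
        out.close();  // blocks in the complete() retry loop if the DN is down
        return null;
      }
    });
    try {
      f.get(seconds, TimeUnit.SECONDS);
    } catch (TimeoutException te) {
      // Note: the retry loop swallows InterruptedException, so cancelling
      // may not actually stop the stuck thread; this only unblocks the caller.
      f.cancel(true);
      throw te;
    } finally {
      ex.shutdownNow();
    }
  }
}
--------------------------------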

bourne1900 

Sender: Uma Maheswara Rao G 72686
Date: Tuesday, October 18, 2011 6:00 PM
To: common-user
CC: common-user
Subject: Re: could not complete file...
----- Original Message -----
From: bourne1900 <bo...@yahoo.cn>
Date: Tuesday, October 18, 2011 3:21 pm
Subject: could not complete file...
To: common-user <co...@hadoop.apache.org>

> Hi,
> 
> There are 20 threads that put files into HDFS ceaselessly; every 
> file is 2k.
> After 1 million files have been written, the client begins to throw 
> "could not complete file" exceptions ceaselessly.
The "Could not complete file" log is actually an INFO log. It is logged by the client when closing the file. The client will retry for some time (100 times, if I remember correctly) to ensure the writes succeed.
Did you observe any write failures here?

> At that time, the datanode is hung.
> 
> I think maybe heartbeats are being lost, so the namenode does not 
> know the state of the datanode. But I do not know why the heartbeats 
> were lost. Is there any info in the logs that would show why a 
> datanode could not send heartbeats?
Can you check the NN UI to verify the number of live nodes? From that we can tell whether the DN stopped sending heartbeats.
> 
> Thanks and regards!
> bourne

Regards,
Uma

Re: could not complete file...

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
----- Original Message -----
From: bourne1900 <bo...@yahoo.cn>
Date: Tuesday, October 18, 2011 3:21 pm
Subject: could not complete file...
To: common-user <co...@hadoop.apache.org>

> Hi,
> 
> There are 20 threads that put files into HDFS ceaselessly; every 
> file is 2k.
> After 1 million files have been written, the client begins to throw 
> "could not complete file" exceptions ceaselessly.
The "Could not complete file" log is actually an INFO log. It is logged by the client when closing the file. The client will retry for some time (100 times, if I remember correctly) to ensure the writes succeed.
Did you observe any write failures here?

> At that time, the datanode is hung.
> 
> I think maybe heartbeats are being lost, so the namenode does not 
> know the state of the datanode. But I do not know why the heartbeats 
> were lost. Is there any info in the logs that would show why a 
> datanode could not send heartbeats?
Can you check the NN UI to verify the number of live nodes? From that we can tell whether the DN stopped sending heartbeats.
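
If a programmatic check is easier, something like this should also show the time since each datanode's last heartbeat (a rough sketch; it assumes fs.default.name points at the cluster, so that the FileSystem is a DistributedFileSystem):
--------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class LiveNodes {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    // One entry per datanode known to the namenode, live or dead.
    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
      long secondsSinceHeartbeat =
          (System.currentTimeMillis() - dn.getLastUpdate()) / 1000;
      System.out.println(dn.getName() + ": last heartbeat "
          + secondsSinceHeartbeat + "s ago");
    }
  }
}
--------------------------------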
> 
> Thanks and regards!
> bourne

Regards,
Uma