Posted to common-user@hadoop.apache.org by Sagar Naik <sn...@attributor.com> on 2008/11/14 19:38:45 UTC

Recovery of files in hadoop 18

Hi,
I accidentally deleted the root folder in our HDFS.
I have stopped HDFS.

Is there any way to recover the files from the secondary namenode?

Please help.


-Sagar



Re: java.io.IOException: Could not get block locations. Aborting...

Posted by jason hadoop <ja...@gmail.com>.
You will have to increase the per-user file descriptor limit.
On most Linux machines the file /etc/security/limits.conf controls this on
a per-user basis.
You will need to log in to a fresh shell session after making the changes to
see them. Any login shells started before the change, and any processes started
by those shells, will keep the old limits.

If you are opening vast numbers of files you may also need to increase the
system-wide limit, via the /etc/sysctl.conf file and the fs.file-max parameter.
This page seems to be a decent reference:
http://bloggerdigest.blogspot.com/2006/10/file-descriptors-vs-linux-performance.html
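For example, a rough sketch of both changes (the "hadoop" user name and the
limit values below are only placeholders; pick whatever fits your machines):

    # check the limit currently seen by the user running the datanode/tasktracker
    ulimit -n

    # per-user limit: append to /etc/security/limits.conf, then log in again
    echo "hadoop soft nofile 16384" >> /etc/security/limits.conf
    echo "hadoop hard nofile 16384" >> /etc/security/limits.conf

    # system-wide limit: takes effect immediately, and persists across reboots
    sysctl -w fs.file-max=262144
    echo "fs.file-max = 262144" >> /etc/sysctl.conf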


On Mon, Feb 9, 2009 at 1:01 PM, Scott Whitecross <sc...@dataxu.com> wrote:

> Hi all -
>
> I've been running into this error the past few days:
> java.io.IOException: Could not get block locations. Aborting...
>        at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>        at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>        at
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>
> It seems to be related to trying to write too many files to HDFS.  I have a
> class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I
> output to a few file names, everything works.  However, if I output to
> thousands of small files, the above error occurs.  I'm having trouble
> isolating the problem, as the problem doesn't occur in the debugger
> unfortunately.
>
> Is this a memory issue, or is there an upper limit to the number of files
> HDFS can hold?  Any settings to adjust?
>
> Thanks.

Re: java.io.IOException: Could not get block locations. Aborting...

Posted by Wu Wei <we...@alibaba-inc.com>.
We got the same problem as you when using MultipleOutputFormat, on both 
hadoop 0.18 and 0.19. On hadoop 0.18, increasing the xceivers count did 
not fix the problem. But in the datanode log (running on hadoop 0.19) we 
found many error messages complaining that xceiverCount exceeded the 
limit of concurrent xceivers. After we increased the xceivers count, the 
problem was gone.
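
As a quick check for the same symptom, you can grep the datanode logs for
that message (the path below is just the usual default layout, and the exact
wording may vary a little between versions):

    grep "exceeds the limit of concurrent xcievers" \
        /path/to/hadoop/logs/hadoop-*-datanode-*.log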

I guess you are using hadoop 0.18. Please try 0.19.

Good luck.

Scott Whitecross wrote:
> I tried modifying the settings, and I'm still running into the same 
> issue.  I increased the xceivers count (dfs.datanode.max.xcievers) in 
> the hadoop-site.xml file.  I also checked to make sure the file 
> handles were increased, but they were fairly high to begin with.
>
> I don't think I'm dealing with anything out of the ordinary either.  
> I'm processing three large 'log' files, totaling around 5 GB, and 
> producing around 8000 output files after some data processing, 
> which probably total 6 or 7 GB.   In the past, I've produced a lot fewer 
> files, and that has been fine.  When I change the process to output to 
> just a few files, no problem again.
>
> Anything else beyond the limits?  Is HDFS creating a substantial 
> amount of temp files as well?
>
>
>
>
>
>
> On Feb 9, 2009, at 8:11 PM, Bryan Duxbury wrote:
>
>> Correct.
>>
>> +1 to Jason's more unix file handles suggestion. That's a must-have.
>>
>> -Bryan
>>
>> On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
>>
>>> This would be an addition to the hadoop-site.xml file, to up 
>>> dfs.datanode.max.xcievers?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
>>>
>>>> Small files are bad for hadoop. You should avoid keeping a lot of 
>>>> small files if possible.
>>>>
>>>> That said, that error is something I've seen a lot. It usually 
>>>> happens when the number of xcievers hasn't been adjusted upwards 
>>>> from the default of 256. We run with 8000 xcievers, and that seems 
>>>> to solve our problems. I think that if you have a lot of open 
>>>> files, this problem happens a lot faster.
>>>>
>>>> -Bryan
>>>>
>>>> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>>>>
>>>>> Hi all -
>>>>>
>>>>> I've been running into this error the past few days:
>>>>> java.io.IOException: Could not get block locations. Aborting...
>>>>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>>>>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>>>>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>>>>
>>>>> It seems to be related to trying to write too many files to HDFS.  
>>>>> I have a class extending 
>>>>> org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output 
>>>>> to a few file names, everything works.  However, if I output to 
>>>>> thousands of small files, the above error occurs.  I'm having 
>>>>> trouble isolating the problem, as the problem doesn't occur in the 
>>>>> debugger unfortunately.
>>>>>
>>>>> Is this a memory issue, or is there an upper limit to the number 
>>>>> of files HDFS can hold?  Any settings to adjust?
>>>>>
>>>>> Thanks.
>>>>
>>>>
>>>
>>
>>
>
>
>

Re: java.io.IOException: Could not get block locations. Aborting...

Posted by Scott Whitecross <sc...@dataxu.com>.
I tried modifying the settings, and I'm still running into the same  
issue.  I increased the xceivers count (dfs.datanode.max.xcievers) in  
the hadoop-site.xml file.  I also checked to make sure the file  
handles were increased, but they were fairly high to begin with.

I don't think I'm dealing with anything out of the ordinary either.   
I'm processing three large 'log' files, totaling around 5 GB, and  
producing around 8000 output files after some data processing,  
which probably total 6 or 7 GB.   In the past, I've produced a lot fewer  
files, and that has been fine.  When I change the process to output to  
just a few files, no problem again.

Anything else beyond the limits?  Is HDFS creating a substantial  
amount of temp files as well?






On Feb 9, 2009, at 8:11 PM, Bryan Duxbury wrote:

> Correct.
>
> +1 to Jason's more unix file handles suggestion. That's a must-have.
>
> -Bryan
>
> On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
>
>> This would be an addition to the hadoop-site.xml file, to up  
>> dfs.datanode.max.xcievers?
>>
>> Thanks.
>>
>>
>>
>> On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
>>
>>> Small files are bad for hadoop. You should avoid keeping a lot of  
>>> small files if possible.
>>>
>>> That said, that error is something I've seen a lot. It usually  
>>> happens when the number of xcievers hasn't been adjusted upwards  
>>> from the default of 256. We run with 8000 xcievers, and that seems  
>>> to solve our problems. I think that if you have a lot of open  
>>> files, this problem happens a lot faster.
>>>
>>> -Bryan
>>>
>>> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>>>
>>>> Hi all -
>>>>
>>>> I've been running into this error the past few days:
>>>> java.io.IOException: Could not get block locations. Aborting...
>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>>>
>>>> It seems to be related to trying to write too many files to HDFS.   
>>>> I have a class extending  
>>>> org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
>>>> to a few file names, everything works.  However, if I output to  
>>>> thousands of small files, the above error occurs.  I'm having  
>>>> trouble isolating the problem, as the problem doesn't occur in  
>>>> the debugger unfortunately.
>>>>
>>>> Is this a memory issue, or is there an upper limit to the number  
>>>> of files HDFS can hold?  Any settings to adjust?
>>>>
>>>> Thanks.
>>>
>>>
>>
>
>


Re: java.io.IOException: Could not get block locations. Aborting...

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Feb 9, 2009, at 7:50 PM, jason hadoop wrote:

> The other issue you may run into with many files in your HDFS is  
> that you
> may end up with more than a few hundred thousand blocks on each of your
> datanodes. At present this can lead to instability due to the way the
> periodic block reports to the namenode are handled. The more blocks  
> per
> datanode, the larger the risk of congestion collapse in your hdfs.

Of course, if you stay below, say, 500k, you don't have much of a risk  
of congestion.

In our experience, 500k blocks or less is going to be fine with decent  
hardware.  Between 500k and 750k, you will hit a wall somewhere  
depending on your hardware.  Good luck getting anything above 750k.

The recommendation is that you keep this number as low as possible --  
and explore the limits of your system and hardware in testing before  
you discover them in production :)
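
If you want a rough idea of where you stand, the totals are easy to pull out
of fsck and dfsadmin (a sketch; divide the block count by the number of
datanodes by hand):

    bin/hadoop fsck / | grep "Total blocks"          # total block count for the namespace
    bin/hadoop dfsadmin -report | grep -c "^Name:"   # number of datanodes reporting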

Brian

>
>
> On Mon, Feb 9, 2009 at 5:11 PM, Bryan Duxbury <br...@rapleaf.com>  
> wrote:
>
>> Correct.
>>
>> +1 to Jason's more unix file handles suggestion. That's a must-have.
>>
>> -Bryan
>>
>>
>> On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
>>
>> This would be an addition to the hadoop-site.xml file, to up
>>> dfs.datanode.max.xcievers?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
>>>
>>> Small files are bad for hadoop. You should avoid keeping a lot of  
>>> small
>>>> files if possible.
>>>>
>>>> That said, that error is something I've seen a lot. It usually  
>>>> happens
>>>> when the number of xcievers hasn't been adjusted upwards from the  
>>>> default of
>>>> 256. We run with 8000 xcievers, and that seems to solve our  
>>>> problems. I
>>>> think that if you have a lot of open files, this problem happens  
>>>> a lot
>>>> faster.
>>>>
>>>> -Bryan
>>>>
>>>> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>>>>
>>>> Hi all -
>>>>>
>>>>> I've been running into this error the past few days:
>>>>> java.io.IOException: Could not get block locations. Aborting...
>>>>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>>>>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>>>>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>>>>
>>>>> It seems to be related to trying to write too many files to  
>>>>> HDFS.  I have
>>>>> a class extending  
>>>>> org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I
>>>>> output to a few file names, everything works.  However, if I  
>>>>> output to
>>>>> thousands of small files, the above error occurs.  I'm having  
>>>>> trouble
>>>>> isolating the problem, as the problem doesn't occur in the  
>>>>> debugger
>>>>> unfortunately.
>>>>>
>>>>> Is this a memory issue, or is there an upper limit to the number  
>>>>> of
>>>>> files HDFS can hold?  Any settings to adjust?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>>
>>>
>>


Re: java.io.IOException: Could not get block locations. Aborting...

Posted by jason hadoop <ja...@gmail.com>.
The other issue you may run into with many files in your HDFS is that you
may end up with more than a few hundred thousand blocks on each of your
datanodes. At present this can lead to instability due to the way the
periodic block reports to the namenode are handled. The more blocks per
datanode, the larger the risk of congestion collapse in your hdfs.

On Mon, Feb 9, 2009 at 5:11 PM, Bryan Duxbury <br...@rapleaf.com> wrote:

> Correct.
>
> +1 to Jason's more unix file handles suggestion. That's a must-have.
>
> -Bryan
>
>
> On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
>
>  This would be an addition to the hadoop-site.xml file, to up
>> dfs.datanode.max.xcievers?
>>
>> Thanks.
>>
>>
>>
>> On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
>>
>>  Small files are bad for hadoop. You should avoid keeping a lot of small
>>> files if possible.
>>>
>>> That said, that error is something I've seen a lot. It usually happens
>>> when the number of xcievers hasn't been adjusted upwards from the default of
>>> 256. We run with 8000 xcievers, and that seems to solve our problems. I
>>> think that if you have a lot of open files, this problem happens a lot
>>> faster.
>>>
>>> -Bryan
>>>
>>> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>>>
>>>  Hi all -
>>>>
>>>> I've been running into this error the past few days:
>>>> java.io.IOException: Could not get block locations. Aborting...
>>>>        at
>>>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>>>>        at
>>>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>>>>        at
>>>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>>>
>>>> It seems to be related to trying to write too many files to HDFS.  I have
>>>> a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I
>>>> output to a few file names, everything works.  However, if I output to
>>>> thousands of small files, the above error occurs.  I'm having trouble
>>>> isolating the problem, as the problem doesn't occur in the debugger
>>>> unfortunately.
>>>>
>>>> Is this a memory issue, or is there an upper limit to the number of
>>>> files HDFS can hold?  Any settings to adjust?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>>
>>
>

Re: java.io.IOException: Could not get block locations. Aborting...

Posted by Bryan Duxbury <br...@rapleaf.com>.
Correct.

+1 to Jason's more unix file handles suggestion. That's a must-have.

-Bryan

On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:

> This would be an addition to the hadoop-site.xml file, to up  
> dfs.datanode.max.xcievers?
>
> Thanks.
>
>
>
> On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
>
>> Small files are bad for hadoop. You should avoid keeping a lot of  
>> small files if possible.
>>
>> That said, that error is something I've seen a lot. It usually  
>> happens when the number of xcievers hasn't been adjusted upwards  
>> from the default of 256. We run with 8000 xcievers, and that seems  
>> to solve our problems. I think that if you have a lot of open  
>> files, this problem happens a lot faster.
>>
>> -Bryan
>>
>> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>>
>>> Hi all -
>>>
>>> I've been running into this error the past few days:
>>> java.io.IOException: Could not get block locations. Aborting...
>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>>
>>> It seems to be related to trying to write too many files to HDFS.   
>>> I have a class extending  
>>> org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
>>> to a few file names, everything works.  However, if I output to  
>>> thousands of small files, the above error occurs.  I'm having  
>>> trouble isolating the problem, as the problem doesn't occur in  
>>> the debugger unfortunately.
>>>
>>> Is this a memory issue, or is there an upper limit to the number  
>>> of files HDFS can hold?  Any settings to adjust?
>>>
>>> Thanks.
>>
>>
>


Re: java.io.IOException: Could not get block locations. Aborting...

Posted by Scott Whitecross <sc...@dataxu.com>.
This would be an addition to the hadoop-site.xml file, to up  
dfs.datanode.max.xcievers?

Thanks.



On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:

> Small files are bad for hadoop. You should avoid keeping a lot of  
> small files if possible.
>
> That said, that error is something I've seen a lot. It usually  
> happens when the number of xcievers hasn't been adjusted upwards  
> from the default of 256. We run with 8000 xcievers, and that seems  
> to solve our problems. I think that if you have a lot of open files,  
> this problem happens a lot faster.
>
> -Bryan
>
> On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
>
>> Hi all -
>>
>> I've been running into this error the past few days:
>> java.io.IOException: Could not get block locations. Aborting...
>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
>> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>>
>> It seems to be related to trying to write too many files to HDFS.  I  
>> have a class extending  
>> org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
>> to a few file names, everything works.  However, if I output to  
>> thousands of small files, the above error occurs.  I'm having  
>> trouble isolating the problem, as the problem doesn't occur in the  
>> debugger unfortunately.
>>
>> Is this a memory issue, or is there an upper limit to the number of  
>> files HDFS can hold?  Any settings to adjust?
>>
>> Thanks.
>
>


Re: java.io.IOException: Could not get block locations. Aborting...

Posted by Bryan Duxbury <br...@rapleaf.com>.
Small files are bad for hadoop. You should avoid keeping a lot of  
small files if possible.

That said, that error is something I've seen a lot. It usually  
happens when the number of xcievers hasn't been adjusted upwards from  
the default of 256. We run with 8000 xcievers, and that seems to  
solve our problems. I think that if you have a lot of open files,  
this problem happens a lot faster.
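
For reference, the entry goes in conf/hadoop-site.xml on the datanodes
(inside the <configuration> element), followed by a datanode restart. A
sketch, using the value we run with:

    # add to conf/hadoop-site.xml on every datanode, then restart them:
    #   <property>
    #     <name>dfs.datanode.max.xcievers</name>
    #     <value>8000</value>
    #   </property>
    # quick check that the (misspelled) property name matches everywhere:
    grep -A 1 "dfs.datanode.max.xcievers" conf/hadoop-site.xml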

-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:

> Hi all -
>
> I've been running into this error the past few days:
> java.io.IOException: Could not get block locations. Aborting...
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
>
> It seems to be related to trying to write too many files to HDFS.  I  
> have a class extending  
> org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
> to a few file names, everything works.  However, if I output to  
> thousands of small files, the above error occurs.  I'm having  
> trouble isolating the problem, as the problem doesn't occur in the  
> debugger unfortunately.
>
> Is this a memory issue, or is there an upper limit to the number of  
> files HDFS can hold?  Any settings to adjust?
>
> Thanks.


java.io.IOException: Could not get block locations. Aborting...

Posted by Scott Whitecross <sc...@dataxu.com>.
Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

It seems to be related to trying to write too many files to HDFS.  I  
have a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output to a  
few file names, everything works.  However, if I output to thousands  
of small files, the above error occurs.  I'm having trouble isolating  
the problem, as the problem doesn't occur in the debugger unfortunately.

Is this a memory issue, or is there an upper limit to the number of  
files HDFS can hold?  Any settings to adjust?

Thanks.

Re: Recovery of files in hadoop 18

Posted by lohit <lo...@yahoo.com>.
Yes, what you did is right. One last check:
in the secondary namenode log you should see the timestamp of the last checkpoint (or download of edits). Just make sure it is from before you ran the delete command. 
Basically, you are trying to make sure your delete command isn't in the edits. (Another way would have been to open the edits file in a hex editor or similar to check.) But this should work.
Once that's done, you can start.
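
For the record, that check might look something like this (the paths are
placeholders for your Hadoop log directory and fs.checkpoint.dir):

    # when did the secondary namenode last checkpoint / download edits?
    grep -i checkpoint /path/to/hadoop/logs/hadoop-*-secondarynamenode-*.log | tail
    # the modification times of the checkpointed image files tell the same story
    ls -lR /path/to/fs.checkpoint.dir
    # if both timestamps are earlier than the accidental delete, start HDFS again
    bin/start-dfs.sh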
Thanks,
Lohit



----- Original Message ----
From: Sagar Naik <sn...@attributor.com>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 1:59:04 PM
Subject: Re: Recovery of files in hadoop 18

I had a secondary namenode running on the namenode machine.
I deleted the dfs.name.dir
then bin/hadoop namenode -importCheckpoint.

and restarted the dfs.

I guess the deletion of name.dir will delete the edit logs.
Can you please confirm that this will not lead to replaying the delete 
transactions?

Thanks for help/advice


-Sagar

lohit wrote:
> NameNode would not come out of safe mode as it is still waiting for datanodes to report those blocks which it expects. 
> I should have added, try to get a full output of fsck
> fsck <path> -openforwrite -files -blocks -locations.
> -openforwrite should tell you which files were open during the checkpoint; you might want to double check that is the case, i.e. that those files really were being written at that moment. Maybe by looking at the filenames you can tell whether they were part of a job that was running.
>
> For any missing block, you might also want to cross-verify on the datanode to see whether it is really missing.
>
> Once you are convinced that those are the only corrupt files, and that you can live with losing them, start the datanodes. 
> The namenode would still not come out of safemode since you have missing blocks; leave it for a while, run fsck, look around, and if everything is OK, bring the namenode out of safemode.
> I hope you started this namenode with the old image and empty edits. You do not want your latest edits to be replayed, since they contain your delete transactions.
>
> Thanks,
> Lohit
>
>
>
> ----- Original Message ----
> From: Sagar Naik <sn...@attributor.com>
> To: core-user@hadoop.apache.org
> Sent: Friday, November 14, 2008 12:11:46 PM
> Subject: Re: Recovery of files in hadoop 18
>
> Hey Lohit,
>
> Thanks for your help.
> I did as per your suggestion and imported from the secondary namenode.
> We have some corrupted files.
>
> But for some reason, the namenode is still in safe_mode. It has been an hour or so.
> The fsck report is :
>
> Total size:    6954466496842 B (Total open files size: 543469222 B)
> Total dirs:    1159
> Total files:   1354155 (Files currently being written: 7673)
> Total blocks (validated):      1375725 (avg. block size 5055128 B) (Total open file blocks (not validated): 50)
> ********************************
> CORRUPT FILES:        1574
> MISSING BLOCKS:       1574
> MISSING SIZE:         1165735334 B
> CORRUPT BLOCKS:       1574
> ********************************
> Minimally replicated blocks:   1374151 (99.88559 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       26619 (1.9349071 %)
> Mis-replicated blocks:         0 (0.0 %)
> Default replication factor:    3
> Average block replication:     2.977127
> Corrupt blocks:                1574
> Missing replicas:              26752 (0.65317154 %)
>
>
> Do you think I should manually override the safemode, delete all the corrupted files, and restart?
>
> -Sagar
>
>
> lohit wrote:
>  
>> If you have enabled trash, the files should have been moved to the trash folder before being permanently deleted; restore them from there. (I hope you have fs.trash.interval set.)
>>
>> If not, shut down the cluster.
>> Take a backup of your metadata directories (dfs.name.dir on the namenode and fs.checkpoint.dir on the secondary namenode).
>>
>> The secondary namenode should have the last updated image; try to start the namenode from that image, and don't use the edits from the namenode yet. Try doing importCheckpoint as explained here: https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173. Start only the namenode and run fsck -files. It will throw a lot of messages saying you are missing blocks, but that's fine since you haven't started the datanodes yet. If it does show your files, that means they haven't been deleted yet. This will give you a view of the system as of the last backup. Start the datanodes. Once everything is up, try running fsck and check the consistency of the system. You will lose all changes that have happened since the last checkpoint. 
>>
>> Hope that helps,
>> Lohit
>>
>>
>>
>> ----- Original Message ----
>> From: Sagar Naik <sn...@attributor.com>
>> To: core-user@hadoop.apache.org
>> Sent: Friday, November 14, 2008 10:38:45 AM
>> Subject: Recovery of files in hadoop 18
>>
>> Hi,
>> I accidentally deleted the root folder in our HDFS.
>> I have stopped HDFS.
>>
>> Is there any way to recover the files from the secondary namenode?
>>
>> Please help.
>>
>>
>> -Sagar
>>  
>>    

Re: Recovery of files in hadoop 18

Posted by Sagar Naik <sn...@attributor.com>.
I had a secondary namenode running on the namenode machine.
I deleted the dfs.name.dir
then bin/hadoop namenode -importCheckpoint.

and restarted the dfs.

I guess the deletion of name.dir will delete the edit logs.
Can you please confirm that this will not lead to replaying the delete 
transactions?

Thanks for help/advice


-Sagar

lohit wrote:
> NameNode would not come out of safe mode as it is still waiting for datanodes to report those blocks which it expects. 
> I should have added, try to get a full output of fsck
> fsck <path> -openforwrite -files -blocks -locations.
> -openforwrite should tell you which files were open during the checkpoint; you might want to double check that is the case, i.e. that those files really were being written at that moment. Maybe by looking at the filenames you can tell whether they were part of a job that was running.
>
> For any missing block, you might also want to cross-verify on the datanode to see whether it is really missing.
>
> Once you are convinced that those are the only corrupt files, and that you can live with losing them, start the datanodes. 
> The namenode would still not come out of safemode since you have missing blocks; leave it for a while, run fsck, look around, and if everything is OK, bring the namenode out of safemode.
> I hope you started this namenode with the old image and empty edits. You do not want your latest edits to be replayed, since they contain your delete transactions.
>
> Thanks,
> Lohit
>
>
>
> ----- Original Message ----
> From: Sagar Naik <sn...@attributor.com>
> To: core-user@hadoop.apache.org
> Sent: Friday, November 14, 2008 12:11:46 PM
> Subject: Re: Recovery of files in hadoop 18
>
> Hey Lohit,
>
> Thanks for your help.
> I did as per your suggestion and imported from the secondary namenode.
> We have some corrupted files.
>
> But for some reason, the namenode is still in safe_mode. It has been an hour or so.
> The fsck report is :
>
> Total size:    6954466496842 B (Total open files size: 543469222 B)
> Total dirs:    1159
> Total files:   1354155 (Files currently being written: 7673)
> Total blocks (validated):      1375725 (avg. block size 5055128 B) (Total open file blocks (not validated): 50)
> ********************************
> CORRUPT FILES:        1574
> MISSING BLOCKS:       1574
> MISSING SIZE:         1165735334 B
> CORRUPT BLOCKS:       1574
> ********************************
> Minimally replicated blocks:   1374151 (99.88559 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       26619 (1.9349071 %)
> Mis-replicated blocks:         0 (0.0 %)
> Default replication factor:    3
> Average block replication:     2.977127
> Corrupt blocks:                1574
> Missing replicas:              26752 (0.65317154 %)
>
>
> Do you think I should manually override the safemode, delete all the corrupted files, and restart?
>
> -Sagar
>
>
> lohit wrote:
>   
>> If you have enabled trash, the files should have been moved to the trash folder before being permanently deleted; restore them from there. (I hope you have fs.trash.interval set.)
>>
>> If not, shut down the cluster.
>> Take a backup of your metadata directories (dfs.name.dir on the namenode and fs.checkpoint.dir on the secondary namenode).
>>
>> The secondary namenode should have the last updated image; try to start the namenode from that image, and don't use the edits from the namenode yet. Try doing importCheckpoint as explained here: https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173. Start only the namenode and run fsck -files. It will throw a lot of messages saying you are missing blocks, but that's fine since you haven't started the datanodes yet. If it does show your files, that means they haven't been deleted yet. This will give you a view of the system as of the last backup. Start the datanodes. Once everything is up, try running fsck and check the consistency of the system. You will lose all changes that have happened since the last checkpoint. 
>>
>> Hope that helps,
>> Lohit
>>
>>
>>
>> ----- Original Message ----
>> From: Sagar Naik <sn...@attributor.com>
>> To: core-user@hadoop.apache.org
>> Sent: Friday, November 14, 2008 10:38:45 AM
>> Subject: Recovery of files in hadoop 18
>>
>> Hi,
>> I accidentally deleted the root folder in our HDFS.
>> I have stopped HDFS.
>>
>> Is there any way to recover the files from the secondary namenode?
>>
>> Please help.
>>
>>
>> -Sagar
>>  
>>     


Re: Recovery of files in hadoop 18

Posted by lohit <lo...@yahoo.com>.
NameNode would not come out of safe mode as it is still waiting for datanodes to report those blocks which it expects. 
I should have added, try to get a full output of fsck
fsck <path> -openforwrite -files -blocks -locations.
-openforwrite should tell you which files were open during the checkpoint; you might want to double check that is the case, i.e. that those files really were being written at that moment. Maybe by looking at the filenames you can tell whether they were part of a job that was running.

For any missing block, you might also want to cross-verify on the datanode to see whether it is really missing.

Once you are convinced that those are the only corrupt files, and that you can live with losing them, start the datanodes. 
The namenode would still not come out of safemode since you have missing blocks; leave it for a while, run fsck, look around, and if everything is OK, bring the namenode out of safemode.
I hope you started this namenode with the old image and empty edits. You do not want your latest edits to be replayed, since they contain your delete transactions.
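
Concretely, something like this (redirect the report to a file, it will be large):

    bin/hadoop fsck / -openforwrite -files -blocks -locations > /tmp/fsck-full.txt
    # once you are satisfied with what the report shows:
    bin/hadoop dfsadmin -safemode get      # confirm the namenode is still in safe mode
    bin/hadoop dfsadmin -safemode leave    # then bring it out manually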

Thanks,
Lohit



----- Original Message ----
From: Sagar Naik <sn...@attributor.com>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 12:11:46 PM
Subject: Re: Recovery of files in hadoop 18

Hey Lohit,

Thanks for your help.
I did as per your suggestion and imported from the secondary namenode.
We have some corrupted files.

But for some reason, the namenode is still in safe_mode. It has been an hour or so.
The fsck report is :

Total size:    6954466496842 B (Total open files size: 543469222 B)
Total dirs:    1159
Total files:   1354155 (Files currently being written: 7673)
Total blocks (validated):      1375725 (avg. block size 5055128 B) (Total open file blocks (not validated): 50)
********************************
CORRUPT FILES:        1574
MISSING BLOCKS:       1574
MISSING SIZE:         1165735334 B
CORRUPT BLOCKS:       1574
********************************
Minimally replicated blocks:   1374151 (99.88559 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       26619 (1.9349071 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.977127
Corrupt blocks:                1574
Missing replicas:              26752 (0.65317154 %)


Do you think I should manually override the safemode, delete all the corrupted files, and restart?

-Sagar


lohit wrote:
> If you have enabled trash, the files should have been moved to the trash folder before being permanently deleted; restore them from there. (I hope you have fs.trash.interval set.)
> 
> If not, shut down the cluster.
> Take a backup of your metadata directories (dfs.name.dir on the namenode and fs.checkpoint.dir on the secondary namenode).
> 
> The secondary namenode should have the last updated image; try to start the namenode from that image, and don't use the edits from the namenode yet. Try doing importCheckpoint as explained here: https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173. Start only the namenode and run fsck -files. It will throw a lot of messages saying you are missing blocks, but that's fine since you haven't started the datanodes yet. If it does show your files, that means they haven't been deleted yet. This will give you a view of the system as of the last backup. Start the datanodes. Once everything is up, try running fsck and check the consistency of the system. You will lose all changes that have happened since the last checkpoint. 
> 
> Hope that helps,
> Lohit
> 
> 
> 
> ----- Original Message ----
> From: Sagar Naik <sn...@attributor.com>
> To: core-user@hadoop.apache.org
> Sent: Friday, November 14, 2008 10:38:45 AM
> Subject: Recovery of files in hadoop 18
> 
> Hi,
> I accidentally deleted the root folder in our HDFS.
> I have stopped HDFS.
> 
> Is there any way to recover the files from the secondary namenode?
> 
> Please help.
> 
> 
> -Sagar
>  

Re: Recovery of files in hadoop 18

Posted by Sagar Naik <sn...@attributor.com>.
Hey Lohit,

Thanks for your help.
I did as per your suggestion and imported from the secondary namenode.
We have some corrupted files.

But for some reason, the namenode is still in safe_mode. It has been an 
hour or so.
The fsck report is :

Total size:    6954466496842 B (Total open files size: 543469222 B)
 Total dirs:    1159
 Total files:   1354155 (Files currently being written: 7673)
 Total blocks (validated):      1375725 (avg. block size 5055128 B) 
(Total open file blocks (not validated): 50)
  ********************************
  CORRUPT FILES:        1574
  MISSING BLOCKS:       1574
  MISSING SIZE:         1165735334 B
  CORRUPT BLOCKS:       1574
  ********************************
 Minimally replicated blocks:   1374151 (99.88559 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       26619 (1.9349071 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.977127
 Corrupt blocks:                1574
 Missing replicas:              26752 (0.65317154 %)
 

Do you think I should manually override the safemode and delete all the 
corrupted files, and then restart?

-Sagar


lohit wrote:
> If you have enabled trash, the files should have been moved to the trash folder before being permanently deleted; restore them from there. (I hope you have fs.trash.interval set.)
>
> If not, shut down the cluster.
> Take a backup of your metadata directories (dfs.name.dir on the namenode and fs.checkpoint.dir on the secondary namenode).
>
> The secondary namenode should have the last updated image; try to start the namenode from that image, and don't use the edits from the namenode yet. Try doing importCheckpoint as explained here: https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173. Start only the namenode and run fsck -files. It will throw a lot of messages saying you are missing blocks, but that's fine since you haven't started the datanodes yet. If it does show your files, that means they haven't been deleted yet. 
> This will give you a view of the system as of the last backup. Start the datanodes. Once everything is up, try running fsck and check the consistency of the system. You will lose all changes that have happened since the last checkpoint. 
>
>
> Hope that helps,
> Lohit
>
>
>
> ----- Original Message ----
> From: Sagar Naik <sn...@attributor.com>
> To: core-user@hadoop.apache.org
> Sent: Friday, November 14, 2008 10:38:45 AM
> Subject: Recovery of files in hadoop 18
>
> Hi,
> I accidentally deleted the root folder in our HDFS.
> I have stopped HDFS.
>
> Is there any way to recover the files from the secondary namenode?
>
> Please help.
>
>
> -Sagar
>   


Re: Recovery of files in hadoop 18

Posted by lohit <lo...@yahoo.com>.

If you have enabled trash, the files should have been moved to the trash folder before being permanently deleted; restore them from there. (I hope you have fs.trash.interval set.)

If not, shut down the cluster.
Take a backup of your metadata directories (dfs.name.dir on the namenode and fs.checkpoint.dir on the secondary namenode).

The secondary namenode should have the last updated image; try to start the namenode from that image, and don't use the edits from the namenode yet. Try doing importCheckpoint as explained here: https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173. Start only the namenode and run fsck -files. It will throw a lot of messages saying you are missing blocks, but that's fine since you haven't started the datanodes yet. If it does show your files, that means they haven't been deleted yet. 
This will give you a view of the system as of the last backup. Start the datanodes. Once everything is up, try running fsck and check the consistency of the system. You will lose all changes that have happened since the last checkpoint. 
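
Put together as commands, the sequence is roughly the following (a sketch
only; the /path/to/... locations stand in for your dfs.name.dir,
fs.checkpoint.dir and a backup area):

    bin/stop-dfs.sh                                       # make sure nothing is running
    cp -a /path/to/dfs.name.dir /path/to/backup/name.bak            # keep the current image and edits around
    cp -a /path/to/fs.checkpoint.dir /path/to/backup/checkpoint.bak
    bin/hadoop namenode -importCheckpoint                 # load the secondary's checkpoint (runs in the foreground)
    bin/hadoop fsck / -files                              # from another shell: files should be listed, blocks reported missing
    # if the listing looks right, start the datanodes and re-run fsck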


Hope that helps,
Lohit



----- Original Message ----
From: Sagar Naik <sn...@attributor.com>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 10:38:45 AM
Subject: Recovery of files in hadoop 18

Hi,
I accidentally deleted the root folder in our HDFS.
I have stopped HDFS.

Is there any way to recover the files from the secondary namenode?

Please help.


-Sagar