Posted to user@hadoop.apache.org by Andrew Chi <ch...@gmail.com> on 2021/09/13 21:59:56 UTC

HDFS datanode: rename local filesystem directories?

I've had a recent drive failure that resulted in the removal of several
drives from an HDFS datanode machine (Hadoop version 3.3.0).  This caused
Linux to rename half of the drives in /dev/*, with the result that when we
mount the drives, the original directory mapping no longer exists.  The
data on those drives still exists, so this is equivalent to a renaming of
the local filesystem directories.

Originally, we had:
/hadoop/data/path/a
/hadoop/data/path/b
/hadoop/data/path/c

Now we have:
/hadoop/data/path/x
/hadoop/data/path/y
/hadoop/data/path/z

It's not clear how {a,b,c} map onto {x,y,z}.  The blocks have been
preserved within the directories, but the directories have essentially been
randomly permuted.

Can I simply go to hdfs-site.xml and change dfs.datanode.data.dir to the
new list of comma-separated directories /hadoop/data/path/{x,y,z}?  Will
the datanode still work correctly when I start it back up?
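
Concretely, I'm thinking of an hdfs-site.xml change along these lines (paths
as above; just a sketch of the change, not yet applied):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/data/path/x,/hadoop/data/path/y,/hadoop/data/path/z</value>
</property>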

Thanks!
Andrew

Re: HDFS datanode: rename local filesystem directories?

Posted by Bob Metelsky <bo...@pm.me.INVALID>.
That's very good to know. Thanks for the update. I was wondering what happened :-)

Good to hear it was seamless/easy

Bob

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 17th, 2021 at 11:33 AM, Andrew Chi <ch...@gmail.com> wrote:

> Just wanted to give a quick update. The recovery was successful, and it turns out that you can rename the datanode directories (and just update dfs.datanode.data.dir in hdfs-site.xml). Simply start the datanode up, and it will "just work". At least for Hadoop 3.3.0, the namenode does not know or care about the datanode's local filesystem organization, as long as all the blocks are accounted for when the datanode starts up.
>
> Originally, we had:
> /hadoop/data/path/a
> /hadoop/data/path/b
> /hadoop/data/path/c
>
> Now we have:
> /hadoop/data/path/x
> /hadoop/data/path/y
> /hadoop/data/path/z
>
> Due to some drive failures and subsequent shuffling of /dev/sd*, we had lost the mapping of {a,b,c} onto {x,y,z}. Fortunately, the design of HDFS decouples the namespace from the physical data storage, and therefore this datanode recovery was seamless.

Re: HDFS datanode: rename local filesystem directories?

Posted by Andrew Chi <ch...@gmail.com>.
Just wanted to give a quick update.  The recovery was successful, and it
turns out that you can rename the datanode directories (and just update
dfs.datanode.data.dir in hdfs-site.xml).  Simply start the datanode up, and
it will "just work".  At least for Hadoop 3.3.0, the namenode does not know
or care about the datanode's local filesystem organization, as long as all
the blocks are accounted for when the datanode starts up.

Originally, we had:
/hadoop/data/path/a
/hadoop/data/path/b
/hadoop/data/path/c

Now we have:
/hadoop/data/path/x
/hadoop/data/path/y
/hadoop/data/path/z

Due to some drive failures and subsequent shuffling of /dev/sd*, we had
lost the mapping of {a,b,c} onto {x,y,z}.  Fortunately, the design of HDFS
decouples the namespace from the physical data storage, and therefore this
datanode recovery was seamless.
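
For anyone repeating this, the post-restart sanity checks I'd suggest are the
standard ones (nothing cluster-specific here):

hdfs dfsadmin -report    # confirm the datanode registered with expected capacity
hdfs fsck /              # summary should report 0 missing and 0 corrupt blocks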

Re: HDFS datanode: rename local filesystem directories?

Posted by Andrew Chi <ch...@gmail.com>.
Thanks, Bob.  Comments inline.

On Mon, Sep 13, 2021 at 8:06 PM Bob Metelsky <bo...@pm.me> wrote:

> I think there are some fsck queries you can run that will show the
> full path followed by (MISSING); you can find that with Google pretty easily.
>
> Think of it… the namenode has to keep track of where all the blocks are,
> something like hostname/path. That's the job of the nn.
>

The command "hdfs fsck /path/to/file.txt -files -blocks -locations" shows
the datanode IP address/port but not the local filesystem path on that
datanode.
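
For reference, here is the shape of that command (file path hypothetical):

hdfs fsck /path/to/file.txt -files -blocks -locations
# block locations print as DatanodeInfoWithStorage[ip:port,DS-<storageID>,DISK],
# i.e. a storage ID rather than a local directory on the datanode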

Are you sure the NameNode actually knows those paths?  The code suggests
that it doesn't:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L203-L208

 * The NameNode controls two critical tables:
 *   1)  filename{@literal ->}blocksequence (namespace)
 *   2)  block{@literal ->}machinelist ("inodes")
 *
 * The first table is stored on disk and is very precious.
 * The second table is rebuilt every time the NameNode comes up.


> If you can (I would), let all the missing blocks re-replicate; during that
> time, try to identify the missing file paths. Once you find those, introduce
> one drive at a time and see…
>

I'll try this on a non-production setup first, and then apply it to the
production cluster if it works.  Thanks.

Re: HDFS datanode: rename local filesystem directories?

Posted by Bob Metelsky <bo...@pm.me.INVALID>.
I think there are some fsck queries you can run that will show the full path followed by (MISSING); you can find that with Google pretty easily.

Think of it… the namenode has to keep track of where all the blocks are, something like hostname/path. That's the job of the nn.

If you can (I would), let all the missing blocks re-replicate; during that time, try to identify the missing file paths. Once you find those, introduce one drive at a time and see…

On Mon, Sep 13, 2021 at 7:37 PM, Andrew Chi <ch...@gmail.com> wrote:

> Thanks a bunch. Most of the data was replicated on other datanodes, but there were some blocks that at the time of failure were only on the single datanode with the failed drives.
>
> I did look at the namenode.log, but it seems that for each block, the central log only provides the IP address of the datanode(s) on which the block is replicated. That suggests to me that the datanode's local filesystem path information is contained only on the datanode itself, but I can't figure out where. Perhaps the directory doesn't matter as long as the storageID is correct (in the current/VERSION file). But I'd like to verify this before starting up the datanode and potentially corrupting the HDFS filesystem.
>
> Is there datanode state stored anywhere other than the */current/ and the */lost+found/ directories?
>
> On Mon, Sep 13, 2021, 7:24 PM Bob Metelsky <bo...@pm.me> wrote:
>
>> Just throwing some ideas out here...
>>
>> if all the failed drives were on one server, it's likely the blocks are replicated on other nodes. So you can
>>
>> hdfs dfsadmin -report | head -13
>> and look for under-replicated blocks
>> you can put that in a loop and watch the count go down; eventually you will be left with actual missing blocks:
>>
>> while true
>> do
>> hdfs dfsadmin -report | head -13
>> sleep 600
>> done
>>
>> you can also run some queries
>> https://knpcode.com/hadoop/hdfs/how-to-fix-corrupt-blocks-and-under-replicated-blocks-in-hdfs/
>>
>> It's very likely most of the data is replicated on other disks/nodes
>> you may also get some insight into actual path names by tailing the namenode.log
>>
>> Just ideas off the top of my head
>>
>> Good luck
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Monday, September 13th, 2021 at 7:03 PM, Bob Metelsky <bo...@pm.me> wrote:
>>
>>> Hi, before doing that, I would run ls -ltR > filename.txt on each disk
>>> and see if there are hints/references to the original file system. That may
>>> help provide a more meaningful path to put in hdfs-site.xml. Generally it
>>> sounds pretty close.
>>>
>>> Let us know how it goes
>>>
>>> On Mon, Sep 13, 2021 at 5:59 PM, Andrew Chi <ch...@gmail.com> wrote:
>>>
>>>> I've had a recent drive failure that resulted in the removal of several drives from an HDFS datanode machine (Hadoop version 3.3.0). This caused Linux to rename half of the drives in /dev/*, with the result that when we mount the drives, the original directory mapping no longer exists. The data on those drives still exists, so this is equivalent to a renaming of the local filesystem directories.
>>>>
>>>> Originally, we had:
>>>> /hadoop/data/path/a
>>>> /hadoop/data/path/b
>>>> /hadoop/data/path/c
>>>>
>>>> Now we have:
>>>> /hadoop/data/path/x
>>>> /hadoop/data/path/y
>>>> /hadoop/data/path/z
>>>>
>>>> It's not clear how {a,b,c} map onto {x,y,z}. The blocks have been preserved within the directories, but the directories have essentially been randomly permuted.
>>>>
>>>> Can I simply go to hdfs-site.xml and change dfs.datanode.data.dir to the new list of comma-separated directories /hadoop/data/path/{x,y,z}? Will the datanode still work correctly when I start it back up?
>>>>
>>>> Thanks!
>>>> Andrew

Re: HDFS datanode: rename local filesystem directories?

Posted by Andrew Chi <ch...@gmail.com>.
Thanks a bunch. Most of the data was replicated on other datanodes, but
there were some blocks that at the time of failure were only on the single
datanode with the failed drives.

I did look at the namenode.log, but it seems that for each block, the
central log only provides the IP address of the datanode(s) on which the
block is replicated. That suggests to me that the datanode's local
filesystem path information is contained only on the datanode itself, but I
can't figure out where. Perhaps the directory doesn't matter as long as the
storageID is correct (in the current/VERSION file).  But I'd like to verify
this before starting up the datanode and potentially corrupting the HDFS
filesystem.
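
For reference, each data dir's current/VERSION is a small properties file
roughly like this (IDs and values elided; this is the standard layout, not
copied from our cluster):

storageID=DS-...
clusterID=CID-...
cTime=0
datanodeUuid=...
storageType=DATA_NODE
layoutVersion=...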

Is there datanode state stored anywhere other than the */current/ and the
*/lost+found/ directories?

On Mon, Sep 13, 2021, 7:24 PM Bob Metelsky <bo...@pm.me> wrote:

> Just throwing some ideas out here...
>
> if all the failed drives were on one server, it's likely the blocks are
> replicated on other nodes. So you can
>
> hdfs dfsadmin -report | head -13
> and look for under-replicated blocks
> you can put that in a loop and watch the count go down; eventually you
> will be left with actual missing blocks:
>
> while true
> do
> hdfs dfsadmin -report | head -13
> sleep 600
> done
>
> you can also run some queries
>
> https://knpcode.com/hadoop/hdfs/how-to-fix-corrupt-blocks-and-under-replicated-blocks-in-hdfs/
>
> It's very likely most of the data is replicated on other disks/nodes
> you may also get some insight into actual path names by tailing the
> namenode.log
>
> Just ideas off the top of my head
>
> Good luck
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Monday, September 13th, 2021 at 7:03 PM, Bob Metelsky <
> bob.metelsky@pm.me> wrote:
>
> Hi, before doing that, I would run ls -ltR > filename.txt on each disk
> and see if there are hints/references to the original file system. That may
> help provide a more meaningful path to put in hdfs-site.xml. Generally it
> sounds pretty close.
>
> Let us know how it goes
>
>
> On Mon, Sep 13, 2021 at 5:59 PM, Andrew Chi <ch...@gmail.com> wrote:
>
> I've had a recent drive failure that resulted in the removal of several
> drives from an HDFS datanode machine (Hadoop version 3.3.0).  This caused
> Linux to rename half of the drives in /dev/*, with the result that when we
> mount the drives, the original directory mapping no longer exists.  The
> data on those drives still exists, so this is equivalent to a renaming of
> the local filesystem directories.
>
> Originally, we had:
> /hadoop/data/path/a
> /hadoop/data/path/b
> /hadoop/data/path/c
>
> Now we have:
> /hadoop/data/path/x
> /hadoop/data/path/y
> /hadoop/data/path/z
>
> It's not clear how {a,b,c} map onto {x,y,z}.  The blocks have been
> preserved within the directories, but the directories have essentially been
> randomly permuted.
>
> Can I simply go to hdfs-site.xml and change dfs.datanode.data.dir to the
> new list of comma-separated directories /hadoop/data/path/{x,y,z}?  Will
> the datanode still work correctly when I start it back up?
>
> Thanks!
> Andrew
>

Re: HDFS datanode: rename local filesystem directories?

Posted by Bob Metelsky <bo...@pm.me.INVALID>.
Just throwing some ideas out here...

if all the failed drives were on one server, it's likely the blocks are replicated on other nodes. So you can

hdfs dfsadmin -report | head -13
and look for under-replicated blocks
you can put that in a loop and watch the count go down; eventually you will be left with actual missing blocks:

while true
do
hdfs dfsadmin -report | head -13
sleep 600
done

you can also run some queries
https://knpcode.com/hadoop/hdfs/how-to-fix-corrupt-blocks-and-under-replicated-blocks-in-hdfs/
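
For example (standard fsck options; / is just the filesystem root):

hdfs fsck / -list-corruptfileblocks        # files with corrupt blocks
hdfs fsck / | grep -i 'under replicated'   # per-file under-replication lines
hdfs fsck / | tail -25                     # summary with missing/corrupt counts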

It's very likely most of the data is replicated on other disks/nodes
you may also get some insight into actual path names by tailing the namenode.log

Just ideas off the top of my head

Good luck

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, September 13th, 2021 at 7:03 PM, Bob Metelsky <bo...@pm.me> wrote:

> Hi, before doing that, I would run ls -ltR > filename.txt on each disk and see if there are hints/references to the original file system. That may help provide a more meaningful path to put in hdfs-site.xml. Generally it sounds pretty close.
>
> Let us know how it goes
>
> On Mon, Sep 13, 2021 at 5:59 PM, Andrew Chi <ch...@gmail.com> wrote:
>
>> I've had a recent drive failure that resulted in the removal of several drives from an HDFS datanode machine (Hadoop version 3.3.0). This caused Linux to rename half of the drives in /dev/*, with the result that when we mount the drives, the original directory mapping no longer exists. The data on those drives still exists, so this is equivalent to a renaming of the local filesystem directories.
>>
>> Originally, we had:
>> /hadoop/data/path/a
>> /hadoop/data/path/b
>> /hadoop/data/path/c
>>
>> Now we have:
>> /hadoop/data/path/x
>> /hadoop/data/path/y
>> /hadoop/data/path/z
>>
>> It's not clear how {a,b,c} map onto {x,y,z}. The blocks have been preserved within the directories, but the directories have essentially been randomly permuted.
>>
>> Can I simply go to hdfs-site.xml and change dfs.datanode.data.dir to the new list of comma-separated directories /hadoop/data/path/{x,y,z}? Will the datanode still work correctly when I start it back up?
>>
>> Thanks!
>> Andrew

Re: HDFS datanode: rename local filesystem directories?

Posted by Bob Metelsky <bo...@pm.me.INVALID>.
Hi, before doing that, I would run ls -ltR > filename.txt on each disk and see
if there are hints/references to the original file system. That may help
provide a more meaningful path to put in hdfs-site.xml. Generally it sounds
pretty close.

Let us know how it goes

On Mon, Sep 13, 2021 at 5:59 PM, Andrew Chi <ch...@gmail.com> wrote:

> I've had a recent drive failure that resulted in the removal of several drives from an HDFS datanode machine (Hadoop version 3.3.0). This caused Linux to rename half of the drives in /dev/*, with the result that when we mount the drives, the original directory mapping no longer exists. The data on those drives still exists, so this is equivalent to a renaming of the local filesystem directories.
>
> Originally, we had:
> /hadoop/data/path/a
> /hadoop/data/path/b
> /hadoop/data/path/c
>
> Now we have:
> /hadoop/data/path/x
> /hadoop/data/path/y
> /hadoop/data/path/z
>
> It's not clear how {a,b,c} map onto {x,y,z}. The blocks have been preserved within the directories, but the directories have essentially been randomly permuted.
>
> Can I simply go to hdfs-site.xml and change dfs.datanode.data.dir to the new list of comma-separated directories /hadoop/data/path/{x,y,z}? Will the datanode still work correctly when I start it back up?
>
> Thanks!
> Andrew