Posted to mapreduce-user@hadoop.apache.org by Chef Win2er <wi...@gmail.com> on 2016/02/12 07:01:01 UTC

Trash data after upgrade from 2.7.1 to 2.7.2

Hi Hadoop users,

I have hadoop-2.7.1 installed on my cluster with HA, 4 data nodes and 3
journal nodes.
I upgraded it to hadoop-2.7.2 a few days ago following the steps below.

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#Upgrade_without_Downtime

But today I noticed that a trash folder has been created in the data node's
data directory, and it is taking up a lot of space.

$ hdfs dfs -du -s -h /
11.5 G  /

I set replication to 2, so I expected the disk usage to be around 30G or 40G.
But it is actually 144GB.

$ hdfs dfsadmin -report
Configured Capacity: 422185762816 (393.19 GB)
Present Capacity: 415469745432 (386.94 GB)
DFS Remaining: 260712565164 (242.81 GB)
DFS Used: 154757180268 (144.13 GB)
DFS Used%: 37.25%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

With the 'du -h' command I got the result below.

......
11G     ./datanode/current/BP-606697376-<datanode ip>-1452599640542/current/finalized/subdir0
11G     ./datanode/current/BP-606697376-<datanode ip>-1452599640542/current/finalized
11G     ./datanode/current/BP-606697376-<datanode ip>-1452599640542/current
...
38G     ./datanode/current/BP-606697376-<datanode ip>-1452599640542/trash/finalized/subdir0
38G     ./datanode/current/BP-606697376-<datanode ip>-1452599640542/trash/finalized
38G     ./datanode/current/BP-606697376-<datanode ip>-1452599640542/trash
...

Could anyone help me with this?

Thanks
MA
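
For reference, the size of the gap described above can be worked out from the figures in the post. This is only a rough sketch: it assumes every file uses the stated replication factor of 2, and the variable and function names are made up for illustration.

```shell
#!/bin/sh
# Figures copied from the post above.
logical_gib=11.5             # "hdfs dfs -du -s -h /"
replication=2                # replication factor stated in the post
dfs_used_bytes=154757180268  # "DFS Used" from "hdfs dfsadmin -report"

# Compare the raw space the live files should need with what HDFS reports.
gap_report() {
  awk -v l="$logical_gib" -v r="$replication" -v u="$dfs_used_bytes" 'BEGIN {
    expected = l * r                 # logical size times replication
    used = u / (1024 * 1024 * 1024)  # bytes to GiB, the unit the report uses
    printf "expected %.2f GiB, reported %.2f GiB, unexplained %.2f GiB\n",
           expected, used, used - expected
  }'
}

gap_report
```

With these numbers about 121 GiB of "DFS Used" is not accounted for by live files, which is the space the thread goes on to attribute to the trash directories.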

RE: Trash data after upgrade from 2.7.1 to 2.7.2

Posted by Vinayakumar B <vi...@huawei.com>.

Hi Chef,

   Can you confirm the below points?


1)      Did you upgrade all datanodes to 2.7.2?

2)      Did you finalize the upgrade using the following command?
Run "hdfs dfsadmin -rollingUpgrade finalize" to finalize the rolling upgrade:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#dfsadmin_-rollingUpgrade
If finalize has not been executed, all blocks that were present before the upgrade are moved to trash on deletion instead of being removed.
So if you are deleting old files to save space on an upgraded (but not finalized) cluster, it will not actually free anything on disk.
-vinay
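
The check suggested in this reply could be scripted roughly as follows. This is a sketch, not a verified procedure: the HDFS_BIN variable and the function name are hypothetical, and it assumes the commands are run as the HDFS superuser on a node with the client configured. Per the reply above, finalizing is what should allow the data nodes to stop keeping deleted pre-upgrade blocks in trash.

```shell
#!/bin/sh
# HDFS_BIN is a hypothetical override for this sketch; it defaults to "hdfs".
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Show the rolling upgrade status, finalize it, then print the usage report.
finalize_rolling_upgrade() {
  if ! command -v "$HDFS_BIN" >/dev/null 2>&1; then
    echo "hdfs command not found; run this on a cluster node" >&2
    return 1
  fi
  "$HDFS_BIN" dfsadmin -rollingUpgrade query || return 1
  "$HDFS_BIN" dfsadmin -rollingUpgrade finalize || return 1
  "$HDFS_BIN" dfsadmin -report
}
```

Calling finalize_rolling_upgrade runs the three commands in order and stops at the first failure. Note that a finalized upgrade can no longer be rolled back, so it is worth confirming that all data nodes are on 2.7.2 first.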

Re: Trash data after upgrade from 2.7.1 to 2.7.2

Posted by "Marcel Mitsuto F. S." <mi...@gmail.com>.

Check whether you have the following files:
*dncp_block_verification.log.curr, dncp_block_verification.log.prev*
They usually grow a lot, and they are not reported in the DFS usage statistics.



marcel mitsuto
http://about.me/djeps
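
One way to look for the files mentioned in this reply is sketched below. The DATA_DIR default is a hypothetical placeholder taken from the relative path in the 'du -h' output; in practice it would be the directory configured as dfs.datanode.data.dir, and the function name is made up for illustration.

```shell
#!/bin/sh
# DATA_DIR is a hypothetical default; use the path from dfs.datanode.data.dir.
DATA_DIR="${DATA_DIR:-./datanode}"

# Print the size and path of every block-verification log under a directory.
list_verification_logs() {
  find "$1" -type f -name 'dncp_block_verification.log.*' -exec du -h {} +
}

# Only scan when the directory exists, so the sketch is safe to run anywhere.
if [ -d "$DATA_DIR" ]; then
  list_verification_logs "$DATA_DIR"
fi
```

If this prints nothing, or only small files, these logs are unlikely to explain the extra usage on that node.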
