Posted to common-user@hadoop.apache.org by Torsten Curdt <tc...@apache.org> on 2007/12/12 11:41:59 UTC

finalize upgrade

Hey guys,

Triggered by a post on the mailing list, I also checked our 0.14
cluster, and although we really thought we did the finalize after the
upgrade, we also have a big "previous" dir there. A couple of things I
am wondering here...

1) I thought that the data is actually not duplicated ...so why is it  
so big?
2) Is there a way of finding out whether finalize still needs to be run?

cheers
--
Torsten

Re: finalize upgrade

Posted by Torsten Curdt <tc...@apache.org>.
On 14.12.2007, at 23:35, Konstantin Shvachko wrote:

>> Well, from the output it looks like that has been run. At least I
>> cannot see any sign telling me I still need to run it ...yet the
>> previous directory was still there on the name node.
>>> The way it works in pre 0.16 is that you start the cluster, and  
>>> issue:
>>> hadoop dfsadmin -finalizeUpgrade
>> I've just run that again. Now the 'previous' dir on the namenode  
>> is  gone. But on the data nodes the 'previous' is still there.
>
> That means finalizeUpgrade has not been run before or failed for  
> some reason.
>
>>> Now, if you still want to do it manually, then yes just remove   
>>> "previous"
>>> dir on the name-node and then start the cluster.
>>> Data-nodes will finalize automatically.
>> Hmmm ...I cannot see that happening. Now what?
>
> Finalizing on the data-nodes is very lazy.
> During registrations and block reports the name-node will inform
> data-nodes
> whether it has the previous state or not. If not, the data-nodes
> will remove
> their previous states.
> On a running cluster a complete previous state removal can take
> up to one hour.
>
> If you need to accelerate it, restart the cluster.
> But it should be done by now. It has been 3 hours since you wrote
> this.

Yepp ...you were right :) 'previous' is gone now.

cheers
--
Torsten

Re: finalize upgrade

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
> Well, from the output it looks like that has been run. At least I
> cannot see any sign telling me I still need to run it ...yet the
> previous directory was still there on the name node.
> 
>> The way it works in pre 0.16 is that you start the cluster, and issue:
>> hadoop dfsadmin -finalizeUpgrade
> 
> I've just run that again. Now the 'previous' dir on the namenode is  
> gone. But on the data nodes the 'previous' is still there.

That means finalizeUpgrade has not been run before or failed for some reason.

>> Now, if you still want to do it manually, then yes just remove  
>> "previous"
>> dir on the name-node and then start the cluster.
>> Data-nodes will finalize automatically.
> 
> Hmmm ...I cannot see that happening. Now what?

Finalizing on the data-nodes is very lazy.
During registrations and block reports the name-node will inform data-nodes
whether it has the previous state or not. If not, the data-nodes will remove
their previous states.
On a running cluster a complete previous state removal can take up to one hour.

If you need to accelerate it, restart the cluster.
But it should be done by now. It has been 3 hours since you wrote this.
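
The lazy behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Hadoop's actual code: the classes ToyNameNode and ToyDataNode and the helper pending() are hypothetical names invented for illustration. The one idea it tries to capture is that a data-node drops its previous state only when it next reports in and learns that the name-node no longer has one.

```python
from typing import List


class ToyNameNode:
    """Hypothetical stand-in for the name-node; not Hadoop code."""

    def __init__(self) -> None:
        self.has_previous = True  # upgrade not yet finalized

    def finalize_upgrade(self) -> None:
        # Models only the name-node side of `hadoop dfsadmin -finalizeUpgrade`.
        self.has_previous = False

    def answer_report(self) -> bool:
        # Reply sent back on a registration or block report:
        # "do I still have a previous state?"
        return self.has_previous


class ToyDataNode:
    """Hypothetical stand-in for a data-node; not Hadoop code."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.has_previous = True

    def send_block_report(self, name_node: ToyNameNode) -> None:
        # The data-node only learns about finalization when it reports in.
        if not name_node.answer_report():
            self.has_previous = False


def pending(nodes: List[ToyDataNode]) -> List[str]:
    """Names of data-nodes that still hold a previous state."""
    return [n.name for n in nodes if n.has_previous]
```

In this model, calling finalize_upgrade() changes nothing on the data-nodes by itself; each one clears its previous state on its own next report. That would be consistent with 'previous' lingering on the data nodes for a while after it is gone from the name node.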

Re: finalize upgrade

Posted by Torsten Curdt <tc...@apache.org>.
On 14.12.2007, at 19:41, Konstantin Shvachko wrote:

> Sorry, it looks like the UI and report feature will appear only in  
> 0.16.
> It is related to HADOOP-1604.
> In general you are not supposed to remove any directories manually.

That's why I am so careful :)

> You should just use finalizeUpgrade.

Well, from the output it looks like that has been run. At least I
cannot see any sign telling me I still need to run it ...yet the
previous directory was still there on the name node.

> The way it works in pre 0.16 is that you start the cluster, and issue:
> hadoop dfsadmin -finalizeUpgrade

I've just run that again. Now the 'previous' dir on the namenode is  
gone. But on the data nodes the 'previous' is still there.

> Now, if you still want to do it manually, then yes just remove  
> "previous"
> dir on the name-node and then start the cluster.
> Data-nodes will finalize automatically.

Hmmm ...I cannot see that happening. Now what?

cheers
--
Torsten

Re: finalize upgrade

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Sorry, it looks like the UI and report feature will appear only in 0.16.
It is related to HADOOP-1604.
In general you are not supposed to remove any directories manually.
You should just use finalizeUpgrade.
The way it works in pre 0.16 is that you start the cluster, and issue:
hadoop dfsadmin -finalizeUpgrade

Now, if you still want to do it manually, then yes just remove "previous"
dir on the name-node and then start the cluster.
Data-nodes will finalize automatically.

Is that what you were asking?

--Konstantin

Torsten Curdt wrote:
> Can anyone confirm?
> 
> On 13.12.2007, at 09:46, Torsten Curdt wrote:
> 
>> No sign of 'upgrade still needs to be finalized' or something ...so  I 
>> assume removing the 'previous' dir is safe then?
>>
>> On 12.12.2007, at 21:18, Konstantin Shvachko wrote:
>>
>>>> 2) Is there a way of finding out whether finalize still needs to  be 
>>>> run?
>>>
>>>
>>> Yes, you can see it on the name-node web UI, and by running
>>> hadoop dfsadmin -report
>>
>>
> 
> 
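
Since the UI and report indication only arrive in 0.16, one rough way to see whether anything is left over on an older cluster is to look for the "previous" directory directly. The sketch below assumes that "previous" sits directly under each storage directory and that you pass in the directories your own configuration uses; the function name dirs_with_previous is hypothetical. It only reads the filesystem and never deletes anything.

```python
import os
from typing import Iterable, List


def dirs_with_previous(storage_dirs: Iterable[str]) -> List[str]:
    """Return the storage directories that still contain a 'previous' subdirectory.

    Assumption: 'previous' lives directly under each storage directory.
    """
    return [
        d for d in storage_dirs
        if os.path.isdir(os.path.join(d, "previous"))
    ]
```

An empty result on the name node and on every data node would suggest that the previous state is gone everywhere; a non-empty one on data nodes alone may simply mean the lazy cleanup has not happened yet.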

Re: finalize upgrade

Posted by Torsten Curdt <tc...@apache.org>.
Can anyone confirm?

On 13.12.2007, at 09:46, Torsten Curdt wrote:

> No sign of 'upgrade still needs to be finalized' or something ...so  
> I assume removing the 'previous' dir is safe then?
>
> On 12.12.2007, at 21:18, Konstantin Shvachko wrote:
>
>>> 2) Is there a way of finding out whether finalize still needs to  
>>> be run?
>>
>> Yes, you can see it on the name-node web UI, and by running
>> hadoop dfsadmin -report
>


Re: finalize upgrade

Posted by Torsten Curdt <tc...@apache.org>.
No sign of 'upgrade still needs to be finalized' or something ...so I  
assume removing the 'previous' dir is safe then?

On 12.12.2007, at 21:18, Konstantin Shvachko wrote:

>> 2) Is there a way of finding out whether finalize still needs to  
>> be run?
>
> Yes, you can see it on the name-node web UI, and by running
> hadoop dfsadmin -report



Re: finalize upgrade

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
> 2) Is there a way of finding out whether finalize still needs to be run?

Yes, you can see it on the name-node web UI, and by running
hadoop dfsadmin -report

Re: finalize upgrade

Posted by Colin Evans <co...@metaweb.com>.
I just found this problem today after upgrading to 0.15.  I just deleted 
the "previous" directory on all of our machines with no bad consequences 
so far.


Torsten Curdt wrote:
> Hey guys,
>
> Triggered by a post on the mailing list, I also checked our 0.14
> cluster, and although we really thought we did the finalize after the
> upgrade, we also have a big "previous" dir there. A couple of things I
> am wondering here...
>
> 1) I thought that the data is actually not duplicated ...so why is it 
> so big?
> 2) Is there a way of finding out whether finalize still needs to be run?
>
> cheers
> -- 
> Torsten


Re: finalize upgrade

Posted by Doug Cutting <cu...@apache.org>.
Joydeep Sen Sarma wrote:
> It consumes real space, though. We ran out of disk on the drive hosting control/tmp data and got the space back once finalizeUpgrade finished.

Is that perhaps because it still holds data that's since been deleted?

Doug

RE: finalize upgrade

Posted by Joydeep Sen Sarma <js...@facebook.com>.
It consumes real space, though. We ran out of disk on the drive hosting control/tmp data and got the space back once finalizeUpgrade finished.

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org]
Sent: Wed 12/12/2007 11:14 AM
To: hadoop-user@lucene.apache.org
Subject: Re: finalize upgrade
 
Torsten Curdt wrote:
> 1) I thought that the data is actually not duplicated ...so why is it so 
> big?

I think it is a directory of hard links.

Doug


Re: finalize upgrade

Posted by Doug Cutting <cu...@apache.org>.
Torsten Curdt wrote:
> 1) I thought that the data is actually not duplicated ...so why is it so 
> big?

I think it is a directory of hard links.

Doug
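
If "previous" is indeed a directory of hard links, the following sketch shows why it costs nothing at first and real space later. It is a generic filesystem demonstration, not Hadoop code: the directory names "current" and "previous" and the file name blk_1 are illustrative assumptions, and hard_link_demo is a hypothetical helper. It creates a file, hard-links it into a sibling directory, then deletes the original.

```python
import os
import tempfile


def hard_link_demo(payload: bytes = b"x" * 4096) -> dict:
    """Hard-link a file into a sibling directory, then delete the original."""
    with tempfile.TemporaryDirectory() as root:
        current = os.path.join(root, "current")
        previous = os.path.join(root, "previous")
        os.mkdir(current)
        os.mkdir(previous)

        block = os.path.join(current, "blk_1")  # illustrative file name
        with open(block, "wb") as f:
            f.write(payload)

        snapshot = os.path.join(previous, "blk_1")
        os.link(block, snapshot)  # second name for the same inode, no data copied

        same_inode = os.stat(block).st_ino == os.stat(snapshot).st_ino
        links_before = os.stat(block).st_nlink

        os.remove(block)  # the "live" copy is deleted later on

        links_after = os.stat(snapshot).st_nlink
        with open(snapshot, "rb") as f:
            still_readable = f.read() == payload

        return {
            "same_inode": same_inode,
            "links_before": links_before,
            "links_after": links_after,
            "still_readable": still_readable,
        }
```

While both names exist, the data is stored once. Once the file under "current" is removed, the link under "previous" is the only thing keeping the data on disk, which would fit both the hard-link explanation and the observation that space came back only after finalizeUpgrade finished.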