Posted to user@hadoop.apache.org by Jan Van Besien <ja...@ngdata.com> on 2012/08/16 10:11:43 UTC

checkpointnode backupnode hdfs HA

I am a bit confused about the different options for namenode high 
availability (or something along those lines) in CDH4 (hadoop-2.0.0).

I understand that the secondary namenode is deprecated, and that there 
are two options to replace it: checkpoint or backup namenodes. Both are 
well explained in the documentation, but the confusion begins when 
reading about "HDFS High Availability", for example here: 
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html

Is the topic "HDFS High Availability" as described there (using shared
storage) related to checkpoint/backup nodes? If so, in what way?

If I read about backup nodes, they also seem to be aimed at high
availability. From what I understood, the current implementation doesn't
provide (warm) fail-over yet, but this is planned. So starting to
replace secondary namenodes now with backup namenodes sounds like a
future-proof idea?

thanks,
Jan

Re: Stopping a single Datanode

Posted by Nitin Pawar <ni...@gmail.com>.
you can just kill the process ID

or there is a script inside the bin directory, hadoop-daemon.sh

use it with "stop datanode" and it should stop the DataNode
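
for example, both options in one sketch (assuming a tarball install with
the usual pid files in place; the pid 12345 is just illustrative):

jps | grep DataNode                  # option 1: find the DataNode JVM...
kill 12345                           # ...and send it a plain SIGTERM

bin/hadoop-daemon.sh stop datanode   # option 2: the script reads the pid
                                     # file it wrote at daemon start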

On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
> Sorry - this seems pretty basic, but I could not find a reference on
> line or in my books. Is there a graceful way to stop a single datanode,
> (for example to move the system to a new rack where it will be put back
> on-line) or do you just whack the process ID and let HDFS clean up the
> mess?
>
> Thanks
>



-- 
Nitin Pawar

Re: Stopping a single Datanode

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Terry,

    You can ssh the command to the node where you want to stop the DN.
Something like this:

cluster@ubuntu:~/hadoop-1.0.3$ bin/hadoop-daemon.sh --config /home/cluster/hadoop-1.0.3/conf/ stop datanode
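
For instance, fired over ssh from another box (the hostname is just a
placeholder):

ssh cluster@datanode-host '/home/cluster/hadoop-1.0.3/bin/hadoop-daemon.sh --config /home/cluster/hadoop-1.0.3/conf stop datanode'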

Regards,
    Mohammad Tariq



On Fri, Aug 17, 2012 at 2:26 AM, Terry Healy <th...@bnl.gov> wrote:

> Thanks guys. I will need the decommission in a few weeks, but for now
> just a simple system move. I found out the hard way not to have a
> masters and slaves file in the conf directory of a slave: when I tried
> bin/stop-all.sh, it stopped processes everywhere.
>
> Gave me an idea to list its own name as the only one in slaves, which
> might work as expected then... but if I can just kill the process that
> is even easier.
>
>
> On 08/16/2012 03:49 PM, Harsh J wrote:
> > Perhaps what you're looking for is the Decommission feature of HDFS,
> > which lets you safely remove a DN without incurring replica loss? It
> > is detailed in Hadoop: The Definitive Guide (2nd Edition), page 315 |
> > Chapter 10: Administering Hadoop / Maintenance section - Title
> > "Decommissioning old nodes", or at
> > http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission?
> >
> > On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
> >> Sorry - this seems pretty basic, but I could not find a reference on
> >> line or in my books. Is there a graceful way to stop a single datanode,
> >> (for example to move the system to a new rack where it will be put back
> >> on-line) or do you just whack the process ID and let HDFS clean up the
> >> mess?
> >>
> >> Thanks
> >>
> >
> >
> >
>

Re: Stopping a single Datanode

Posted by Terry Healy <th...@bnl.gov>.
Thanks guys. I will need the decommission in a few weeks, but for now
just a simple system move. I found out the hard way not to have a
masters and slaves file in the conf directory of a slave: when I tried
bin/stop-all.sh, it stopped processes everywhere.

Gave me an idea to list its own name as the only one in slaves, which
might work as expected then... but if I can just kill the process that
is even easier.
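
That idea would look something like this (untested sketch, run from the
Hadoop directory on the slave itself; stop-all.sh will just report "no
namenode to stop" etc. for master daemons that aren't running here):

hostname > conf/slaves    # this node becomes the only listed slave
: > conf/masters          # empty masters file, so no SNN elsewhere is touched
bin/stop-all.sh           # now only stops the local daemons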


On 08/16/2012 03:49 PM, Harsh J wrote:
> Perhaps what you're looking for is the Decommission feature of HDFS,
> which lets you safely remove a DN without incurring replica loss? It
> is detailed in Hadoop: The Definitive Guide (2nd Edition), page 315 |
> Chapter 10: Administering Hadoop / Maintenance section - Title
> "Decommissioning old nodes", or at
> http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission?
> 
> On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
>> Sorry - this seems pretty basic, but I could not find a reference on
>> line or in my books. Is there a graceful way to stop a single datanode,
>> (for example to move the system to a new rack where it will be put back
>> on-line) or do you just whack the process ID and let HDFS clean up the
>> mess?
>>
>> Thanks
>>
> 
> 
> 

Re: Stopping a single Datanode

Posted by Harsh J <ha...@cloudera.com>.
Perhaps what you're looking for is the Decommission feature of HDFS,
which lets you safely remove a DN without incurring replica loss? It
is detailed in Hadoop: The Definitive Guide (2nd Edition), page 315 |
Chapter 10: Administering Hadoop / Maintenance section - Title
"Decommissioning old nodes", or at
http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission?
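
In outline, decommissioning goes something like this (the excludes path
is just an example; dfs.hosts.exclude in the NameNode's hdfs-site.xml
must already point at it):

echo "dn-to-remove.example.com" >> /etc/hadoop/conf/excludes
bin/hadoop dfsadmin -refreshNodes    # NN starts re-replicating its blocks
# wait until the node shows "Decommissioned" in the NN web UI, then stop it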

On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
> Sorry - this seems pretty basic, but I could not find a reference on
> line or in my books. Is there a graceful way to stop a single datanode,
> (for example to move the system to a new rack where it will be put back
> on-line) or do you just whack the process ID and let HDFS clean up the
> mess?
>
> Thanks
>



-- 
Harsh J

Stopping a single Datanode

Posted by Terry Healy <th...@bnl.gov>.
Sorry - this seems pretty basic, but I could not find a reference on
line or in my books. Is there a graceful way to stop a single datanode,
(for example to move the system to a new rack where it will be put back
on-line) or do you just whack the process ID and let HDFS clean up the
mess?

Thanks


RE: checkpointnode backupnode hdfs HA

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
The community has already gone ahead with the HA solution explained below (HDFS-1623).

Yes, alternative solutions were proposed before as well (using the BackupNode approach), like HDFS-2124 and HDFS-2064, but not much work was done there.

>to come back to one of my previous questions: is replacing (now
>deprecated) secondary namenodes with backup namenodes a future-proof
>idea, or should I maybe go for the new HA architecture right away?
I should say yes, go for the new HA architecture right away.

Regards,
Uma
________________________________________
From: Jan Van Besien [janvb@ngdata.com]
Sent: Thursday, August 16, 2012 3:21 PM
To: user@hadoop.apache.org
Subject: Re: checkpointnode backupnode hdfs HA

On 08/16/2012 10:37 AM, Uma Maheswara Rao G wrote:
> Don't confuse this with the backupnode/checkpoint nodes.
>
> The new HA architecture is mainly targeted at building HA around NameNode states.

Thanks, your explanation is already helpful. If you say "the new HA
architecture", does this mean that the (older) ideas to extend the
backupnode functionality to provide (warm) standby as explained here

https://issues.apache.org/jira/browse/HADOOP-4539?focusedCommentId=12674954&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12674954

are no longer valid?


Or to come back to one of my previous questions: is replacing (now
deprecated) secondary namenodes with backup namenodes a future-proof
idea, or should I maybe go for the new HA architecture right away?

thanks
Jan

Re: checkpointnode backupnode hdfs HA

Posted by Jan Van Besien <ja...@ngdata.com>.
On 08/16/2012 10:37 AM, Uma Maheswara Rao G wrote:
> Don't confuse this with the backupnode/checkpoint nodes.
>
> The new HA architecture is mainly targeted at building HA around NameNode states.

Thanks, your explanation is already helpful. If you say "the new HA 
architecture", does this mean that the (older) ideas to extend the 
backupnode functionality to provide (warm) standby as explained here

https://issues.apache.org/jira/browse/HADOOP-4539?focusedCommentId=12674954&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12674954

are no longer valid?


Or to come back to one of my previous questions: is replacing (now
deprecated) secondary namenodes with backup namenodes a future-proof
idea, or should I maybe go for the new HA architecture right away?

thanks
Jan

RE: checkpointnode backupnode hdfs HA

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Hi Jan,

Don't confuse this with the backupnode/checkpoint nodes.

The new HA architecture is mainly targeted at building HA around NameNode states:
1) Active NameNode
2) Standby NameNode

When you start the NameNodes, both will start in standby mode by default.

You can then switch one NN to the active state by issuing the HA admin commands, or by configuring the ZKFC (automatic failover) process (not officially released yet).
Each NN will start the required services according to its state.
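
For reference, the manual switch looks something like this (the service
IDs "nn1"/"nn2" are whatever you configured; see the HA guide linked in
the original mail):

hdfs haadmin -getServiceState nn1      # check the current state of nn1
hdfs haadmin -transitionToActive nn1   # make nn1 the active NN
hdfs haadmin -failover nn1 nn2         # fail over from nn1 to nn2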

This is almost like a new implementation of the StandbyNode checkpointing process.

The active NN writes edits to its local dirs and to the shared NN dirs. The standby node keeps tailing the edits from the shared dirs.

Coming to the shared storage part, there are currently 3 options:

   1) NFS filers (may need to buy external devices); a minimal
      configuration sketch follows after this list.

   2) BookKeeper (a subproject of open source ZooKeeper), mainly
      inspired by the NN use case. It is a high-performance write-ahead
      logging system and can scale to more nodes dynamically depending
      on usage. The integration with BookKeeper is already available,
      and we are running some clusters with it. HDFS-3399

   3) A quorum-based approach, which is under development. It aims to
      build the shared storage nodes inside HDFS itself, so it can make
      use of the proven RPC protocols for unified security mechanisms
      and the proven edits storage layers. HDFS-3077
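
A minimal sketch of the NFS option mentioned above, assuming two NNs
"nn1"/"nn2" in a nameservice "mycluster" and an NFS mount at /mnt/filer
(property names as in the r2.0.0-alpha HA guide; double-check them
against your release):

# hdfs-site.xml (excerpt), identical on both NameNodes:
#   dfs.nameservices                         = mycluster
#   dfs.ha.namenodes.mycluster               = nn1,nn2
#   dfs.namenode.rpc-address.mycluster.nn1   = nn1-host:8020
#   dfs.namenode.rpc-address.mycluster.nn2   = nn2-host:8020
#   dfs.namenode.shared.edits.dir            = file:///mnt/filer/shared-edits
# format nn1 as usual, then initialize the standby from it:
hdfs namenode -bootstrapStandby   # run on nn2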


I hope this gives a better idea of the current HA work in the community.

Regards,
Uma

________________________________________
From: Jan Van Besien [janvb@ngdata.com]
Sent: Thursday, August 16, 2012 1:41 PM
To: user@hadoop.apache.org
Subject: checkpointnode backupnode hdfs HA

I am a bit confused about the different options for namenode high
availability (or something along those lines) in CDH4 (hadoop-2.0.0).

I understand that the secondary namenode is deprecated, and that there
are two options to replace it: checkpoint or backup namenodes. Both are
well explained in the documentation, but the confusion begins when
reading about "HDFS High Availability", for example here:
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html

Is the topic "HDFS High Availability" as described there (using shared
storage) related to checkpoint/backup nodes? If so, in what way?

If I read about backup nodes, they also seem to be aimed at high
availability. From what I understood, the current implementation doesn't
provide (warm) fail-over yet, but this is planned. So starting to
replace secondary namenodes now with backup namenodes sounds like a
future-proof idea?

thanks,
Jan
