Posted to user@hadoop.apache.org by Jan Van Besien <ja...@ngdata.com> on 2012/08/16 10:11:43 UTC
checkpointnode backupnode hdfs HA
I am a bit confused about the different options for namenode high
availability (or something along those lines) in CDH4 (hadoop-2.0.0).
I understand that the secondary namenode is deprecated, and that there
are two options to replace it: checkpoint or backup namenodes. Both are
well explained in the documentation, but the confusion begins when
reading about "HDFS High Availability", for example here:
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html
Is the topic "HDFS High Availability" as described there (using shared
storage) related to checkpoint/backup nodes? If so, in what way?
If I read about backup nodes, that feature also seems to be aimed at high
availability. From what I understand, the current implementation doesn't
provide (warm) fail-over yet, but this is planned. So starting to
replace secondary namenodes now with backup namenodes sounds like a
future-proof idea?
thanks,
Jan
Re: Stopping a single Datanode
Posted by Nitin Pawar <ni...@gmail.com>.
You can just kill the process ID.
Alternatively, there is a hadoop-daemon.sh script inside the bin directory;
run it with "stop datanode" and it should stop the daemon.
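A minimal sketch of both options Nitin mentions, assuming a Hadoop 1.x layout. The install path and the pid-file location are assumptions (both are overridable via hadoop-env.sh); the sketch only prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch: stop a single DataNode without touching the rest of the cluster.
# HADOOP_HOME below is a hypothetical install path.
HADOOP_HOME=${HADOOP_HOME:-/home/cluster/hadoop-1.0.3}

# Option 1: hadoop-daemon.sh (unlike stop-all.sh) acts only on the local node.
STOP_CMD="$HADOOP_HOME/bin/hadoop-daemon.sh stop datanode"

# Option 2: kill the process via its pid file. The daemon writes its pid to
# HADOOP_PID_DIR (stock default: /tmp) as hadoop-<user>-datanode.pid.
PID_FILE="/tmp/hadoop-$(whoami)-datanode.pid"

echo "run on the datanode host: $STOP_CMD"
echo "or: kill \$(cat $PID_FILE)"
```

Either way, HDFS notices the missing heartbeats and re-replicates the node's blocks after the timeout, so a short outage for a rack move is safe.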
On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
> Sorry - this seems pretty basic, but I could not find a reference on
> line or in my books. Is there a graceful way to stop a single datanode,
> (for example to move the system to a new rack where it will be put back
> on-line) or do you just whack the process ID and let HDFS clean up the
> mess?
>
> Thanks
>
--
Nitin Pawar
Re: Stopping a single Datanode
Posted by Mohammad Tariq <do...@gmail.com>.
Hello Terry,
You can run the command over ssh on the node where you want to stop the DN.
Something like this:
cluster@ubuntu:~/hadoop-1.0.3$ bin/hadoop-daemon.sh --config /home/cluster/hadoop-1.0.3/conf/ stop datanode
Regards,
Mohammad Tariq
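Wrapped in ssh from a remote admin box, the suggestion above looks roughly like this (the host and install paths are hypothetical; the sketch only prints the command):

```shell
#!/bin/sh
# Sketch: stop the DataNode daemon on a remote node over ssh.
# Host and install paths below are hypothetical.
DN_HOST="cluster@ubuntu"
HADOOP_HOME="/home/cluster/hadoop-1.0.3"
REMOTE_CMD="$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_HOME/conf stop datanode"

echo "ssh $DN_HOST '$REMOTE_CMD'"
```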
On Fri, Aug 17, 2012 at 2:26 AM, Terry Healy <th...@bnl.gov> wrote:
> Thanks guys. I will need the decommission in a few weeks, but for now
> just a simple system move. I found out the hard way not to have a
> masters and slaves file in the conf directory of a slave: when I tried
> bin/stop-all.sh, it stopped processes everywhere.
>
> Gave me an idea to list it's own name as the only one in slaves, which
> might work as expected then....but if I can just kill the process that
> is even easier.
>
>
> On 08/16/2012 03:49 PM, Harsh J wrote:
> > Perhaps what you're looking for is the Decommission feature of HDFS,
> > which lets you safely remove a DN without incurring replica loss? It
> > is detailed in Hadoop: The Definitive Guide (2nd Edition), page 315 |
> > Chapter 10: Administering Hadoop / Maintenance section - Title
> > "Decommissioning old nodes", or at
> > http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission?
> >
> > On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
> >> Sorry - this seems pretty basic, but I could not find a reference on
> >> line or in my books. Is there a graceful way to stop a single datanode,
> >> (for example to move the system to a new rack where it will be put back
> >> on-line) or do you just whack the process ID and let HDFS clean up the
> >> mess?
> >>
> >> Thanks
> >>
> >
> >
> >
>
Re: Stopping a single Datanode
Posted by Terry Healy <th...@bnl.gov>.
Thanks guys. I will need the decommission in a few weeks, but for now
it's just a simple system move. I found out the hard way not to have
masters and slaves files in the conf directory of a slave: when I tried
bin/stop-all.sh, it stopped processes everywhere.
That gave me an idea to list its own name as the only one in slaves, which
might then work as expected... but if I can just kill the process, that
is even easier.
On 08/16/2012 03:49 PM, Harsh J wrote:
> Perhaps what you're looking for is the Decommission feature of HDFS,
> which lets you safely remove a DN without incurring replica loss? It
> is detailed in Hadoop: The Definitive Guide (2nd Edition), page 315 |
> Chapter 10: Administering Hadoop / Maintenance section - Title
> "Decommissioning old nodes", or at
> http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission?
>
> On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
>> Sorry - this seems pretty basic, but I could not find a reference on
>> line or in my books. Is there a graceful way to stop a single datanode,
>> (for example to move the system to a new rack where it will be put back
>> on-line) or do you just whack the process ID and let HDFS clean up the
>> mess?
>>
>> Thanks
>>
>
>
>
Re: Stopping a single Datanode
Posted by Harsh J <ha...@cloudera.com>.
Perhaps what you're looking for is the Decommission feature of HDFS,
which lets you safely remove a DN without incurring replica loss? It
is detailed in Hadoop: The Definitive Guide (2nd Edition), page 315,
Chapter 10: Administering Hadoop, Maintenance section, under the title
"Decommissioning old nodes", or at
http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission
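The decommission flow Harsh describes boils down to two steps. In this sketch a temp file and hostname stand in for the real values; in practice the file path would come from the dfs.hosts.exclude property in hdfs-site.xml:

```shell
#!/bin/sh
# Sketch of HDFS decommissioning. The hostname is hypothetical, and a temp
# file stands in for the file named by dfs.hosts.exclude in hdfs-site.xml.
EXCLUDES_FILE=$(mktemp)

# Step 1: add the datanode's hostname to the excludes file.
echo "dn3.example.com" >> "$EXCLUDES_FILE"

# Step 2 (run on the namenode): make the NameNode re-read the file; it then
# re-replicates the node's blocks elsewhere before marking the node
# "Decommissioned" in the web UI / dfsadmin report.
echo "next: bin/hadoop dfsadmin -refreshNodes"
```

Unlike a plain kill, this keeps the replication factor intact the whole time, which is why it is the safe choice for permanent removal.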
On Fri, Aug 17, 2012 at 12:41 AM, Terry Healy <th...@bnl.gov> wrote:
> Sorry - this seems pretty basic, but I could not find a reference on
> line or in my books. Is there a graceful way to stop a single datanode,
> (for example to move the system to a new rack where it will be put back
> on-line) or do you just whack the process ID and let HDFS clean up the
> mess?
>
> Thanks
>
--
Harsh J
Stopping a single Datanode
Posted by Terry Healy <th...@bnl.gov>.
Sorry - this seems pretty basic, but I could not find a reference on
line or in my books. Is there a graceful way to stop a single datanode,
(for example to move the system to a new rack where it will be put back
on-line) or do you just whack the process ID and let HDFS clean up the
mess?
Thanks
RE: checkpointnode backupnode hdfs HA
Posted by Uma Maheswara Rao G <ma...@huawei.com>.
The community has already gone ahead with the HA solution explained below (HDFS-1623).
Yes, there were alternative solutions proposed before (using the BackupNode approach) as well, like HDFS-2124 and HDFS-2064, but not much work was done there.
>to come back to one of my previous questions: is replacing (now
>deprecated) secondary namenodes with backup namenodes a future proof
>idea, or should I maybe go for the new HA architecture right away?
I would say yes: go for the new HA architecture right away.
Regards,
Uma
________________________________________
From: Jan Van Besien [janvb@ngdata.com]
Sent: Thursday, August 16, 2012 3:21 PM
To: user@hadoop.apache.org
Subject: Re: checkpointnode backupnode hdfs HA
On 08/16/2012 10:37 AM, Uma Maheswara Rao G wrote:
> Don't confuse with the backupnode/checkpoint nodes here.
>
> The new HA architecture mainly targetted to build HA with Namenode states.
Thanks, your explanation is already helpful. If you say "the new HA
architecture", does this mean that the (older) ideas to extend the
backupnode functionality to provide (warm) standby as explained here
https://issues.apache.org/jira/browse/HADOOP-4539?focusedCommentId=12674954&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12674954
are no longer valid?
Or to come back to one of my previous questions: is replacing (now
deprecated) secondary namenodes with backup namenodes a future proof
idea, or should I maybe go for the new HA architecture right away?
thanks
Jan
Re: checkpointnode backupnode hdfs HA
Posted by Jan Van Besien <ja...@ngdata.com>.
On 08/16/2012 10:37 AM, Uma Maheswara Rao G wrote:
> Don't confuse with the backupnode/checkpoint nodes here.
>
> The new HA architecture mainly targetted to build HA with Namenode states.
Thanks, your explanation is already helpful. If you say "the new HA
architecture", does this mean that the (older) ideas to extend the
backupnode functionality to provide (warm) standby as explained here
https://issues.apache.org/jira/browse/HADOOP-4539?focusedCommentId=12674954&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12674954
are no longer valid?
Or to come back to one of my previous questions: is replacing (now
deprecated) secondary namenodes with backup namenodes a future proof
idea, or should I maybe go for the new HA architecture right away?
thanks
Jan
RE: checkpointnode backupnode hdfs HA
Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Hi Jan,
Don't confuse the backupnode/checkpoint nodes with this.
The new HA architecture is mainly targeted at building HA with NameNode states:
1) Active NameNode
2) Standby NameNode
When you start the NNs, both will start in standby mode by default.
Then you can switch one NN to the active state by issuing ha admin commands, or by configuring the ZKFC (auto failover) process (not officially released yet).
The NN state will then start the required services accordingly.
This is almost like a new implementation of the StandbyNode checkpointing process.
The Active NN will write edits to its local dirs and to shared NN dirs. The Standby node will keep tailing the edits from the shared NN dirs.
Coming to the shared storage part, there are currently 3 options:
1) NFS filers (may need to buy external devices).
2) BookKeeper (a subproject of open source ZooKeeper). This is mainly inspired by the NN use case; it is a high-performance write-ahead logging system, and it can scale to more nodes dynamically depending on usage. The integration with BookKeeper is already available and we are running some clusters with it (HDFS-3399).
3) The other option is a quorum-based approach, which is under development. It is mainly aimed at building the shared storage nodes inside HDFS itself, making use of proven RPC protocols for unified security mechanisms and the proven edits storage layers (HDFS-3077).
I hope this gives a better idea of the current HA work in the community.
Regards,
Uma
________________________________________
From: Jan Van Besien [janvb@ngdata.com]
Sent: Thursday, August 16, 2012 1:41 PM
To: user@hadoop.apache.org
Subject: checkpointnode backupnode hdfs HA
I am a bit confused about the different options for namenode high
availability (or something along those lines) in CDH4 (hadoop-2.0.0).
I understand that the secondary namenode is deprecated, and that there
are two options to replace it: checkpoint or backup namenodes. Both are
well explained in the documentation, but the confusion begins when
reading about "HDFS High Availability", for example here:
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html
Is the topic "HDFS High Availability" as described there (using shared
storage) related to checkpoint/backup nodes? If so, in what way?
If I read about backup nodes, it also seems to be aimed at high
availability. From what I understood, the current implementation doesn't
provide (warm) fail-over yet, but this is planned. So starting to
replace secondary namenodes now with backup namenodes sounds like a
future proof idea?
thanks,
Jan