Posted to mapreduce-user@hadoop.apache.org by Chandrashekhar Kotekar <sh...@gmail.com> on 2014/12/12 11:57:38 UTC

What happens to data nodes when name node has failed for long time?

Hi,

What happens if the name node has crashed for more than one hour but the
secondary name node, all the data nodes, the job tracker, and the task
trackers are running fine? Do those daemon services also automatically
shut down after some time, or do they keep running, waiting for the
namenode to come back?

Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455

Re: What happens to data nodes when name node has failed for long time?

Posted by Chandrashekhar Kotekar <sh...@gmail.com>.
Hi Mark,

Thanks for giving detailed information about name node failure and High
availability feature.

Wish you all the best in your job search.

Thanks again...


Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455

On Mon, Dec 15, 2014 at 6:29 AM, mark charts <mc...@yahoo.com> wrote:
>
>
> "Prior to the Hadoop 2.x series, the NameNode was a single point of
> failure in an HDFS cluster — in other words, if the machine on which the
> single NameNode was configured became unavailable, the entire cluster
> would be unavailable until the NameNode could be restarted. This was bad
> news, especially in the case of unplanned outages, which could result in
> significant downtime if the cluster administrator weren’t available to
> restart the NameNode.
>
> The solution to this problem is addressed by the HDFS High Availability
> feature. The idea is to run two NameNodes in the same cluster — one
> active NameNode and one hot standby NameNode. If the active NameNode
> crashes or needs to be stopped for planned maintenance, it can be
> quickly failed over to the hot standby NameNode, which now becomes the
> active NameNode. The key is to keep the standby node synchronized with
> the active node; this action is now accomplished by having both nodes
> access a shared NFS directory. All namespace changes on the active node
> are logged in the shared directory. The standby node picks up those
> changes from the directory and applies them to its own namespace. In
> this way, the standby NameNode acts as a current backup of the active
> NameNode. The standby node also has current block location information,
> because DataNode heartbeats are routinely sent to both active and
> standby NameNodes.
>
> To ensure that only one NameNode is the “active” node at any given time,
> configure a fencing process for the shared storage directory; then,
> during a failover, if it appears that the failed NameNode still carries
> the active state, the configured fencing process prevents that node from
> accessing the shared directory and permits the newly active node (the
> former standby node) to complete the failover.
>
> The machines that will serve as the active and standby NameNodes in your
> High Availability cluster should have equivalent hardware. The shared
> NFS storage directory, which must be accessible to both active and
> standby NameNodes, is usually located on a separate machine and can be
> mounted on each NameNode machine. To prevent this directory from
> becoming a single point of failure, configure multiple network paths to
> the storage directory, and ensure that there’s redundancy in the storage
> itself. Use a dedicated network-attached storage (NAS) appliance to
> contain the shared storage directory."
>   *sic*
>
> Courtesy of Dirk deRoos, Paul C. Zikopoulos, Bruce Brown,
> Rafael Coss, and Roman B. Melnyk.
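The fencing rule in the passage quoted above — only one NameNode may ever write to the shared edits directory, and a failed active must be cut off before the standby is promoted — can be sketched in miniature. This is an illustrative toy model, not Hadoop source code; the class and function names are invented for the example:

```python
# Toy model of NameNode fencing on a shared edits directory.
# NOT Hadoop code -- all names here are made up for illustration.

class SharedEditsDir:
    """Stand-in for the shared NFS directory holding the edit log."""
    def __init__(self):
        self.writer = None   # node currently allowed to write
        self.fenced = set()  # nodes that have been fenced off

    def grant_writer(self, node):
        self.writer = node

    def fence(self, node):
        """Revoke a (possibly still running) node's access before failover."""
        self.fenced.add(node)
        if self.writer == node:
            self.writer = None

    def write(self, node, record):
        if node in self.fenced:
            raise PermissionError(node + " is fenced off from shared storage")
        if self.writer != node:
            raise PermissionError(node + " is not the active writer")
        return node + ": " + record


def fail_over(shared, old_active, standby):
    """Fence the old active FIRST, then promote the standby."""
    shared.fence(old_active)
    shared.grant_writer(standby)
    return standby  # the new active NameNode


shared = SharedEditsDir()
shared.grant_writer("nn1")
shared.write("nn1", "mkdir /data")        # normal operation

new_active = fail_over(shared, "nn1", "nn2")
assert new_active == "nn2"
shared.write("nn2", "mkdir /more")        # standby now writes

# A "zombie" old active that still believes it is active is rejected,
# which is exactly the split-brain situation fencing prevents:
try:
    shared.write("nn1", "rogue edit")
except PermissionError:
    pass
```

The ordering in `fail_over` is the whole point: promoting the standby before the old active is provably cut off would allow two writers on the same directory.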
>
>
> Ps. I am looking for work as Hadoop Admin/Developer (I am an Electrical
> Engr w/ MSEE). I've implemented one 6 node cluster successfully at work a
> few months ago for productivity purposes at work (that's my claim to fame).
> I was laid off shortly afterwards. No correlation I suspect. But I am in FL
> and willing to go anywhere to find contract/permanent work. If anyone knows
> of a position for a tenacious Hadoop engineer, I am interested.
>
>
> Thank you.
>
>
> Mark Charts
>
>
>
>   On Sunday, December 14, 2014 5:30 PM, daemeon reiydelle <
> daemeonr@gmail.com> wrote:
>
>
> I found the terminology of primary and secondary to be a bit confusing in
> describing operation after a failure scenario. Perhaps it is helpful to
> think that the Hadoop instance is guided to select a node as primary for
> normal operation. If that node fails, then the backup becomes the new
> primary. In analyzing traffic it appears that the restored node does not
> become primary again until the whole instance restarts. I myself would
> welcome clarification on this observed behavior.
>
>
>
> *.......*
>
> *“Life should not be a journey to the grave with the intention of
> arriving safely in a pretty and well preserved body, but rather to skid
> in broadside in a cloud of smoke, thoroughly used up, totally worn out,
> and loudly proclaiming “Wow! What a Ride!”” - Hunter Thompson*
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Fri, Dec 12, 2014 at 7:56 AM, Rich Haase <rh...@pandora.com> wrote:
>
>   The remaining cluster services will continue to run. That way, when the
> namenode (or any other failed process) is restored, the cluster will resume
> healthy operation. This is part of Hadoop’s ability to handle network
> partition events.
>
>  *Rich Haase* | Sr. Software Engineer | Pandora
> m 303.887.1146 | rhaase@pandora.com
>
>   From: Chandrashekhar Kotekar <sh...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Friday, December 12, 2014 at 3:57 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: What happens to data nodes when name node has failed for long
> time?
>
>   Hi,
>
>  What happens if name node has crashed for more than one hour but
> secondary name node, all the data nodes, job tracker, task trackers are
> running fine? Do those daemon services also automatically shutdown after
> some time? Or those services keep running hoping for namenode to come back?
>
> Regards,
> Chandrash3khar Kotekar
> Mobile - +91 8600011455
>
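Rich's point — worker daemons do not shut themselves down while the NameNode is unreachable; they simply keep retrying their heartbeats until it returns — can be illustrated with a toy retry loop. This is a hedged sketch, not Hadoop's actual DataNode code; every name here is invented:

```python
# Toy heartbeat loop illustrating why a DataNode survives a NameNode outage.
# NOT Hadoop code -- names are made up for the example.

def run_heartbeats(send_heartbeat, max_attempts):
    """Call send_heartbeat() max_attempts times. Failures are counted
    and retried, never treated as a reason to shut the daemon down."""
    failures = 0
    for _ in range(max_attempts):
        try:
            send_heartbeat()
        except ConnectionError:
            failures += 1  # log and retry; a real daemon sleeps between tries
    return failures


# Simulate a NameNode that is down for the first 3 heartbeat intervals:
state = {"tick": 0}

def flaky_namenode():
    state["tick"] += 1
    if state["tick"] <= 3:
        raise ConnectionError("NameNode unreachable")

failures = run_heartbeats(flaky_namenode, max_attempts=5)
assert failures == 3  # the daemon rode out the outage and reconnected
```

The daemon's state (its block reports, its local storage) is untouched by the outage, which is why the cluster can resume healthy operation as soon as the NameNode comes back.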


Re: What happens to data nodes when name node has failed for long time?

Posted by mark charts <mc...@yahoo.com>.
"Prior to the Hadoop 2.x series, the NameNode was a single point of failure in an HDFS cluster — in other words, if the machine on which the single NameNode was configured became unavailable, the entire cluster would be unavailable until the NameNode could be restarted. This was bad news, especially in the case of unplanned outages, which could result in significant downtime if the cluster administrator weren’t available to restart the NameNode.

The solution to this problem is addressed by the HDFS High Availability feature. The idea is to run two NameNodes in the same cluster — one active NameNode and one hot standby NameNode. If the active NameNode crashes or needs to be stopped for planned maintenance, it can be quickly failed over to the hot standby NameNode, which now becomes the active NameNode. The key is to keep the standby node synchronized with the active node; this action is now accomplished by having both nodes access a shared NFS directory. All namespace changes on the active node are logged in the shared directory. The standby node picks up those changes from the directory and applies them to its own namespace. In this way, the standby NameNode acts as a current backup of the active NameNode. The standby node also has current block location information, because DataNode heartbeats are routinely sent to both active and standby NameNodes.

To ensure that only one NameNode is the “active” node at any given time, configure a fencing process for the shared storage directory; then, during a failover, if it appears that the failed NameNode still carries the active state, the configured fencing process prevents that node from accessing the shared directory and permits the newly active node (the former standby node) to complete the failover.

The machines that will serve as the active and standby NameNodes in your High Availability cluster should have equivalent hardware. The shared NFS storage directory, which must be accessible to both active and standby NameNodes, is usually located on a separate machine and can be mounted on each NameNode machine. To prevent this directory from becoming a single point of failure, configure multiple network paths to the storage directory, and ensure that there’s redundancy in the storage itself. Use a dedicated network-attached storage (NAS) appliance to contain the shared storage directory."  sic

Courtesy of Dirk deRoos, Paul C. Zikopoulos, Bruce Brown, Rafael Coss, and Roman B. Melnyk.

Ps. I am looking for work as Hadoop Admin/Developer (I am an Electrical Engr w/ MSEE). I've implemented one 6 node cluster successfully at work a few months ago for productivity purposes at work (that's my claim to fame). I was laid off shortly afterwards. No correlation I suspect. But I am in FL and willing to go anywhere to find contract/permanent work. If anyone knows of a position for a tenacious Hadoop engineer, I am interested.

Thank you.

Mark Charts
 

On Sunday, December 14, 2014 5:30 PM, daemeon reiydelle <da...@gmail.com> wrote:

I found the terminology of primary and secondary to be a bit confusing in describing operation after a failure scenario. Perhaps it is helpful to think that the Hadoop instance is guided to select a node as primary for normal operation. If that node fails, then the backup becomes the new primary. In analyzing traffic it appears that the restored node does not become primary again until the whole instance restarts. I myself would welcome clarification on this observed behavior.


.......
“Life should not be a journey to the grave with the intention of arriving safely in a
pretty and well preserved body, but rather to skid in broadside in a cloud of smoke,
thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!” 
- Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Dec 12, 2014 at 7:56 AM, Rich Haase <rh...@pandora.com> wrote:

The remaining cluster services will continue to run. That way, when the namenode (or any other failed process) is restored, the cluster will resume healthy operation. This is part of Hadoop’s ability to handle network partition events.

Rich Haase | Sr. Software Engineer | Pandora
m 303.887.1146 | rhaase@pandora.com
From: Chandrashekhar Kotekar <sh...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Friday, December 12, 2014 at 3:57 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: What happens to data nodes when name node has failed for long time?

Hi,
What happens if name node has crashed for more than one hour but secondary name node, all the data nodes, job tracker, task trackers are running fine? Do those daemon services also automatically shutdown after some time? Or those services keep running hoping for namenode to come back?

Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455




   

Re: What happens to data nodes when name node has failed for long time?

Posted by mark charts <mc...@yahoo.com>.
"Prior to the Hadoop 2.x series, the NameNode was a single point of failure in anHDFS cluster — in other words, if the machine on which the single NameNodewas configured became unavailable, the entire cluster would be unavailableuntil the NameNode could be restarted. This was bad news, especially in thecase of unplanned outages, which could result in significant downtime if thecluster administrator weren’t available to restart the NameNode.The solution to this problem is addressed by the HDFS High Availability fea-ture. The idea is to run two NameNodes in the same cluster — one activeNameNode and one hot standby NameNode. If the active NameNode crashesor needs to be stopped for planned maintenance, it can be quickly failed overto the hot standby NameNode, which now becomes the active NameNode.The key is to keep the standby node synchronized with the active node; thisaction is now accomplished by having both nodes access a shared NFS direc-tory. All namespace changes on the active node are logged in the shareddirectory. The standby node picks up those changes from the directory andapplies them to its own namespace. In this way, the standby NameNode actsas a current backup of the active NameNode. The standby node also has cur-rent block location information, because DataNode heartbeats are routinelysent to both active and standby NameNodes.To ensure that only one NameNode is the “active” node at any given time,configure a fencing process for the shared storage directory; then, during afailover, if it appears that the failed NameNode still carries the active state,the configured fencing process prevents that node from accessing the shareddirectory and permits the newly active node (the former standby node) tocomplete the failover. The machines that will serve as the active and standby NameNodes in yourHigh Availability cluster should have equivalent hardware. 
The shared NFSstorage directory, which must be accessible to both active and standbyNameNodes, is usually located on a separate machine and can be mounted oneach NameNode machine. To prevent this directory from becoming a singlepoint of failure, configure multiple network paths to the storage directory, andensure that there’s redundancy in the storage itself. Use a dedicated network-attached storage (NAS) appliance to contain the shared storage directory."   sic
Courtesy of Dirk deRoos, Paul C. Zikopoulos, Bruce Brown,Rafael Coss, and Roman B. Melnyk.

Ps. I am looking for work as Hadoop Admin/Developer (I am an Electrical Engr w/ MSEE). I've implemented one 6 node cluster successfully at work a few months ago for productivity purposes at work (that's my claim to fame). I was laid off shortly afterwards. No correlation I suspect. But I am in FL and willing to go anywhere to find contract/permanent work. If anyone knows of a position for a tenacious Hadoop engineer, I am interested.

Thank you.

Mark Charts
 

     On Sunday, December 14, 2014 5:30 PM, daemeon reiydelle <da...@gmail.com> wrote:
   

 I found the terminology of primary and secondary to be a bit confusing in describing operation after a failure scenario. Perhaps it is helpful to think that the Hadoop instance is guided to select a node as primary for normal operation. If that node fails, then the backup becomes the new primary. In analyzing traffic it appears that the restored node does not become primary again until the whole instance restarts. I myself would welcome clarification on this observed behavior.


.......
“Life should not be a journey to the grave with the intention of arriving safely in a
pretty and well preserved body, but rather to skid in broadside in a cloud of smoke,
thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a Ride!” 
- Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Dec 12, 2014 at 7:56 AM, Rich Haase <rh...@pandora.com> wrote:

The remaining cluster services will continue to run.  That way when the namenode (or other failed processes) is restored the cluster will resume healthy operation.  This is part of hadoop’s ability to handle network partition events.

Rich Haase | Sr. Software Engineer | Pandora
m 303.887.1146 | rhaase@pandora.com
From: Chandrashekhar Kotekar <sh...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Friday, December 12, 2014 at 3:57 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: What happens to data nodes when name node has failed for long time?

Hi,
What happens if name node has crashed for more than one hour but secondary name node, all the data nodes, job tracker, task trackers are running fine? Do those daemon services also automatically shutdown after some time? Or those services keep running hoping for namenode to come back?

Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455




   


Re: What happens to data nodes when name node has failed for long time?

Posted by daemeon reiydelle <da...@gmail.com>.
I found the terminology of primary and secondary to be a bit confusing in
describing operation after a failure scenario. Perhaps it is helpful to
think that the Hadoop instance is guided to select a node as primary for
normal operation. If that node fails, then the backup becomes the new
primary. In analyzing traffic it appears that the restored node does not
become primary again until the whole instance restarts. I myself would
welcome clarification on this observed behavior.
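The behavior described above — the restored node rejoining as standby rather than reclaiming the active role — can be sketched as a tiny state machine. This is a toy model for illustration only, not Hadoop code; the class and method names are invented for the example.

```python
class NameNode:
    """Toy model of one NameNode in an HA pair (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.state = "standby"

class HAPair:
    """Toy failover controller: one active node, one hot standby."""
    def __init__(self, a, b):
        self.nodes = [a, b]
        a.state = "active"          # one node starts out as active

    def active(self):
        return next(n for n in self.nodes if n.state == "active")

    def fail(self, node):
        # Fencing ensures the failed node can no longer act as active.
        node.state = "failed"
        standby = next(n for n in self.nodes if n.state == "standby")
        standby.state = "active"    # failover: the standby is promoted

    def restore(self, node):
        # The restored node rejoins as standby; it does not reclaim "active".
        node.state = "standby"

nn1, nn2 = NameNode("nn1"), NameNode("nn2")
pair = HAPair(nn1, nn2)
pair.fail(nn1)
pair.restore(nn1)
print(pair.active().name)   # nn2 stays active even after nn1 is restored
```

In this model the active role only moves on failure, never on recovery, which matches the traffic pattern described above.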



.......
“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!”
- Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Dec 12, 2014 at 7:56 AM, Rich Haase <rh...@pandora.com> wrote:

>   The remaining cluster services will continue to run.  That way when the
> namenode (or other failed processes) is restored the cluster will resume
> healthy operation.  This is part of hadoop’s ability to handle network
> partition events.
>
>  *Rich Haase* | Sr. Software Engineer | Pandora
> m 303.887.1146 | rhaase@pandora.com
>
>   From: Chandrashekhar Kotekar <sh...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Date: Friday, December 12, 2014 at 3:57 AM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: What happens to data nodes when name node has failed for long
> time?
>
>   Hi,
>
>  What happens if name node has crashed for more than one hour but
> secondary name node, all the data nodes, job tracker, task trackers are
> running fine? Do those daemon services also automatically shutdown after
> some time? Or those services keep running hoping for namenode to come back?
>
> Regards,
> Chandrash3khar Kotekar
> Mobile - +91 8600011455
>

Re: What happens to data nodes when name node has failed for long time?

Posted by Rich Haase <rh...@pandora.com>.
The remaining cluster services will continue to run.  That way when the namenode (or other failed processes) is restored the cluster will resume healthy operation.  This is part of hadoop’s ability to handle network partition events.
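The keep-retrying behavior can be sketched as a simple loop: the daemon attempts its heartbeat each interval and simply carries on whether or not the NameNode answers. This is a toy model, not Hadoop's actual DataNode implementation; the function and parameter names are invented for the example.

```python
def heartbeat_loop(namenode_up, max_attempts):
    """Toy DataNode heartbeat loop (illustrative only, not Hadoop code).

    namenode_up: callable taking the attempt number, returning True once
    the NameNode is reachable again.
    Returns the attempt on which the daemon reconnected, or None if the
    NameNode stayed down for all attempts (the daemon keeps running anyway).
    """
    for attempt in range(1, max_attempts + 1):
        if namenode_up(attempt):
            return attempt          # reconnected; resume normal operation
        # A real daemon would sleep for the heartbeat interval here.
    return None                     # still down; no self-shutdown occurs

# Simulate a NameNode that comes back on the 5th heartbeat.
result = heartbeat_loop(lambda attempt: attempt >= 5, max_attempts=10)
print(result)  # 5
```

The point of the sketch is that there is no shutdown path: the loop either reconnects or keeps going, which is why the datanodes and trackers are still running when the NameNode returns an hour later.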

Rich Haase | Sr. Software Engineer | Pandora
m 303.887.1146 | rhaase@pandora.com

From: Chandrashekhar Kotekar <sh...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Friday, December 12, 2014 at 3:57 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: What happens to data nodes when name node has failed for long time?

Hi,

What happens if name node has crashed for more than one hour but secondary name node, all the data nodes, job tracker, task trackers are running fine? Do those daemon services also automatically shutdown after some time? Or those services keep running hoping for namenode to come back?

Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455
