Posted to hdfs-user@hadoop.apache.org by Aji Janis <aj...@gmail.com> on 2012/08/10 20:38:03 UTC

Hadoop hardware failure recovery

I am very new to Hadoop. I am considering setting up a Hadoop cluster
consisting of 5 nodes where each node has 3 internal hard drives. I
understand HDFS has a configurable redundancy feature but what happens if
an entire drive crashes (physically) for whatever reason? How does Hadoop
recover, if it can, from this situation? What else should I know before
setting up my cluster this way? Thanks in advance.

Re: Hadoop hardware failure recovery

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Aji,

   Hadoop's redundancy feature allows data to be replicated across the
entire cluster. So even if an entire disk is gone, or even the entire
machine for that matter, your data is still there on other node(s).
But we need to keep one thing in mind: the 'master' node is the
single point of failure in a Hadoop cluster. If the machine running
the master process(es) is down, you are stuck. For more detail, see
the HDFS documentation on its redundancy feature.
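
As an illustration, a minimal sketch of the relevant hdfs-site.xml
settings (a replication factor of 3 is the usual default; the data
directory paths here are hypothetical, one per internal drive):

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
  </property>

With one data directory per drive and a replication factor of 3, each
block lives on three different nodes, so a dead drive only costs
replicas that HDFS re-creates elsewhere from the surviving copies.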

Regards,
    Mohammad Tariq


On Sat, Aug 11, 2012 at 12:08 AM, Aji Janis <aj...@gmail.com> wrote:
> I am very new to Hadoop. I am considering setting up a Hadoop cluster
> consisting of 5 nodes where each node has 3 internal hard drives. I
> understand HDFS has a configurable redundancy feature but what happens if an
> entire drive crashes (physically) for whatever reason? How does Hadoop
> recover, if it can, from this situation? What else should I know before
> setting up my cluster this way? Thanks in advance.
>
>

Re: Hadoop hardware failure recovery

Posted by Harsh J <ha...@cloudera.com>.
Mohammad,

The HA feature is very much functional, as you'd find if you tried
it. I would not say it is "not production ready", for I see it in use
at several places already, including my humbly small MBP (two local
NNs, just for the fun of it) :)
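
For the curious, a fragment of what a two-NN hdfs-site.xml looks like
(the nameservice and addresses here are hypothetical, and the
shared-edits and failover settings are omitted for brevity):

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>localhost:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>localhost:8021</value>
  </property>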

On Sat, Aug 11, 2012 at 12:46 AM, Mohammad Tariq <do...@gmail.com> wrote:
> Very correctly said by Anil. Actually Hadoop HA is not yet production
> ready and you are about to begin your Hadoop journey, so just thought
> of not mentioning it. If you want to use HA, just pull it from the
> trunk and do a build.
>
> Regards,
>     Mohammad Tariq
>
>
> On Sat, Aug 11, 2012 at 12:42 AM, anil gupta <an...@gmail.com> wrote:
>> Hi Aji,
>>
>> Adding onto whatever Mohammad Tariq said: if you use Hadoop 2.0.0-alpha,
>> then the Namenode is not a single point of failure. However, Hadoop 2.0.0
>> is not of production quality yet (it's in alpha).
>> The Namenode used to be a single point of failure in releases prior to
>> Hadoop 2.0.0.
>>
>> HTH,
>> Anil Gupta
>>
>>
>> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
>>>
>>> Hadoop's file system was (mostly) copied from the concepts of Google's old
>>> file system.
>>>
>>> The original paper is probably the best way to learn about that.
>>>
>>> http://research.google.com/archive/gfs.html
>>>
>>>
>>>
>>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
>>>>
>>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster
>>>> consisting of 5 nodes where each node has 3 internal hard drives. I
>>>> understand HDFS has a configurable redundancy feature but what happens if an
>>>> entire drive crashes (physically) for whatever reason? How does Hadoop
>>>> recover, if it can, from this situation? What else should I know before
>>>> setting up my cluster this way? Thanks in advance.
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta



-- 
Harsh J

Re: Hadoop hardware failure recovery

Posted by Mohammad Tariq <do...@gmail.com>.
Anil is correct. Hadoop HA is not yet production ready, and since you
are about to begin your Hadoop journey, I just thought of not
mentioning it. If you want to use HA, just pull it from the trunk and
do a build.
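
(A sketch of what that looks like, assuming Subversion and Maven are
installed; see BUILDING.txt in the source tree for the authoritative
steps:

  svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk hadoop-trunk
  cd hadoop-trunk
  mvn package -Pdist -DskipTests -Dtar

The distribution tarball should then land under hadoop-dist/target/.)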

Regards,
    Mohammad Tariq


On Sat, Aug 11, 2012 at 12:42 AM, anil gupta <an...@gmail.com> wrote:
> Hi Aji,
>
> Adding onto whatever Mohammad Tariq said: if you use Hadoop 2.0.0-alpha,
> then the Namenode is not a single point of failure. However, Hadoop 2.0.0
> is not of production quality yet (it's in alpha).
> The Namenode used to be a single point of failure in releases prior to
> Hadoop 2.0.0.
>
> HTH,
> Anil Gupta
>
>
> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
>>
>> Hadoop's file system was (mostly) copied from the concepts of Google's old
>> file system.
>>
>> The original paper is probably the best way to learn about that.
>>
>> http://research.google.com/archive/gfs.html
>>
>>
>>
>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
>>>
>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster
>>> consisting of 5 nodes where each node has 3 internal hard drives. I
>>> understand HDFS has a configurable redundancy feature but what happens if an
>>> entire drive crashes (physically) for whatever reason? How does Hadoop
>>> recover, if it can, from this situation? What else should I know before
>>> setting up my cluster this way? Thanks in advance.
>>>
>>>
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta

RE: Hadoop hardware failure recovery

Posted by Jeffrey Buell <jb...@vmware.com>.
This is never an issue on vSphere.  The ESXi hypervisor does not send the completion interrupt back to the guest until the IO is finished, so if the guest OS thinks an IO is flushed to disk, it really is flushed to disk. hsync() will work in an ESXi VM exactly as in a native OS.

The physical storage layer might lie about completion (e.g., most SANs with redundant battery-backed caches), but this applies equally to native and virtualized OSes.

It is always tempting to implement some kind of write caching in the virtualization layer to try to improve storage performance, but of course this comes at the cost of safety and predictability.

Jeff

From: Steve Loughran [mailto:stevel@hortonworks.com]
Sent: Monday, August 13, 2012 8:08 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop hardware failure recovery


On 13 August 2012 07:55, Harsh J <ha...@cloudera.com> wrote:

Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a
working hsync() API that allows you to write files with guarantee that
it has been written to the disk (like the fsync() *nix call).

A guarantee that the OS thinks it's been written to HDD.

For anyone using Hadoop or any other program (e.g. MySQL) in a virtualized environment: even when the OS thinks it has flushed a virtual disk, know that you may have to set some VM params to say "when we said "flush to disk" we meant it":
https://forums.virtualbox.org/viewtopic.php?t=13661

Re: Hadoop hardware failure recovery

Posted by Steve Loughran <st...@hortonworks.com>.
On 13 August 2012 08:42, Harsh J <ha...@cloudera.com> wrote:

> Hey Steve,
>
> Interesting, thanks for pointing that out! I didn't know that it
> disables this by default :)
>
>
It's always something to watch out for: someone implementing a disk FS, OS,
or VM environment discovers that they get great benchmark numbers if they
make flushing async, and thinks "most people don't need it anyway".
That's mostly true (and some programs over-flush), but if you do want to be
sure your data is saved to disk, these people are being dangerous rather
than helpful.

I don't think it's an issue if you are saving to network-mounted storage,
which can include the storage of the host OS. If you do some experiments,
NFS chat to the host system is usually as fast as working with a virtual
HDD in the same host OS, which has extra layers of indirection. Virtual
HDDs can get fragmented even when the VM thinks it has just allocated big
linear blocks; you need to defrag the virtual HDD and then the physical disk
image to correct that.

Re: Hadoop hardware failure recovery

Posted by Harsh J <ha...@cloudera.com>.
Hey Steve,

Interesting, thanks for pointing that out! I didn't know that it
disables this by default :)

On Mon, Aug 13, 2012 at 8:38 PM, Steve Loughran <st...@hortonworks.com> wrote:
>
>
> On 13 August 2012 07:55, Harsh J <ha...@cloudera.com> wrote:
>>
>>
>>
>> Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a
>> working hsync() API that allows you to write files with guarantee that
>> it has been written to the disk (like the fsync() *nix call).
>
>
> A guarantee that the OS thinks it's been written to HDD.
>
> For anyone using Hadoop or any other program (e.g. MySQL) in a virtualized
> environment: even when the OS thinks it has flushed a virtual disk, know
> that you may have to set some VM params to say "when we said "flush to disk"
> we meant it":
> https://forums.virtualbox.org/viewtopic.php?t=13661
>
>
>



-- 
Harsh J

Re: Hadoop hardware failure recovery

Posted by Steve Loughran <st...@hortonworks.com>.
On 13 August 2012 07:55, Harsh J <ha...@cloudera.com> wrote:

> Note that with 2.1.0 (upcoming) and above releases of HDFS, we offer a
> working hsync() API that allows you to write files with guarantee that
> it has been written to the disk (like the fsync() *nix call).

A guarantee that the OS thinks it's been written to HDD.

For anyone using Hadoop or any other program (e.g. MySQL) in a virtualized
environment: even when the OS thinks it has flushed a virtual disk, know
that you may have to set some VM params to say "when we said "flush to disk"
we meant it":
https://forums.virtualbox.org/viewtopic.php?t=13661
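
For reference, the knob that thread discusses is, as far as I recall
from the VirtualBox manual, along these lines (the VM name and LUN
index here are hypothetical; check the manual for your controller
type):

  VBoxManage setextradata "hadoop-node" \
    "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0

With IgnoreFlush set to 0, VirtualBox passes the guest's flush
requests through to the host instead of ignoring them, which it does
by default.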

Re: Hadoop hardware failure recovery

Posted by Harsh J <ha...@cloudera.com>.
Aji,

The best place to ask would be on Apache Accumulo's own user list,
subscribable at http://accumulo.apache.org/mailing_list.html

That said, if Accumulo bases itself on HDFS, then its data safety
should be the same or nearly the same as what HDFS itself can offer.

Note that with 2.1.0 (upcoming) and later releases of HDFS, we offer a
working hsync() API that allows you to write files with a guarantee that
the data has been written to disk (like the fsync() *nix call). You can
read some more about this in an earlier thread:
http://search-hadoop.com/m/ATVOETSy4X1
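
A minimal sketch of using it from the Java client API (assuming a
reachable HDFS; the path here is hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HsyncExample {
    public static void main(String[] args) throws Exception {
      // Picks up fs.defaultFS from the classpath configuration.
      FileSystem fs = FileSystem.get(new Configuration());
      try (FSDataOutputStream out = fs.create(new Path("/tmp/durable.txt"))) {
        out.writeBytes("must survive a crash\n");
        // Unlike hflush(), hsync() asks each datanode to sync the
        // block file to its disk before returning.
        out.hsync();
      }
    }
  }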

HTH, and do let us know what you find on the Accumulo side.

On Mon, Aug 13, 2012 at 7:27 PM, Aji Janis <aj...@gmail.com> wrote:
> Thank you everyone for all the feedback and suggestions. It's good to know
> these details as I move forward.
>
> Piling on to the question, I am curious if any of you have experience with
> Accumulo (a requirement for me hence not optional). I was wondering if the
> data loss (physical crash of the hard drive) in this case would be resolved
> by Hadoop (HDFS I should say). Any suggestions and/or where I could find
> some specs on this would be really appreciated!
>
>
> Thank you again for all the pointers.
> -Aji
>
>
>
>
>
>
>
>
> On Sun, Aug 12, 2012 at 3:07 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>>
>> Yep, hadoop-2 is alpha but is progressing nicely...
>>
>> However, if you have access to some 'enterprise HA' utilities (VMWare or
>> Linux HA) you can get *very decent* production-grade high-availability in
>> hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce).
>>
>> Arun
>>
>> On Aug 10, 2012, at 12:12 PM, anil gupta wrote:
>>
>> Hi Aji,
>>
>> Adding onto whatever Mohammad Tariq said: if you use Hadoop 2.0.0-alpha,
>> then the Namenode is not a single point of failure. However, Hadoop 2.0.0
>> is not of production quality yet (it's in alpha).
>> The Namenode used to be a single point of failure in releases prior to
>> Hadoop 2.0.0.
>>
>> HTH,
>> Anil Gupta
>>
>> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com>
>> wrote:
>>>
>>> Hadoop's file system was (mostly) copied from the concepts of Google's
>>> old file system.
>>>
>>> The original paper is probably the best way to learn about that.
>>>
>>> http://research.google.com/archive/gfs.html
>>>
>>>
>>>
>>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
>>>>
>>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster
>>>> consisting of 5 nodes where each node has 3 internal hard drives. I
>>>> understand HDFS has a configurable redundancy feature but what happens if an
>>>> entire drive crashes (physically) for whatever reason? How does Hadoop
>>>> recover, if it can, from this situation? What else should I know before
>>>> setting up my cluster this way? Thanks in advance.
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>



-- 
Harsh J

Re: Hadoop hardware failure recovery

Posted by Aji Janis <aj...@gmail.com>.
Thank you everyone for all the feedback and suggestions. It's good to know
these details as I move forward.

Piling on to the question: I am curious whether any of you have experience
with Accumulo (a requirement for me, hence not optional). I was wondering
whether data loss (a physical crash of the hard drive) in this case would
be handled by Hadoop (HDFS, I should say). Any suggestions, and/or pointers
to where I could find some specs on this, would be really appreciated!


Thank you again for all the pointers.
-Aji







On Sun, Aug 12, 2012 at 3:07 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Yep, hadoop-2 is alpha but is progressing nicely...
>
> However, if you have access to some 'enterprise HA' utilities (VMWare or
> Linux HA) you can get *very decent* production-grade high-availability in
> hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce).
>
> Arun
>
> On Aug 10, 2012, at 12:12 PM, anil gupta wrote:
>
> Hi Aji,
>
> Adding onto whatever Mohammad Tariq said: if you use Hadoop 2.0.0-alpha,
> then the Namenode is not a single point of failure. However, Hadoop 2.0.0
> is not of production quality yet (it's in alpha).
> The Namenode used to be a single point of failure in releases prior to
> Hadoop 2.0.0.
>
> HTH,
> Anil Gupta
>
> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
>
>> Hadoop's file system was (mostly) copied from the concepts of Google's
>> old file system.
>>
>> The original paper is probably the best way to learn about that.
>>
>> http://research.google.com/archive/gfs.html
>>
>>
>>
>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
>>
>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster
>>> consisting of 5 nodes where each node has 3 internal hard drives. I
>>> understand HDFS has a configurable redundancy feature but what happens if
>>> an entire drive crashes (physically) for whatever reason? How does Hadoop
>>> recover, if it can, from this situation? What else should I know before
>>> setting up my cluster this way? Thanks in advance.
>>>
>>>
>>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: Hadoop hardware failure recovery

Posted by Arun C Murthy <ac...@hortonworks.com>.
Yep, hadoop-2 is alpha but is progressing nicely...

However, if you have access to some 'enterprise HA' utilities (VMWare or Linux HA) you can get *very decent* production-grade high-availability in hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce).

Arun

On Aug 10, 2012, at 12:12 PM, anil gupta wrote:

> Hi Aji,
> 
> Adding onto whatever Mohammad Tariq said: if you use Hadoop 2.0.0-alpha, then the Namenode is not a single point of failure. However, Hadoop 2.0.0 is not of production quality yet (it's in alpha).
> The Namenode used to be a single point of failure in releases prior to Hadoop 2.0.0.
> 
> HTH,
> Anil Gupta
> 
> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
> Hadoop's file system was (mostly) copied from the concepts of Google's old file system.
> 
> The original paper is probably the best way to learn about that.
> 
> http://research.google.com/archive/gfs.html
> 
> 
> 
> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
> I am very new to Hadoop. I am considering setting up a Hadoop cluster consisting of 5 nodes where each node has 3 internal hard drives. I understand HDFS has a configurable redundancy feature but what happens if an entire drive crashes (physically) for whatever reason? How does Hadoop recover, if it can, from this situation? What else should I know before setting up my cluster this way? Thanks in advance.
> 
> 
> 
> 
> 
> 
> -- 
> Thanks & Regards,
> Anil Gupta

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop hardware failure recovery

Posted by Arun C Murthy <ac...@hortonworks.com>.
Yep, hadoop-2 is alpha but is progressing nicely...

However, if you have access to some 'enterprise HA' utilities (VMWare or Linux HA) you can get *very decent* production-grade high-availability in hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce).

Arun

On Aug 10, 2012, at 12:12 PM, anil gupta wrote:

> Hi Aji,
> 
> Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of production quality yet(its in Alpha).
> Namenode use to be a Single Point of Failure in releases prior to Hadoop 2.0.0. 
> 
> HTH,
> Anil Gupta
> 
> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
> Hadoop's file system was (mostly) copied from the concepts of Google's old file system.
> 
> The original paper is probably the best way to learn about that.
> 
> http://research.google.com/archive/gfs.html
> 
> 
> 
> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
> I am very new to Hadoop. I am considering setting up a Hadoop cluster consisting of 5 nodes where each node has 3 internal hard drives. I understand HDFS has a configurable redundancy feature but what happens if an entire drive crashes (physically) for whatever reason? How does Hadoop recover, if it can, from this situation? What else should I know before setting up my cluster this way? Thanks in advance.
> 
> 
> 
> 
> 
> 
> -- 
> Thanks & Regards,
> Anil Gupta

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop hardware failure recovery

Posted by Mohammad Tariq <do...@gmail.com>.
Very correctly said by Anil. Actually Hadoop HA is not yet production
ready and you are about to begin your Hadoop journey, so just thought
of not mentioning it. If you want to use HA, just pull it from the
trunk and do a build.

Regards,
    Mohammad Tariq


On Sat, Aug 11, 2012 at 12:42 AM, anil gupta <an...@gmail.com> wrote:
> Hi Aji,
>
> Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then
> Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of
> production quality yet(its in Alpha).
> Namenode use to be a Single Point of Failure in releases prior to Hadoop
> 2.0.0.
>
> HTH,
> Anil Gupta
>
>
> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
>>
>> Hadoop's file system was (mostly) copied from the concepts of Google's old
>> file system.
>>
>> The original paper is probably the best way to learn about that.
>>
>> http://research.google.com/archive/gfs.html
>>
>>
>>
>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
>>>
>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster
>>> consisting of 5 nodes where each node has 3 internal hard drives. I
>>> understand HDFS has a configurable redundancy feature but what happens if an
>>> entire drive crashes (physically) for whatever reason? How does Hadoop
>>> recover, if it can, from this situation? What else should I know before
>>> setting up my cluster this way? Thanks in advance.
>>>
>>>
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta

Re: Hadoop hardware failure recovery

Posted by Mohammad Tariq <do...@gmail.com>.
Very correctly said by Anil. Actually Hadoop HA is not yet production
ready and you are about to begin your Hadoop journey, so just thought
of not mentioning it. If you want to use HA, just pull it from the
trunk and do a build.

Regards,
    Mohammad Tariq


On Sat, Aug 11, 2012 at 12:42 AM, anil gupta <an...@gmail.com> wrote:
> Hi Aji,
>
> Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then
> Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of
> production quality yet(its in Alpha).
> Namenode use to be a Single Point of Failure in releases prior to Hadoop
> 2.0.0.
>
> HTH,
> Anil Gupta
>
>
> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
>>
>> Hadoop's file system was (mostly) copied from the concepts of Google's old
>> file system.
>>
>> The original paper is probably the best way to learn about that.
>>
>> http://research.google.com/archive/gfs.html
>>
>>
>>
>> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
>>>
>>> I am very new to Hadoop. I am considering setting up a Hadoop cluster
>>> consisting of 5 nodes where each node has 3 internal hard drives. I
>>> understand HDFS has a configurable redundancy feature but what happens if an
>>> entire drive crashes (physically) for whatever reason? How does Hadoop
>>> recover, if it can, from this situation? What else should I know before
>>> setting up my cluster this way? Thanks in advance.
>>>
>>>
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta

Re: Hadoop hardware failure recovery

Posted by Arun C Murthy <ac...@hortonworks.com>.
Yep, hadoop-2 is alpha but is progressing nicely...

However, if you have access to some 'enterprise HA' utilities (VMWare or Linux HA) you can get *very decent* production-grade high-availability in hadoop-1.x too (both NameNode for HDFS and JobTracker for MapReduce).

Arun

On Aug 10, 2012, at 12:12 PM, anil gupta wrote:

> Hi Aji,
> 
> Adding onto whatever Mohammad Tariq said, If you use Hadoop 2.0.0-Alpha then Namenode is not a single point of failure.However, Hadoop 2.0.0 is not of production quality yet(its in Alpha).
> Namenode use to be a Single Point of Failure in releases prior to Hadoop 2.0.0. 
> 
> HTH,
> Anil Gupta
> 
> On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:
> Hadoop's file system was (mostly) copied from the concepts of Google's old file system.
> 
> The original paper is probably the best way to learn about that.
> 
> http://research.google.com/archive/gfs.html
> 
> 
> 
> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
> I am very new to Hadoop. I am considering setting up a Hadoop cluster consisting of 5 nodes where each node has 3 internal hard drives. I understand HDFS has a configurable redundancy feature but what happens if an entire drive crashes (physically) for whatever reason? How does Hadoop recover, if it can, from this situation? What else should I know before setting up my cluster this way? Thanks in advance.
> 
> 
> 
> 
> 
> 
> -- 
> Thanks & Regards,
> Anil Gupta

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
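
Short of full HA, a common hadoop-1.x safeguard is to point the NameNode at
more than one metadata directory, one of them on an NFS mount, so a single
failed disk cannot take the namespace with it. A hedged hdfs-site.xml
sketch (the paths are placeholders; use whatever matches your mounts):

    <!-- hdfs-site.xml: write NameNode metadata to several locations -->
    <property>
      <name>dfs.name.dir</name>
      <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>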



Re: Hadoop hardware failure recovery

Posted by anil gupta <an...@gmail.com>.
Hi Aji,

Adding onto whatever Mohammad Tariq said, if you use Hadoop 2.0.0-Alpha
then the Namenode is not a single point of failure. However, Hadoop 2.0.0
is not of production quality yet (it's in Alpha).
The Namenode used to be a single point of failure in releases prior to
Hadoop 2.0.0.

HTH,
Anil Gupta

On Fri, Aug 10, 2012 at 11:55 AM, Ted Dunning <td...@maprtech.com> wrote:

> Hadoop's file system was (mostly) copied from the concepts of Google's old
> file system.
>
> The original paper is probably the best way to learn about that.
>
> http://research.google.com/archive/gfs.html
>
>
>
> On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:
>
>> I am very new to Hadoop. I am considering setting up a Hadoop cluster
>> consisting of 5 nodes where each node has 3 internal hard drives. I
>> understand HDFS has a configurable redundancy feature but what happens if
>> an entire drive crashes (physically) for whatever reason? How does Hadoop
>> recover, if it can, from this situation? What else should I know before
>> setting up my cluster this way? Thanks in advance.
>>
>>
>>
>


-- 
Thanks & Regards,
Anil Gupta
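
For the curious, the 2.0.0-alpha HA wiring lives in hdfs-site.xml. A rough
sketch follows, assuming a logical nameservice called "mycluster", two
NameNode hosts nn1host and nn2host, and a shared edits directory on NFS;
every name here is a placeholder, and the HDFS HA documentation has the
complete property list:

    <!-- Two NameNodes behind one logical nameservice -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>nn1host:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>nn2host:8020</value>
    </property>
    <!-- 2.0.0-alpha shares edit logs through a common directory (e.g. NFS) -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>file:///mnt/nfs/hadoop/ha-edits</value>
    </property>
    <!-- Clients fail over between the two NameNodes via this provider -->
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>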

Re: Hadoop hardware failure recovery

Posted by Ted Dunning <td...@maprtech.com>.
Hadoop's file system was (mostly) copied from the concepts of Google's old
file system.

The original paper is probably the best way to learn about that.

http://research.google.com/archive/gfs.html



On Fri, Aug 10, 2012 at 11:38 AM, Aji Janis <aj...@gmail.com> wrote:

> I am very new to Hadoop. I am considering setting up a Hadoop cluster
> consisting of 5 nodes where each node has 3 internal hard drives. I
> understand HDFS has a configurable redundancy feature but what happens if
> an entire drive crashes (physically) for whatever reason? How does Hadoop
> recover, if it can, from this situation? What else should I know before
> setting up my cluster this way? Thanks in advance.
>
>
>

Re: Hadoop hardware failure recovery

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Harsh,

  Thanks a lot for keeping me updated.

@Aji: Apologies for unintentionally misguiding you.

Regards,
    Mohammad Tariq



On Sun, Aug 12, 2012 at 6:15 PM, Harsh J <ha...@cloudera.com> wrote:

> Aji,
>
> Your question seems to be about data (block) loss. Hadoop would recover
> by automatically detecting the blocks you've lost to that disk failure
> and re-replicating them from the remaining redundant copies in the
> cluster (controlled via the configurable replication factor).
>
> On Sat, Aug 11, 2012 at 12:08 AM, Aji Janis <aj...@gmail.com> wrote:
> > I am very new to Hadoop. I am considering setting up a Hadoop cluster
> > consisting of 5 nodes where each node has 3 internal hard drives. I
> > understand HDFS has a configurable redundancy feature but what happens
> if an
> > entire drive crashes (physically) for whatever reason? How does Hadoop
> > recover, if it can, from this situation? What else should I know before
> > setting up my cluster this way? Thanks in advance.
> >
> >
>
>
>
> --
> Harsh J
>

Re: Hadoop hardware failure recovery

Posted by Harsh J <ha...@cloudera.com>.
Aji,

Your question seems to be about data (block) loss. Hadoop would recover
by automatically detecting the blocks you've lost to that disk failure
and re-replicating them from the remaining redundant copies in the
cluster (controlled via the configurable replication factor).

On Sat, Aug 11, 2012 at 12:08 AM, Aji Janis <aj...@gmail.com> wrote:
> I am very new to Hadoop. I am considering setting up a Hadoop cluster
> consisting of 5 nodes where each node has 3 internal hard drives. I
> understand HDFS has a configurable redundancy feature but what happens if an
> entire drive crashes (physically) for whatever reason? How does Hadoop
> recover, if it can, from this situation? What else should I know before
> setting up my cluster this way? Thanks in advance.
>
>



-- 
Harsh J
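
To make that concrete for a 5-node, 3-drive-per-node cluster: with the
default replication factor of 3, losing one drive (or one whole node)
still leaves at least two copies of every affected block, and the NameNode
re-replicates in the background until the factor is restored. A hedged
hdfs-site.xml sketch, with placeholder paths and the hadoop-1.x property
spellings, of the two settings that matter most for per-drive failures
(the volume-tolerance property is available in recent 1.x and 2.x
releases):

    <!-- One data directory per physical drive -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
    </property>
    <!-- Keep the DataNode running even if one volume fails -->
    <property>
      <name>dfs.datanode.failed.volumes.tolerated</name>
      <value>1</value>
    </property>

After a failure, block health can be checked, and per-path replication
adjusted, from the shell:

    # Summary of missing, corrupt, and under-replicated blocks
    hadoop fsck /

    # Raise replication on an important path; -w waits for completion
    hadoop fs -setrep -w 3 /user/example/data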
