Posted to common-user@hadoop.apache.org by Edward Capriolo <ed...@gmail.com> on 2010/02/19 20:08:01 UTC

HDFS vs Giant Direct Attached Arrays

Hadoop is great. Almost every day gives me more reasons to like it.

My story for today:
We have a system running a file system on a 48 TB disk array across 4
shelves. Today I got this notice about firmware updates. (Don't you
love firmware updates?)

---------------
Any XXXX Controllers configured with any of the XXXX SATA hard drives
listed in the Scope section might exhibit slow virtual disk
initialization/expansion, rare drive stalls and timeouts, scrubbing
errors, and reduced performance.

Data might be at risk if multiple drive failures occur. Proactive hard
drive replacement is neither necessary, nor authorized.

"Updating the firmware of disk drives in a virtual disk risks the loss
of data and causes the drives to be temporarily inaccessible.”
---------------
In a nutshell, the safest way is to take the system offline and update
the disks one at a time (we don't know how long updating one disk
takes). Or we have to smart-fail disks and move them out of this array
into another array (luckily we have another one), apply the firmware,
put the disk back in, and wait for the re-stripe. Repeat 47 times!

So the options are:
1) Risky -- do the update online and hope we do not corrupt the thing
2) Slow -- take the system offline and update one disk at a time, as suggested

No option has zero downtime. Also note that this update fixes "reduced
performance," which means this chassis was never operating at max
efficiency, for whatever reason: RAID card complexity, firmware, back
plane, whatever.

Now, imagine if this were a 6-node Hadoop system with 8 disks per node,
and we had to do a firmware update. Wow! This would be easy. We could
accomplish it with no system-wide outage, at our leisure. With a
file replication factor of 3 we could hot swap disks, or even safely
fail an entire node with no outage.
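(As a rough sketch of what that replication knob looks like: the
cluster-wide default lives in dfs.replication in hdfs-site.xml, and you
can also check or bump it per file through the FileSystem API. The
namenode URI and file path below are made-up placeholders, not anything
from our setup.)

  // Sketch: raise a file's replication so HDFS keeps extra copies
  // while we pull disks. Hostname and path are hypothetical.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BumpReplication {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.default.name", "hdfs://namenode.example.com:8020"); // hypothetical
      FileSystem fs = FileSystem.get(conf);

      Path file = new Path("/data/part-00000"); // hypothetical
      short current = fs.getFileStatus(file).getReplication();
      System.out.println(file + " replication = " + current);

      // Ask the namenode for 3 copies of each block; it schedules the
      // extra replicas in the background.
      boolean ok = fs.setReplication(file, (short) 3);
      System.out.println("setReplication(3) accepted: " + ok);

      fs.close();
    }
  }

And for taking a whole node out cleanly, the usual route is to list it
in the file pointed to by dfs.hosts.exclude and run
"hadoop dfsadmin -refreshNodes"; the namenode then re-replicates that
node's blocks elsewhere while it drains.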

We would not need spare hardware, inform people of an outage, or
disable alerts. Hadoop would not care if the firmware on all the disks
did not match. Hadoop would not have some complicated RAID that had
been running at "reduced performance" all this time. Hadoop just uses
independent disks, which is much less complexity.
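(To make "independent disks" concrete: each datanode is handed a plain
list of directories, typically one per physical disk, via dfs.data.dir;
there is no RAID controller or striping in between. A minimal sketch
with made-up mount points follows; in real life you would set the
property in hdfs-site.xml rather than in code.)

  // Sketch: a datanode's storage is just a comma-separated list of
  // directories, one per disk (JBOD). Mount points are hypothetical.
  import org.apache.hadoop.conf.Configuration;

  public class ShowDataDirs {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Normally set in hdfs-site.xml; set in code here only for illustration.
      conf.set("dfs.data.dir",
          "/mnt/disk1/dfs/data,/mnt/disk2/dfs/data,"
        + "/mnt/disk3/dfs/data,/mnt/disk4/dfs/data");

      for (String dir : conf.getStrings("dfs.data.dir")) {
        System.out.println("datanode storage dir: " + dir);
      }
    }
  }

If one of those disks (or the whole node) drops out, the namenode
notices the missing block replicas and schedules new copies on other
nodes. No re-stripe, no controller in the way.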

HDFS ForTheWin!

Re: HDFS vs Giant Direct Attached Arrays

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Feb 22, 2010 at 7:23 AM, Steve Loughran <st...@apache.org> wrote:
> Allen Wittenauer wrote:
>>
>>
>> On 2/19/10 11:08 AM, "Edward Capriolo" <ed...@gmail.com> wrote:
>>>
>>> Now, imagine if this was a 6 node hadoop systems with 8 disks a node,
>>> and we had to do a firmware updates. Wow! this would be easy. We could
>>> accomplish this with no system-wide outage, at our leisure. With a
>>> file replication factor of 3 we could hot swap disks, or even safely
>>> fail an entire node with no outage.
>>
>> ... except for the NN and JT nodes. :)
>>
>
> or DNS.
>
> That plane crash in Palo Alto last week took out some of our wifi user
> authentication stuff I needed in the UK to get my laptop on the network in a
> meeting room, thus forcing me to pay attention instead.
>
> Remember, there's always a SPOF.
>
> -steve
>

Sure, there are SPOFs all over, depending on your paranoia level. I was
looking at the virtues of decentralized storage (HDFS) vs a big disk
array.

I just thought I would give a funny update. The manufacturer now tells
us that if we do not upgrade the firmware, we void our warranty. Also,
we tried to smart-fail one of the disks out and the management console
locked up. HDFS +2 :)

Re: HDFS vs Giant Direct Attached Arrays

Posted by Steve Loughran <st...@apache.org>.
Allen Wittenauer wrote:
> 
> 
> On 2/19/10 11:08 AM, "Edward Capriolo" <ed...@gmail.com> wrote:
>> Now, imagine if this was a 6 node hadoop systems with 8 disks a node,
>> and we had to do a firmware updates. Wow! this would be easy. We could
>> accomplish this with no system-wide outage, at our leisure. With a
>> file replication factor of 3 we could hot swap disks, or even safely
>> fail an entire node with no outage.
> 
> ... except for the NN and JT nodes. :)
> 

or DNS.

That plane crash in Palo Alto last week took out some of our wifi user 
authentication stuff I needed in the UK to get my laptop on the network 
in a meeting room, thus forcing me to pay attention instead.

Remember, there's always a SPOF.

-steve

Re: HDFS vs Giant Direct Attached Arrays

Posted by Allen Wittenauer <aw...@linkedin.com>.


On 2/19/10 11:08 AM, "Edward Capriolo" <ed...@gmail.com> wrote:
> Now, imagine if this was a 6 node hadoop systems with 8 disks a node,
> and we had to do a firmware updates. Wow! this would be easy. We could
> accomplish this with no system-wide outage, at our leisure. With a
> file replication factor of 3 we could hot swap disks, or even safely
> fail an entire node with no outage.

... except for the NN and JT nodes. :)