Posted to general@gump.apache.org by Sander Temme <sa...@temme.net> on 2006/05/15 01:20:50 UTC

The Status of Clarus

Folks,

You may have noticed (or not) that Clarus has not been doing its Gump  
runs for a week or two. The issue was that both of the drives that  
make up the RAID-1 Gump sits on suddenly went out of commission,  
without any notice or warning. This is not supposed to happen, and is  
exactly the reason those drives are mirrored. However, when I visited  
the colocation facility last week, I shut the box down, pulled and
re-seated these drives and they are now once again available. The fact
that they can up and disappear like this is kind of scary, but I'm  
glad they are not actually broken.

So, Gump runs are now back on Clarus, running at the same times as on  
vmgump except using gump/trunk.

Results as always available at http://clarus.apache.org/

S.

-- 
sander@temme.net              http://www.temme.net/sander/
PGP FP: 51B4 8727 466A 0BC3 69F4  B7B8 B2BE BC40 1529 24AF

Re: The Status of Clarus

Posted by Steve Loughran <st...@apache.org>.

Sander Temme wrote:
> Folks,
> 
> You may have noticed (or not) that Clarus has not been doing its Gump 
> runs for a week or two. The issue was that both of the drives that make 
> up the RAID-1 Gump sits on suddenly went out of commission, without any 
> notice or warning. This is not supposed to happen, and is exactly the 
> reason those drives are mirrored.

We call this "RAID minus one", in which you think your disks are 
mirrored, but they aren't. It is actually a worse state than RAID-0, "no 
redundancy at all", because at least there you know your data is 
vulnerable.

> However, when I visited the colocation 
> facility last week, I shut the box down, pulled and re-seated these 
> drives and they are now once again available. The fact that they can up 
> and disappear like this is kind of scary, but I'm glad they are not 
> actually broken.

This is one of those things that are really hard to test.

I've seen SCSI controllers take down drives that were taking too long to 
respond; sometimes this is a transient event, and sometimes it is a 
precursor of trouble to come. It could also be the RAID controller that 
is failing; they have their own MTBF, see.

> So, Gump runs are now back on Clarus, running at the same times as on 
> vmgump except using gump/trunk.
> 
> Results as always available at http://clarus.apache.org/
> 
> S.
> 
> -- 
> sander@temme.net              http://www.temme.net/sander/
> PGP FP: 51B4 8727 466A 0BC3 69F4  B7B8 B2BE BC40 1529 24AF
> 
