Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2013/04/02 10:42:50 UTC

Re: weird behavior with RAID 0 on EC2

I didn't launch these commands when I was having trouble; next time I will.
For now, here is what I have (everything is working properly at the moment).

$mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sun Mar 17 01:46:05 2013
     Raid Level : raid0
     Array Size : 1761459200 (1679.86 GiB 1803.73 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Sun Mar 17 01:46:05 2013
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 256K

           Name : ip-xxx-xxx-xxx-239:0  (local to host ip-xxx-xxx-xxx-239)
           UUID : 2cbc3efe:11f8f35d:b4f55c81:3903c715
         Events : 0

    Number   Major   Minor   RaidDevice State
       0     202       17        0      active sync   /dev/xvdb1
       1     202       33        1      active sync   /dev/xvdc1
       2     202       49        2      active sync   /dev/xvdd1
       3     202       65        3      active sync   /dev/xvde1
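
For the record, /proc/mdstat gives a quicker view of the same member list and
array state; on a healthy node like this one, the output should look something
like the sketch below (abridged, reconstructed from the mdadm detail above):

$ cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 xvde1[3] xvdd1[2] xvdc1[1] xvdb1[0]
      1761459200 blocks super 1.2 256k chunks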


$iostat -x -d
Linux 3.2.0-35-virtual (ip-xxx-xxx-xxx-239)     04/02/2013      _x86_64_        (4 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.30    0.29    0.68     4.41     6.03    21.52     0.01    6.18   18.40    1.03   2.73   0.26
xvdb              0.00     0.05   59.36    3.57  1601.94   144.99    55.51     0.85   13.50   12.43   31.16   2.56  16.12
xvdc              0.00     0.01   59.33    3.48  1601.75   144.81    55.62     0.81   12.92   11.83   31.62   2.51  15.77
xvdd              0.00     0.05   59.31    3.53  1601.69   144.83    55.58     1.25   19.96   18.99   36.28   3.00  18.85
xvde              0.00     0.01   59.30    3.45  1601.62   144.46    55.65     1.04   16.50   15.34   36.37   2.85  17.87
md0               0.00     0.00  237.36   14.14  6406.99   579.10    55.56     0.00    0.00    0.00    0.00   0.00   0.00
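
Next time it happens I will try to grab a few timed snapshots instead of a
single report. Here is a minimal sketch of what I have in mind (the output
path and the 5-second/12-sample choices are just my assumptions); as Alexis
suggests below, the per-disk await column is where a bad disk should stand out:

#!/bin/sh
# Sketch: snapshot RAID state and per-disk stats during an incident.
# Assumes the array is /dev/md0; the output directory is arbitrary.
OUT=/var/tmp/disk-debug-$(date +%Y%m%d-%H%M%S)
mkdir -p "$OUT"
mdadm --detail /dev/md0 > "$OUT/mdadm-detail.txt"
# 12 samples, 5 seconds apart: the first report is the average since
# boot, the later ones show current load, where one disk with a much
# higher await/svctm than its siblings points at a bad ephemeral drive.
iostat -x -d 5 12 > "$OUT/iostat.txt"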

@Rudolf

Thanks for the insight; I might use that solution next time as well.
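
If I end up replacing the instance, I suppose the fresh node would be
bootstrapped with the dead node's token. A minimal sketch, assuming a 1.1-era
cluster where the old node's initial_token is known (the token value below is
made up):

# On the new instance, same Cassandra version and cassandra.yaml:
DEAD_TOKEN=85070591730234615865843651857942052864   # hypothetical token
cassandra -Dcassandra.replace_token=$DEAD_TOKEN
# Once streaming completes and the node shows Up/Normal in nodetool ring,
# terminate the old EC2 instance.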

Alain

2013/3/31 Rudolf van der Leeden <ru...@scoreloop.com>

> I've seen the same behaviour (SLOW ephemeral disk) a few times.
> You can't do anything about a single slow disk except stop using it.
> Our solution was always: replace the m1.xlarge instance ASAP, and
> everything is good again.
> -Rudolf.
>
> On 31.03.2013, at 18:58, Alexis Lê-Quôc wrote:
>
> Alain,
>
> Can you post your mdadm --detail /dev/md0 output here, as well as your
> iostat -x -d output, when that happens? A bad ephemeral drive on EC2 is not
> unheard of.
>
> Alexis | @alq | http://datadog.com
>
> P.S. Also, disk utilization is not a reliable metric; iostat's await and
> svctm are more useful imho.
>
>
> On Sun, Mar 31, 2013 at 6:03 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> Ok, if you're going to look into it, please keep me/us posted.
>>
>> It's not on my radar.
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>>
>> Ok, if you're going to look into it, please keep me/us posted.
>>
>> It happened to me twice in the same day, within a few hours, on the same
>> node, and only on 1 node out of 12, making that node almost unreachable.
>>
>>
>> 2013/3/28 aaron morton <aa...@thelastpickle.com>
>>
>>> I noticed this on an m1.xlarge (Cassandra 1.1.10) instance today as
>>> well: 1 or 2 disks in a RAID 0 running at 85 to 100%, the others at 35
>>> to 50ish.
>>>
>>> Have not looked into it.
>>>
>>> Cheers
>>>
>>>    -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>>>
>>> We use C* on m1.xlarge AWS EC2 servers, with 4 disks (xvdb, xvdc, xvdd,
>>> xvde) forming a software RAID 0 array (md0).
>>>
>>> I usually see their utilization increase in exactly the same way. This
>>> morning there was a normal minor compaction, followed by dropped messages
>>> on one node (out of 12).
>>>
>>> Looking closely at this node I saw the following:
>>>
>>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>>>
>>> On this node, one of the four disks (xvdd) started working much harder
>>> while the others worked less intensively.
>>>
>>> This is quite weird, since I have always seen these 4 disks being used in
>>> exactly the same way at every moment (as you can see on the 5 other nodes,
>>> or when node ".239" comes back to normal).
>>>
>>> Any idea what happened and how it can be avoided?
>>>
>>> Alain
>>>
>>>
>>>
>>
>>
>
>