Posted to commits@cassandra.apache.org by "Thibaut (JIRA)" <ji...@apache.org> on 2011/03/27 23:24:05 UTC

[jira] [Created] (CASSANDRA-2394) Faulty hd kills cluster performance

Faulty hd kills cluster performance
-----------------------------------

                 Key: CASSANDRA-2394
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2394
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7.4
            Reporter: Thibaut
            Priority: Minor


Hi,

About every week, a node from our main cluster (>100 nodes) has a faulty hd (listing the cassandra data storage directory triggers an input/output error).

Whenever this occurs, I see many TimeoutExceptions in our application on various nodes, which cause everything to run very slowly. Key range scans just time out and will sometimes never succeed. If I stop cassandra on the faulty node, everything runs normally again.

It would be great to have some kind of monitoring thread in cassandra which marks a node as "down" if there are multiple read/write errors on the data directories. A single faulty hd on one node shouldn't affect global cluster performance.
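
For illustration only, a minimal sketch of the kind of watchdog proposed above: a background task probes the configured data directories and, after several consecutive I/O failures, invokes a hook that could stop gossip and drain the node so other replicas stop routing requests to it. This is not Cassandra code; every name and threshold below is hypothetical.

{noformat}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the proposed behaviour: probe each data directory
 * periodically and, after several consecutive I/O failures, run a callback
 * that could stop gossip / drain the node. Not Cassandra code.
 */
public class DataDirectoryWatchdog
{
    private final List<File> dataDirectories;
    private final Runnable markNodeDown;          // e.g. stop gossip + drain
    private final int failureThreshold;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public DataDirectoryWatchdog(List<File> dataDirectories, Runnable markNodeDown, int failureThreshold)
    {
        this.dataDirectories = dataDirectories;
        this.markNodeDown = markNodeDown;
        this.failureThreshold = failureThreshold;
    }

    public void start()
    {
        scheduler.scheduleWithFixedDelay(this::probe, 10, 10, TimeUnit.SECONDS);
    }

    private void probe()
    {
        try
        {
            for (File dir : dataDirectories)
            {
                // Listing the directory is the same operation the reporter used to
                // detect the fault; a dying disk surfaces here as an IOException.
                Files.newDirectoryStream(dir.toPath()).close();
            }
            consecutiveFailures.set(0);
        }
        catch (IOException | RuntimeException e)
        {
            if (consecutiveFailures.incrementAndGet() >= failureThreshold)
            {
                markNodeDown.run();
                scheduler.shutdown();
            }
        }
    }
}
{noformat}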



[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012070#comment-13012070 ] 

Jonathan Ellis commented on CASSANDRA-2394:
-------------------------------------------

Why wouldn't dynamic snitch fix this though? Fixing that would be 0.7-scope.

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012187#comment-13012187 ] 

Jonathan Ellis commented on CASSANDRA-2394:
-------------------------------------------

bq. until the snitch on all coordinators decided to quit using the node

but shouldn't that be negligibly slower than in a small cluster, assuming there is enough query volume that each coordinator is routing some queries for the data in question?

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029630#comment-13029630 ] 

Thibaut commented on CASSANDRA-2394:
------------------------------------

I grepped the cassandra log for errors and none showed up.
There were errors (read errors) in kern.log. On another server earlier, after seeing the read errors, I tried to copy the cassandra data directory and it triggered input/output errors. I didn't try this on the latest server where the error occurred, but I suppose it would have caused input/output errors as well.

Yes, it will still respond from any node (our application connects locally first, and those nodes are not replying either), but very slowly (maybe 1% of normal cluster performance). Previously the entire cluster showed a massive commands/responses pending queue (>1000, version 0.7.4 on all 20 nodes as far as I can remember). Normally the queue has about 0-5 entries at most.
I didn't capture it when it occurred on the latest server. Iterating over a table with Hector is continuously interrupted by timeout exceptions, even when our application is turned off and this is the only application running. I'm also using the dynamic snitch with a 0.8 badness threshold.
Replication level is set to 4. So if one node goes down, it should still work as 2 nodes are still available.

And as I said before, stopping cassandra on the faulty node nearly instantly restores proper cluster performance.

What would help you? Shall I take a jstack trace next time the error occurs? It can take some time until a hd dies, though. Do you know of a tool that can simulate a hd error (or a very slow read)?
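
At the OS level, the Linux device-mapper "error" and "delay" targets are one common way to simulate a bad or slow disk under a running process. Inside the JVM, a test could also inject faults at the file-access layer; the wrapper below is a purely hypothetical sketch of that idea, not an existing tool.

{noformat}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

/**
 * Hypothetical fault-injecting wrapper for tests: a fraction of reads fails
 * with an IOException (a "bad sector"), and the rest are delayed to imitate a
 * drive that spends seconds in retries. Purely illustrative.
 */
public class FaultInjectingFile
{
    private final RandomAccessFile delegate;
    private final double errorRate;      // e.g. 0.05 -> 5% of reads fail
    private final long slowReadMillis;   // artificial latency added to each read
    private final Random random = new Random();

    public FaultInjectingFile(RandomAccessFile delegate, double errorRate, long slowReadMillis)
    {
        this.delegate = delegate;
        this.errorRate = errorRate;
        this.slowReadMillis = slowReadMillis;
    }

    public int read(byte[] buffer) throws IOException
    {
        if (random.nextDouble() < errorRate)
            throw new IOException("Input/output error (injected)");
        try
        {
            Thread.sleep(slowReadMillis); // simulate a drive stuck in retries
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return delegate.read(buffer);
    }
}
{noformat}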


[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033225#comment-13033225 ] 

Brandon Williams commented on CASSANDRA-2394:
---------------------------------------------

From a healthy node, can you a) provide the output of getScores() from the o.a.c.locator.DynamicEndpointSnitch JMX MBean, and b) also provide the output of dumpTimings() for the host that has the dead drive?
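
Something along the lines of the snippet below should pull those values over JMX without opening jconsole. The MBean ObjectName, the JMX port and the target address are assumptions and may differ by version, so they are best confirmed in jconsole first.

{noformat}
import java.net.InetAddress;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/**
 * Hypothetical helper for pulling the dynamic snitch scores over JMX.
 * The ObjectName and addresses below are assumptions; verify the real MBean
 * name and the JMX port from cassandra-env.sh with jconsole first.
 */
public class SnitchScoreDumper
{
    public static void main(String[] args) throws Exception
    {
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "8080"; // JMX port from cassandra-env.sh
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed name; the snitch MBean may be registered under a different domain.
            ObjectName snitch = new ObjectName("org.apache.cassandra.db:type=DynamicEndpointSnitch");

            // getScores() is exposed as the "Scores" attribute.
            Map<InetAddress, Double> scores =
                    (Map<InetAddress, Double>) mbs.getAttribute(snitch, "Scores");
            for (Map.Entry<InetAddress, Double> entry : scores.entrySet())
                System.out.println(entry.getKey() + " -> " + entry.getValue());

            // dumpTimings(host) is an operation; "10.20.30.40" stands in for the
            // address of the node with the dead drive.
            Object timings = mbs.invoke(snitch, "dumpTimings",
                                        new Object[]{ "10.20.30.40" },
                                        new String[]{ String.class.getName() });
            System.out.println(timings);
        }
        finally
        {
            connector.close();
        }
    }
}
{noformat}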

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034671#comment-13034671 ] 

Thibaut commented on CASSANDRA-2394:
------------------------------------

I will do this next time and post the results.

Could http://www.mail-archive.com/user@cassandra.apache.org/msg13407.html cause this? We are also running with read repair set to 0.



[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012181#comment-13012181 ] 

Brandon Williams commented on CASSANDRA-2394:
---------------------------------------------

The dynamic snitch would fix this in a small cluster. In a larger cluster, where many nodes are being used as coordinators, each one has to observe the faulty node performing badly on its own, and this can show up as a global degradation in performance, at least until the snitch on every coordinator decides to quit using the node.
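
Roughly speaking, and assuming the scoring behaves like a per-replica latency average maintained independently by each coordinator (a simplification, not the actual DynamicEndpointSnitch code), the effect described above looks like this:

{noformat}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Simplified illustration of latency-based replica scoring. Each coordinator
 * maintains its OWN estimates, so every coordinator has to see the faulty
 * node's slow reads itself before it stops preferring it. Not Cassandra code.
 */
public class ReplicaLatencyTracker
{
    private static final double ALPHA = 0.75; // weight of the newest sample
    private final Map<InetAddress, Double> estimatedLatencyMs = new ConcurrentHashMap<>();

    /** Record a response time observed by this coordinator for one replica. */
    public void recordLatency(InetAddress replica, double millis)
    {
        estimatedLatencyMs.merge(replica, millis,
                (old, sample) -> (1 - ALPHA) * old + ALPHA * sample);
    }

    /** Pick the replica this coordinator currently believes is fastest. */
    public InetAddress preferredReplica(Iterable<InetAddress> candidates)
    {
        InetAddress best = null;
        double bestScore = Double.MAX_VALUE;
        for (InetAddress replica : candidates)
        {
            // A replica this coordinator has never timed scores 0 and still looks
            // attractive -- only local samples can teach it otherwise.
            double score = estimatedLatencyMs.getOrDefault(replica, 0.0);
            if (score < bestScore)
            {
                bestScore = score;
                best = replica;
            }
        }
        return best;
    }
}
{noformat}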

[jira] [Updated] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated CASSANDRA-2394:
-------------------------------

    Fix Version/s: 0.7.5

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029474#comment-13029474 ] 

Sylvain Lebresne commented on CASSANDRA-2394:
---------------------------------------------

If there are no exceptions whatsoever in the log, I'm not really sure CASSANDRA-2118 would help.

What you're saying is that there are lots of errors in kern.log, but none in the Cassandra log, right?
And when you say "the cluster won't respond to any queries anymore", do you mean from any node? And which consistency level are we talking about?

[jira] [Issue Comment Edited] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028227#comment-13028227 ] 

Thibaut edited comment on CASSANDRA-2394 at 5/3/11 1:45 PM:
------------------------------------------------------------

I have the same problem again (with the dynamic snitch enabled this time). The cluster won't respond to any queries anymore. Killing the node brings the cluster back to life.

Only very few Commands and Responses are being processed, and there are no exceptions in the log. kern.log is full of hd read errors.

root@intr2n18:~# /software/cassandra/bin/nodetool -h localhost netstats
Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        4593983
Responses                       n/a         0        5276499

Is it possible to port this patch back to 0.7? Surely everybody running cassandra on bigger clusters on non-RAIDed hds is affected by this.

      was (Author: tbritz):
    I have the same problem again (with the dynamic snitch enabled this time). The cluster won't respond to any queries anymore.

Only very few Commands and Responses are being processed, and there are no exceptions in the log. kern.log is full of hd read errors.

root@intr2n18:~# /software/cassandra/bin/nodetool -h localhost netstats
Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        4593983
Responses                       n/a         0        5276499

Is it possible to port this patch back to 0.7? Surely everybody running cassandra on bigger clusters on non-RAIDed hds is affected by this.
  
[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018906#comment-13018906 ] 

Brandon Williams commented on CASSANDRA-2394:
---------------------------------------------

Thibaut,

Can you provide more information on how long the degradation lasts and how many nodes act as coordinators?

[jira] [Resolved] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2394.
---------------------------------------

       Resolution: Not A Problem
    Fix Version/s:     (was: 0.7.7)

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012057#comment-13012057 ] 

Gary Dusbabek commented on CASSANDRA-2394:
------------------------------------------

This probably belongs in 0.8.

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032294#comment-13032294 ] 

Thibaut commented on CASSANDRA-2394:
------------------------------------

Another hd died.

This time, there were ERRORS in the log:


ERROR [ReadStage:336] 2011-05-11 14:35:53,232 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:336,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: corrupt sstable
        at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:60)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: corrupt sstable
        at org.apache.cassandra.io.sstable.SSTableScanner.seekTo(SSTableScanner.java:104)
        at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:96)
        at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1447)
        at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:49)
        ... 4 more
Caused by: java.io.IOException: Input/output error
        at java.io.RandomAccessFile.readBytes(Native Method)
        at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:206)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.seek(BufferedRandomAccessFile.java:347)
        at org.apache.cassandra.io.sstable.SSTableScanner.seekTo(SSTableScanner.java:99)
        ... 7 more

Together with:

 WARN [ScheduledTasks:1] 2011-05-11 12:24:35,725 MessagingService.java (line 504) Dropped 10 READ messages in the last 5000ms
 WARN [ScheduledTasks:1] 2011-05-11 12:24:35,725 MessagingService.java (line 504) Dropped 17 RANGE_SLICE messages in the last 5000ms
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,726 StatusLogger.java (line 51) Pool Name                    Active   Pending
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,726 StatusLogger.java (line 66) ReadStage                        16      1310
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,726 StatusLogger.java (line 66) RequestResponseStage              0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,726 StatusLogger.java (line 66) ReadRepairStage                   0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,726 StatusLogger.java (line 66) MutationStage                     0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,727 StatusLogger.java (line 66) GossipStage                       0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,727 StatusLogger.java (line 66) AntiEntropyStage                  0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,727 StatusLogger.java (line 66) MigrationStage                    0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,727 StatusLogger.java (line 66) StreamStage                       0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,727 StatusLogger.java (line 66) MemtablePostFlusher               0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,727 StatusLogger.java (line 66) FILEUTILS-DELETE-POOL             0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,728 StatusLogger.java (line 66) FlushWriter                       0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,728 StatusLogger.java (line 66) MiscStage                         0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,728 StatusLogger.java (line 66) FlushSorter                       0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,728 StatusLogger.java (line 66) InternalResponseStage             0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,728 StatusLogger.java (line 66) HintedHandoff                     0         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,728 StatusLogger.java (line 70) CompactionManager               n/a         0
 INFO [ScheduledTasks:1] 2011-05-11 12:24:35,729 StatusLogger.java (line 82) MessagingService                n/a       0,0





[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034771#comment-13034771 ] 

Jonathan Ellis commented on CASSANDRA-2394:
-------------------------------------------

Yes. Here's what the cli has to say about that:

{noformat}
          Note that disabling read repair entirely means that the dynamic snitch
          will not have any latency information from all the replicas to recognize
          when one is performing worse than usual.
{noformat}
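
To make that concrete: extra replicas are only contacted, and therefore only timed, on reads that are selected for read repair, so with the chance at 0 a coordinator keeps reading from its current favourite and never refreshes the other replicas' scores. A toy sketch of that gating, simplified to consistency level ONE and with hypothetical names (not the actual read path):

{noformat}
import java.net.InetAddress;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Toy illustration of how read_repair_chance gates latency sampling.
 * Hypothetical names; simplified to consistency level ONE.
 */
public class ReadRepairGatingExample
{
    private final Random random = new Random();

    public List<InetAddress> replicasToContact(List<InetAddress> liveReplicas,
                                               InetAddress snitchFavourite,
                                               double readRepairChance)
    {
        if (random.nextDouble() < readRepairChance)
        {
            // Read repair: query every live replica, so the snitch gets a fresh
            // latency sample for each of them on this read.
            return liveReplicas;
        }
        // No read repair: only the currently preferred replica is contacted, so
        // the other replicas' scores are never updated by this coordinator.
        return Collections.singletonList(snitchFavourite);
    }
}
{noformat}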

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028227#comment-13028227 ] 

Thibaut commented on CASSANDRA-2394:
------------------------------------

I have the same problem again (with the dynamic snitch enabled this time). The cluster won't respond to any queries anymore.

Only very few Commands and Responses are being processed, and there are no exceptions in the log. kern.log is full of hd read errors.

root@intr2n18:~# /software/cassandra/bin/nodetool -h localhost netstats
Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        4593983
Responses                       n/a         0        5276499

Is it possible to port this patch back to 0.7? Surely everybody running cassandra on bigger clusters on non-RAIDed hds is affected by this.

[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019251#comment-13019251 ] 

Thibaut commented on CASSANDRA-2394:
------------------------------------



In our case, the degradation never stopped. It didn't matter whether we connected to the faulty node itself or to another node in the cluster. As soon as we killed the offending node, cluster performance returned to normal. We also use a custom Hector load-balancing policy that always prefers connecting to the local node before trying another one (see the sketch below).

I'm not sure what you mean by coordinator nodes. That cluster had 20 nodes and replication level 3.

I will look into it more closely when we have a similar problem in the future.
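
The "connect locally first" policy mentioned above boils down to something like the sketch below. It is a generic illustration, not Hector's actual LoadBalancingPolicy interface; note that a policy like this keeps sending traffic to a slow-but-alive local node.

{noformat}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

/**
 * Generic sketch of a "prefer the local node" client-side policy. Not Hector's
 * actual API; a real implementation would plug into the client library's
 * load-balancing hook.
 */
public class PreferLocalNodePolicy
{
    public String chooseHost(List<String> liveHosts) throws UnknownHostException
    {
        String localAddress = InetAddress.getLocalHost().getHostAddress();
        for (String host : liveHosts)
        {
            if (host.equals(localAddress))
                return host;              // always try the co-located node first
        }
        // Fall back to any other live node when the local one is not in the list.
        return liveHosts.isEmpty() ? null : liveHosts.get(0);
    }
}
{noformat}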



[jira] [Issue Comment Edited] (CASSANDRA-2394) Faulty hd kills cluster performance

Posted by "Thibaut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029630#comment-13029630 ] 

Thibaut edited comment on CASSANDRA-2394 at 5/5/11 10:45 PM:
-------------------------------------------------------------

I grepped the cassandra log for errors and none showed up.
There were errors (read errors) in kern.log. On another server earlier, after seeing the read errors, I tried to copy the cassandra data directory and it triggered input/output errors. I didn't try this on the latest server where the error occurred, but I suppose it would have caused input/output errors as well.

Yes, it will still respond from any node (our application connects locally first, and those nodes are not replying either), but very slowly (maybe 1% of normal cluster performance). Previously the entire cluster showed a massive commands/responses pending queue (>1000, version 0.7.4 on all 20 nodes as far as I can remember). Normally the queue has about 0-5 entries at most.
I didn't capture it when it occurred on the latest server. Iterating over a table with Hector is continuously interrupted by timeout exceptions, even when our application is turned off and this is the only application running. I'm also using the dynamic snitch with a 0.8 badness threshold.
Replication level is set to 3 (EDIT: 3, that is). So if one node goes down, it should still work as 2 nodes are still available.

And as I said before, stopping cassandra on the faulty node nearly instantly restores proper cluster performance.

What would help you? Shall I take a jstack trace next time the error occurs? It can take some time until a hd dies, though. Do you know of a tool that can simulate a hd error (or a very slow read)?


      was (Author: tbritz):
    I grepped the cassandra log for errors and none showed up.
There were errors (read errors) in kern.log. On another server earlier, after seeing the read errors, I tried to copy the cassandra data directory and it triggered input/output errors. I didn't try this on the latest server where the error occurred, but I suppose it would have caused input/output errors as well.

Yes, it will still respond from any node (our application connects locally first, and those nodes are not replying either), but very slowly (maybe 1% of normal cluster performance). Previously the entire cluster showed a massive commands/responses pending queue (>1000, version 0.7.4 on all 20 nodes as far as I can remember). Normally the queue has about 0-5 entries at most.
I didn't capture it when it occurred on the latest server. Iterating over a table with Hector is continuously interrupted by timeout exceptions, even when our application is turned off and this is the only application running. I'm also using the dynamic snitch with a 0.8 badness threshold.
Replication level is set to 4. So if one node goes down, it should still work as 2 nodes are still available.

And as I said before, stopping cassandra on the faulty node nearly instantly restores proper cluster performance.

What would help you? Shall I take a jstack trace next time the error occurs? It can take some time until a hd dies, though. Do you know of a tool that can simulate a hd error (or a very slow read)?

  