Posted to commits@cassandra.apache.org by "Chris Goffinet (JIRA)" <ji...@apache.org> on 2011/02/06 07:10:31 UTC

[jira] Created: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Provide failure modes if issues with the underlying filesystem of a node
------------------------------------------------------------------------

                 Key: CASSANDRA-2118
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
             Project: Cassandra
          Issue Type: Improvement
    Affects Versions: 0.8
            Reporter: Chris Goffinet
            Assignee: Chris Goffinet


CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide what to do in the event of a failure:

1) Value '0' means continue on all errors (default)
2) Value '1' means only kill the server if 'reads' fail from drive, writes can continue
3) Value '2' means kill the server if read or write errors.
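
The three values above could be modeled roughly as follows. This is only a minimal sketch: the enum DiskFailureMode, its constant names, and the methods fromConfig and shouldStop are hypothetical and are not taken from any attached patch.

```java
/** Hypothetical failure modes mirroring the proposed values '0', '1' and '2'. */
enum DiskFailureMode {
    CONTINUE(0),      // continue on all errors (default)
    STOP_ON_READ(1),  // stop only when a read from the drive fails
    STOP_ON_ANY(2);   // stop on any read or write error

    final int configValue;

    DiskFailureMode(int configValue) {
        this.configValue = configValue;
    }

    /** Maps the numeric setting to a mode; rejects unknown values. */
    static DiskFailureMode fromConfig(int value) {
        for (DiskFailureMode mode : values())
            if (mode.configValue == value)
                return mode;
        throw new IllegalArgumentException("Unknown failure mode: " + value);
    }

    /** Returns true if the server should be stopped for this kind of error. */
    boolean shouldStop(boolean isReadError) {
        switch (this) {
            case STOP_ON_READ: return isReadError;
            case STOP_ON_ANY:  return true;
            default:           return false;
        }
    }
}
```

With this shape, a caller would evaluate something like DiskFailureMode.fromConfig(1).shouldStop(isReadError) at the point where an FS error is detected.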


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434609#comment-13434609 ] 

Jonathan Ellis commented on CASSANDRA-2118:
-------------------------------------------

bq. The enum is there for logging purposes only

I'd say let's just log the exception object and give it a decent toString.

bq. What if there is no longer an issue

Having the operator clear out the blacklist files on restart isn't unreasonable if that's what he wants.
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, 2118-tweaked.txt, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-2118:
--------------------------------------

    Attachment: 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch

Style fix for switch statement.

> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] Updated: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-2118:
--------------------------------------

    Description: 
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) Value '0' means continue on all errors (default)
2) Value '1' means only kill the server if 'reads' fail from drive, writes can fail but not kill the server
3) Value '2' means kill the server if read or write errors.


  was:
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) Value '0' means continue on all errors (default)
2) Value '1' means only kill the server if 'reads' fail from drive, writes can continue
3) Value '2' means kill the server if read or write errors.



> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) Value '0' means continue on all errors (default)
> 2) Value '1' means only kill the server if 'reads' fail from drive, writes can fail but not kill the server
> 3) Value '2' means kill the server if read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427078#comment-13427078 ] 

Jonathan Ellis commented on CASSANDRA-2118:
-------------------------------------------

My inclination would be to leave it up so that auto-restart watchdogs don't promptly kick it back off again.  Minor advantage would be, we could add a JMX hook for "what disks have failed" and be able to query that.
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Updated] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-2118:
-----------------------------------------

    Reviewer: jbellis  (was: yukim)
    
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Updated] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2118:
--------------------------------------

    Attachment: 2118-tweaked.txt

Looks reasonable.  Tweaked version attached w/ some minor cleanup.

Other things worth addressing:
- Is there a reason for the FSError.Op enum?  Looks like we don't need it if we just use instanceof instead in handleFSError.
- Instead of trying to catch all the places we iterate sstables, what about either (1) removing unreadable sstables in DataTracker.get[Uncompacting]SSTables or (2) ripping them out of DataTracker when we handle the error?  Either of those seems more foolproof to me.
- Would be nice to persist the blacklisted sstables somehow.  Maybe write a copy to each (other) data directory, so we don't try to read sstables that we've blacklisted, after a restart?
- May be worth adding another option: best_effort_with_repair, where when we detect an unreadable disk we kick off a repair to rebuild that data automatically.
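
One way the blacklist persistence idea could look is sketched below. Everything here is an assumption made for illustration: the class SSTableBlacklist, the file name blacklisted_sstables.txt, and the choice to store one sstable name per line are hypothetical and not part of the attached patches.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that keeps a copy of the blacklist in each healthy data directory. */
final class SSTableBlacklist {
    static final String FILE_NAME = "blacklisted_sstables.txt"; // assumed name

    /** Writes the blacklist into every data directory except the failed one. */
    static void persist(Set<String> blacklisted, List<Path> dataDirs, Path failedDir) throws IOException {
        for (Path dir : dataDirs) {
            if (dir.equals(failedDir))
                continue; // never write to the disk that just failed
            Files.write(dir.resolve(FILE_NAME), new TreeSet<>(blacklisted), StandardCharsets.UTF_8);
        }
    }

    /** Loads the union of all copies that can be read; unreadable copies are skipped. */
    static Set<String> load(List<Path> dataDirs) {
        Set<String> result = new TreeSet<>();
        for (Path dir : dataDirs) {
            Path file = dir.resolve(FILE_NAME);
            if (!Files.isReadable(file))
                continue;
            try {
                result.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
            } catch (IOException e) {
                // This copy may sit on a failing disk; rely on the other copies.
            }
        }
        return result;
    }
}
```

An operator who wants the node to retry the blacklisted sstables after a restart would simply delete these files from the data directories before starting it.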
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, 2118-tweaked.txt, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Updated] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2118:
--------------------------------------

             Reviewer: yukim
          Component/s: Core
    Affects Version/s:     (was: 0.8 beta 1)
        Fix Version/s: 1.2
             Assignee: Aleksey Yeschenko  (was: Chris Goffinet)
    
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Updated] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-2118:
-----------------------------------------

    Attachment: CASSANDRA-2118-part1.patch
    
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, CASSANDRA-2118-part1.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028229#comment-13028229 ] 

Thibaut commented on CASSANDRA-2118:
------------------------------------

Could you add another state ("stop") which would kill cassandra?

> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8 beta 1
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434595#comment-13434595 ] 

Aleksey Yeschenko commented on CASSANDRA-2118:
----------------------------------------------

The enum is there for logging purposes only. There used to be two places that logged the error and it was cleaner this way. And since there was the enum already, I used it in the comparison instead of instanceof. If I don't use it in any new places after everything else is done, I'll get rid of the enum.
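
To make the enum-versus-instanceof point concrete, here is a rough sketch of the instanceof variant. The classes below are simplified stand-ins written for this example, not the FSError, FSReadError or handleFSError from the patch; the policy names standard, read and readwrite are the ones from the issue description, and returning a String instead of acting is purely for illustration.

```java
import java.io.File;

/** Simplified stand-in for a filesystem error that carries the affected path. */
abstract class FSError extends RuntimeException {
    final File path;

    FSError(Throwable cause, File path) {
        super(cause);
        this.path = path;
    }

    /** A readable toString means the exception object can be logged directly. */
    @Override
    public String toString() {
        return getClass().getSimpleName() + " in " + path + ": " + getCause();
    }
}

final class FSReadError extends FSError {
    FSReadError(Throwable cause, File path) { super(cause, path); }
}

final class FSWriteError extends FSError {
    FSWriteError(Throwable cause, File path) { super(cause, path); }
}

final class FSErrorHandler {
    /** Describes the action for an error under the given policy, with no Op enum involved. */
    static String handleFSError(FSError e, String policy) {
        boolean stop = policy.equals("readwrite")
                    || (policy.equals("read") && e instanceof FSReadError);
        return (stop ? "stop gossip/rpc: " : "continue: ") + e;
    }
}
```

The subclass already says whether the failure was a read or a write, so a separate enum would only duplicate that information.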

I like your point 2.

Not sure about the persistence part. What if there is no longer an issue (say, the directory is again available for writes)?
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, 2118-tweaked.txt, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] Updated: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-2118:
--------------------------------------

    Attachment: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch

> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only kill the server if 'reads' fail from drive, writes can fail but not kill the server
> 3) readwrite - means kill the server if any read or write errors.


[jira] [Updated] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-2118:
-----------------------------------------

    Attachment: CASSANDRA-2118-v2.patch
    
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2.0
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, 2118-tweaked.txt, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch, CASSANDRA-2118-v2.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426973#comment-13426973 ] 

Aleksey Yeschenko commented on CASSANDRA-2118:
----------------------------------------------

Regarding the second option (halting on error) - would it be best to stop the gossiper and rpc server (as the original patch does) or to actually terminate the process (to allow tools like pacemaker/god to notice the failure)?
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021925#comment-13021925 ] 

Stu Hood commented on CASSANDRA-2118:
-------------------------------------

Would you mind rebasing this for trunk when you get the chance?

> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440770#comment-13440770 ] 

Aleksey Yeschenko commented on CASSANDRA-2118:
----------------------------------------------

Removing unreadable sstables in DataTracker.get[Uncompacting]SSTables is not enough because many methods use view.sstables directly, so I had to rip the affected sstables from view.sstables.
When reviewing v2 please look very carefully at DataTracker#maybeRemoveUnreadableSSTables method.
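
The idea of ripping the affected sstables out of the view itself, rather than filtering in the getters, might be sketched like this. It is a simplified illustration under stated assumptions: SSTableTracker and its View are hypothetical stand-ins for DataTracker and view.sstables, sstables are represented by plain path strings, and the compare-and-set loop is one possible way to swap an immutable view, not necessarily what the v2 patch does.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, heavily simplified stand-in for a tracker holding an immutable view. */
final class SSTableTracker {
    /** Immutable snapshot of the live sstables, identified here by file path. */
    static final class View {
        final Set<String> sstables;

        View(Set<String> sstables) {
            this.sstables = Collections.unmodifiableSet(new HashSet<>(sstables));
        }
    }

    private final AtomicReference<View> view;

    SSTableTracker(Set<String> initial) {
        view = new AtomicReference<>(new View(initial));
    }

    Set<String> sstables() {
        return view.get().sstables;
    }

    /**
     * Drops every sstable under the failed directory from the live view.
     * Pass the directory with a trailing separator so "/data1/" does not match "/data10/".
     */
    void removeUnreadableSSTables(String failedDirectory) {
        View current, updated;
        do {
            current = view.get();
            Set<String> remaining = new HashSet<>();
            for (String path : current.sstables)
                if (!path.startsWith(failedDirectory))
                    remaining.add(path);
            updated = new View(remaining);
        } while (!view.compareAndSet(current, updated)); // retry if another thread swapped the view
    }
}
```

Because the set itself is replaced, code that reads the view's sstables directly sees the same filtered result as code that goes through an accessor.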
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2.0
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, 2118-tweaked.txt, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch, CASSANDRA-2118-v2.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jeremiah Jordan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078437#comment-13078437 ] 

Jeremiah Jordan commented on CASSANDRA-2118:
--------------------------------------------

FSReadError is in the parent JIRA task

> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8 beta 1
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Updated] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-2118:
-----------------------------------------

    Attachment: CASSANDRA-2118-v1.patch
    
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Issue Comment Edited] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028229#comment-13028229 ] 

Thibaut edited comment on CASSANDRA-2118 at 5/3/11 1:49 PM:
------------------------------------------------------------

Could you add another state ("stop") which would kill Cassandra? Also, "standard" as the default is risky, as it can kill an entire cluster (CASSANDRA-2394).


      was (Author: tbritz):
    Could you add another state ("stop") which would kill cassandra?
  
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8 beta 1
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] Updated: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-2118:
--------------------------------------

    Description: 
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) standard - means continue on all errors (default)
2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
3) readwrite - means stop gossip/rpc server if any read or write errors.


  was:
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) standard - means continue on all errors (default)
2) read - means only kill the server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
3) readwrite - means stop gossip/rpc server if any read or write errors.



> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] Updated: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-2118:
--------------------------------------

    Description: 
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) standard - means continue on all errors (default)
2) read - means only kill the server if 'reads' fail from drive, writes can fail but not kill the server
3) readwrite - means kill the server if any read or write errors.


  was:
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) Value '0' means continue on all errors (default)
2) Value '1' means only kill the server if 'reads' fail from drive, writes can fail but not kill the server
3) Value '2' means kill the server if read or write errors.



> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only kill the server if 'reads' fail from drive, writes can fail but not kill the server
> 3) readwrite - means kill the server if any read or write errors.


[jira] Updated: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-2118:
--------------------------------------

    Description: 
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) standard - means continue on all errors (default)
2) read - means only kill the server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
3) readwrite - means stop gossip/rpc server if any read or write errors.


  was:
CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:

1) standard - means continue on all errors (default)
2) read - means only kill the server if 'reads' fail from drive, writes can fail but not kill the server
3) readwrite - means kill the server if any read or write errors.



> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only kill the server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027161#comment-13027161 ] 

Jonathan Ellis commented on CASSANDRA-2118:
-------------------------------------------

Is this a complete patch? I don't see the FSReadError class anywhere, or where we turn IOException (I assume) into FSRE.
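
To illustrate the conversion being asked about, here is a minimal sketch of turning a checked IOException into an unchecked read error at the point where the file is touched. The class and method names (FsErrorSketch, ReadError, readFirstByte) are hypothetical stand-ins; the actual FSReadError belongs to the parent task (CASSANDRA-2116) and may look quite different.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch only: these names are not taken from the attached patches.
final class FsErrorSketch {

    /** Unchecked stand-in for a filesystem read error; remembers the file involved. */
    static final class ReadError extends RuntimeException {
        final File file;

        ReadError(IOException cause, File file) {
            super("read failed: " + file, cause);
            this.file = file;
        }
    }

    /** Reads one byte and rethrows any IOException as a ReadError. */
    static int readFirstByte(File file) {
        try (InputStream in = new FileInputStream(file)) {
            return in.read();
        } catch (IOException e) {
            // Wrapping here lets a single handler higher up apply the configured mode.
            throw new ReadError(e, file);
        }
    }

    public static void main(String[] args) {
        try {
            readFirstByte(new File(args.length > 0 ? args[0] : "missing-file"));
        } catch (ReadError e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Keeping the file on the exception is one way to let the handler decide which disk is affected; that choice is an assumption of this sketch.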

> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8 beta 1
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284501#comment-13284501 ] 

Jonathan Ellis commented on CASSANDRA-2118:
-------------------------------------------

I don't think we need more than two options.  It's common for disks to become readable-not-writable, but I've never heard of them being writable-not-readable.  Assuming that we address CASSANDRA-2116 at the right level of granularity (the disk) there are two sane options:

1) Continue as best we can in the face of errors: If we can't write to a disk, log an error, mark it bad-for-writes, and continue writing to other disks.  If we can't read from a disk, log an error, mark it bad-for-reads-and-writes, and continue serving reads from other disks.
2) Since option one implies that we can blithely serve up stale data when the most recent version was on the disk that is no longer accessible, I can see the utility of an option to halt on error (which would allow an operator to choose to decommission + rebootstrap to minimize the inconsistencies observed at CL.ONE).
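
The two options above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names (DiskPolicySketch, Policy, onReadError, onWriteError) are hypothetical, disks are identified by plain strings, and "halt" is modeled as a flag rather than an actual shutdown.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two proposed options; not Cassandra's actual code.
final class DiskPolicySketch {
    enum Policy { BEST_EFFORT, STOP }

    private final Policy policy;
    private final List<String> disks;
    private final Set<String> badForWrites = new LinkedHashSet<String>();
    private final Set<String> badForReads = new LinkedHashSet<String>();
    private boolean halted = false;

    DiskPolicySketch(Policy policy, List<String> disks) {
        this.policy = policy;
        this.disks = new ArrayList<String>(disks);
    }

    /** Write failure: halt under STOP, otherwise mark the disk bad-for-writes. */
    void onWriteError(String disk) {
        System.err.println("write error on " + disk);
        if (policy == Policy.STOP) {
            halted = true;
            return;
        }
        badForWrites.add(disk);
    }

    /** Read failure: halt under STOP, otherwise mark the disk bad for reads and writes. */
    void onReadError(String disk) {
        System.err.println("read error on " + disk);
        if (policy == Policy.STOP) {
            halted = true;
            return;
        }
        badForReads.add(disk);
        badForWrites.add(disk);
    }

    /** Disks that may still receive writes. */
    List<String> writableDisks() {
        List<String> result = new ArrayList<String>(disks);
        result.removeAll(badForWrites);
        return result;
    }

    /** Disks that may still serve reads. */
    List<String> readableDisks() {
        List<String> result = new ArrayList<String>(disks);
        result.removeAll(badForReads);
        return result;
    }

    boolean isHalted() {
        return halted;
    }
}
```

Note that a disk marked bad-for-writes stays readable in this sketch, which matches the readable-not-writable case described above.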
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8 beta 1
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Comment Edited] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434613#comment-13434613 ] 

Jonathan Ellis edited comment on CASSANDRA-2118 at 8/15/12 10:00 AM:
---------------------------------------------------------------------

bq. May be worth adding another option: best_effort_with_repair

Let's save this for a followup after we see how well the blacklisting actually works in production. :)

And since this is the main reason we'd want to persist the blacklist, I'm okay with punting that down the road too.
                
      was (Author: jbellis):
    bq. May be worth adding another option: best_effort_with_repair

Let's save this for a followup after we see how well the blacklisting actually works in production. :)
                  
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, 2118-tweaked.txt, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] Updated: (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-2118:
--------------------------------------

    Attachment: 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch

Added log line on startup so we know what mode the operator selected. I also updated cassandra.yaml with the default setting + description of the mode.
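
As a rough illustration of the idea (not the contents of the v3 patch), the sketch below parses one of the three mode names from the description, falls back to the default when nothing is configured, and builds the line that would be logged at startup. All identifiers here (FailureModeSketch, FailureMode, startupLogLine) and the log wording are assumptions, as is the choice to reject unknown values with an exception.

```java
import java.util.Locale;

// Hypothetical sketch: none of these names come from the attached patches.
final class FailureModeSketch {

    /** The three modes listed in the issue description. */
    enum FailureMode {
        STANDARD,   // continue on all errors (default)
        READ,       // stop gossip/rpc only when a read fails
        READWRITE;  // stop gossip/rpc on any read or write error

        /** Parses the configured value; a missing value falls back to the default. */
        static FailureMode parse(String raw) {
            if (raw == null || raw.trim().isEmpty()) {
                return STANDARD;
            }
            // valueOf throws IllegalArgumentException for an unknown mode name.
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        }

        /** True if an error of the given kind should stop gossip/rpc. */
        boolean shouldStop(boolean isReadError) {
            switch (this) {
                case READ:
                    return isReadError;
                case READWRITE:
                    return true;
                default:
                    return false;
            }
        }
    }

    /** Builds the line logged once at startup so operators can see the selected mode. */
    static String startupLogLine(FailureMode mode) {
        return "Filesystem failure mode: " + mode.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        FailureMode mode = FailureMode.parse(args.length > 0 ? args[0] : null);
        System.out.println(startupLogLine(mode));
    }
}
```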

> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.


[jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434613#comment-13434613 ] 

Jonathan Ellis commented on CASSANDRA-2118:
-------------------------------------------

bq. May be worth adding another option: best_effort_with_repair

Let's save this for a followup after we see how well the blacklisting actually works in production. :)
                
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2118
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2118
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2
>
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch, 0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch, 2118-tweaked.txt, CASSANDRA-2118-part1.patch, CASSANDRA-2118-v1.patch
>
>
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml so operators can decide that in the event of failure what to do:
> 1) standard - means continue on all errors (default)
> 2) read - means only stop  gossip/rpc server if 'reads' fail from drive, writes can fail but not kill gossip/rpc server
> 3) readwrite - means stop gossip/rpc server if any read or write errors.
