Posted to notifications@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2013/03/05 22:24:13 UTC

[jira] [Commented] (ACCUMULO-1152) Add more sanity checks to limit the damage of multiple assignment

    [ https://issues.apache.org/jira/browse/ACCUMULO-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593929#comment-13593929 ] 

Keith Turner commented on ACCUMULO-1152:
----------------------------------------

Just had a realization: before a walog can be used, it must be written to the metadata table.  So it's likely the metadata table check will prevent a tserver from getting new logs.  However, this check is based only on cache.

If a tserver does see a metadata table write fail because it does not hold its lock, it should probably do something.
                
> Add more sanity checks to limit the damage of multiple assignment
> -----------------------------------------------------------------
>
>                 Key: ACCUMULO-1152
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1152
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>             Fix For: 1.6.0
>
>
> When an issue like ACCUMULO-954 comes along and causes a tablet to be hosted on multiple tablet servers, it's nice to have sanity checks that limit the damage this can cause.
> Accumulo already has a sanity check on writes to the metadata table that ensures the tablet server making the write still holds a lock.  This has reliably triggered in cases of multiple assignment bugs.  It would be nice to have more checks like this.  Below are some places where I think checks would be useful.  Are there more?
>  * Tserver attempts to positively check it holds its lock before getting a new walog.
>  * Clients take some action to clear lockless tservers from their metadata table cache.  This would help prevent writing data to a zombie tserver that may lose data, or reading stale data from a zombie tserver.
