Posted to issues@trafficserver.apache.org by "B Wyatt (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/12/12 18:07:30 UTC

[jira] [Issue Comment Edited] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167597#comment-13167597 ] 

B Wyatt edited comment on TS-949 at 12/12/11 5:05 PM:
------------------------------------------------------

Thanks John.  I think the new patch should be more stable.  I apologize for misreading the previous patch; all of my volumes are matched in size, so I had erroneously tuned out the inclusion of vol->len in the initial value of forvol[i].

All of this digging has brought up a new, related issue (that I am pretty sure we cannot address at this level): object loss when adding volumes.  The hash is now consistent; however, when a new volume supersedes an existing volume in the hash, any object that maps to that bucket but is currently stored on the old volume will become inaccessible.  I will probably create a new issue for that, as this one is solved in my book.  

[Edit: removed statement about comments... they are there... my coffee is not, apparently]
                
      was (Author: wanderingbort):
    Thanks John.  I think the new patch should be more stable.  I apologize for misreading the previous patch; all of my volumes are matched in size, so I had erroneously tuned out the inclusion of vol->len in the initial value of forvol[i].

While I am not an enforcer of code quality, I think the particulars of this method should at the very least be documented in the patched code.  I'll let someone else decide whether it is worth the effort to "pretty" it up.

All of this digging has brought up a new, related issue (that I am pretty sure we cannot address at this level): object loss when adding volumes.  The hash is now consistent; however, when a new volume supersedes an existing volume in the hash, any object that maps to that bucket but is currently stored on the old volume will become inaccessible.  I will probably create a new issue for that, as this one is solved in my book.  
                  
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch
>
>
> The method for resolving collisions when distributing hash-table space to volumes for the object_key->volume hash table creates inconsistency when a disk is determined to be bad, or when a failed disk is removed from the volume.config.
> Background:
> The hash space is distributed by a round-robin draft in which each volume "drafts" a random index in the hash table until the hash space is exhausted.  The random order in which a given volume drafts hash table slots is consistent across reboot/crash/disk-failure; however, when a volume attempts to draft a slot that is already occupied, it skips to its next random pick, and keeps skipping until it finds an open slot.  This ensures that the hash is partitioned evenly between volumes.
> The issue:
> Resolving slot contention breaks consistency because the result depends on the order in which the volumes draft.  When the hash is rebuilt after a disk failure or a reboot with fewer drives, a surviving volume may secure an index that was previously occupied by the dead disk.  In the old hash, that volume would have selected another random index due to contention.  If a different volume takes that other index in a later draft round, the slot yields an inconsistent key->volume result.  The effects of one inconsistency then cascade, because whichever volume occupies that index after the dead disk is removed is now behind on its draft sequence as well. 
> An Example:
> ||Disk||Draft Sequence||
> |A|1,4,7,5|
> |B|4,2,8,1|
> |C|3,7,5,2|
> Pre-failure Hash Table after 2 rounds of draft:
> |A|B|C|B|C|?|A|?|
> Hash Table after drive B fails, after 3 rounds of draft:
> |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
> Two slots have become inconsistent and more will probably follow.  Each inconsistency leaves objects that are still stored in a volume but can no longer be found by the top-level cache on open/lookup.
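
The example above can be replayed with a short simulation. This is only a sketch in Python, not the Traffic Server code: the function name `draft`, the dictionary layout, and the 1-based slot numbering are assumptions made for illustration. Only the draft sequences and the round counts are taken from the tables in the issue description.

```python
def draft(sequences, table_size, rounds):
    """Round-robin draft: in each round, every volume claims its next open pick.

    sequences maps a volume name to its fixed list of 1-based slot picks
    (hypothetical layout). Unclaimed slots stay None.
    """
    table = [None] * table_size
    position = {vol: 0 for vol in sequences}  # next unread pick per volume
    for _ in range(rounds):
        for vol, picks in sequences.items():
            # Skip picks that another volume has already claimed.
            while position[vol] < len(picks) and table[picks[position[vol]] - 1] is not None:
                position[vol] += 1
            if position[vol] < len(picks):
                table[picks[position[vol]] - 1] = vol
                position[vol] += 1
    return table


# Draft sequences from the example table in the issue description.
sequences = {"A": [1, 4, 7, 5], "B": [4, 2, 8, 1], "C": [3, 7, 5, 2]}

# Three volumes, two rounds of draft.
before = draft(sequences, table_size=8, rounds=2)

# Drive B fails: rebuild with the survivors only, three rounds of draft.
survivors = {vol: picks for vol, picks in sequences.items() if vol != "B"}
after = draft(survivors, table_size=8, rounds=3)

# A slot is inconsistent if a surviving volume owned it before the failure
# and a different volume owns it afterwards.
moved = [
    slot + 1
    for slot, (old, new) in enumerate(zip(before, after))
    if old in survivors and old != new
]

print(before)
print(after)
print(moved)
```

Slots that belonged to B are expected to change owner; the slots collected in `moved` are the ones that changed hands between two volumes that both survived, which is the inconsistency described above.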
