Posted to issues@trafficserver.apache.org by "B Wyatt (JIRA)" <ji...@apache.org> on 2011/09/09 19:01:09 UTC

[jira] [Created] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
-----------------------------------------------------------------------------------------------

                 Key: TS-949
                 URL: https://issues.apache.org/jira/browse/TS-949
             Project: Traffic Server
          Issue Type: Bug
          Components: Cache
    Affects Versions: 3.1.0
         Environment: Multi-volume cache with apparently faulty drives
            Reporter: B Wyatt


The method for resolving collisions when distributing hash-table space to volumes for the object_key->volume hash table creates inconsistency when a disk is determined to be bad, or when a failed disk is removed from the volume.config.

Background:
The hash space is distributed by a round-robin draft in which each volume "drafts" a random index in the hash table until the hash space is exhausted.  The random order in which a given volume drafts hash table slots is consistent across reboot/crash/disk-failure; however, when a volume attempts to draft a slot that is already occupied, it skips to its next random pick, repeating until it finds an open slot.  This ensures that the hash is partitioned evenly between volumes.

The issue:
Resolving slot contention breaks the consistency because the outcome depends on the order in which the volumes draft.  When rebuilding the hash after a disk failure or a reboot with fewer drives, a volume may secure an index that was previously occupied by the dead disk.  In the old hash, the surviving volume would have selected another random index due to contention.  If that other index is then taken by a different volume, it represents an inconsistent key->volume result by the next draft round.  The effects of one inconsistency then cascade, as whichever volume occupies that index after the dead disk is removed is now behind on its draft sequence as well.

An Example:
||Disk||Draft Sequence||
|A|1,4,7,5|
|B|4,2,8,1|
|C|3,7,5,2|
Pre-failure Hash Table after 2 rounds of draft:
|A|B|C|B|C|?|A|?|

Post-failure of drive B Hash Table after 3 rounds of draft:
|A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|

Two slots have become inconsistent and more will probably follow.  These inconsistencies become objects stored in a volume but lost to the top level cache for open/lookup.
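
The example above can be reproduced with a small simulation. This is only a sketch of the draft as described in this report, not the Traffic Server code: the function name `draft`, the alphabetical drafting order, and the 1-based slot numbers are assumptions made for illustration.

```python
def draft(sequences, table_size, rounds):
    """Round-robin draft: each volume in turn claims its next free pick."""
    table = ["?"] * table_size
    cursor = {vol: 0 for vol in sequences}  # position in each draft sequence
    for _ in range(rounds):
        for vol in sorted(sequences):  # fixed drafting order (assumed)
            picks = sequences[vol]
            # Skip picks that another volume has already claimed.
            while cursor[vol] < len(picks) and table[picks[cursor[vol]] - 1] != "?":
                cursor[vol] += 1
            if cursor[vol] < len(picks):
                table[picks[cursor[vol]] - 1] = vol
                cursor[vol] += 1
    return table


SEQUENCES = {"A": [1, 4, 7, 5], "B": [4, 2, 8, 1], "C": [3, 7, 5, 2]}

before = draft(SEQUENCES, 8, rounds=2)
survivors = {v: s for v, s in SEQUENCES.items() if v != "B"}
after = draft(survivors, 8, rounds=3)

# Slots that belonged to a surviving volume before but map elsewhere now.
moved = [i + 1 for i, (old, new) in enumerate(zip(before, after))
         if old not in ("B", "?") and old != new]
print("".join(before), "".join(after), moved)
```

The `moved` list contains the slots whose owner changed even though the original owner is still alive, which are the slots marked in red in the table above.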


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163071#comment-13163071 ] 

John Plevyak commented on TS-949:
---------------------------------

There is a relatively easy fix for this.... I'll send out a patch.

                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>             Fix For: 3.1.2
>
>

        

[jira] [Updated] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak updated TS-949:
----------------------------

    Attachment: TS-949-jp-1.patch

This makes the resulting assignment more stable at the cost of making the distribution probabilistic.  That could be tightened up at the cost of more complexity, but I don't see a good reason. Please read it over...
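
The patch itself is not reproduced in this thread. As a stand-in, the sketch below uses rendezvous (highest-random-weight) hashing, a well-known contention-free scheme that shows the same trade-off: each slot's owner depends only on that slot and the set of live volumes, so the assignment is stable, but the share of slots per volume is only even on average. The function names and the use of SHA-256 are assumptions, not taken from the patch.

```python
import hashlib


def score(volume_id, slot):
    """Deterministic pseudo-random score for a (volume, slot) pair."""
    digest = hashlib.sha256(f"{volume_id}:{slot}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(volume_ids, table_size):
    """Assign each slot to the volume with the highest score for that slot."""
    return [max(volume_ids, key=lambda v: score(v, slot))
            for slot in range(table_size)]


before = build_table(["A", "B", "C"], 64)
after = build_table(["A", "C"], 64)

# Only slots that belonged to the removed volume may change owner.
changed = [s for s in range(64) if before[s] != after[s]]
print(all(before[s] == "B" for s in changed))
```

Because no volume's choice depends on what another volume has already claimed, removing B cannot disturb a slot owned by A or C.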
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch
>
>

        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169597#comment-13169597 ] 

John Plevyak commented on TS-949:
---------------------------------

I am going to leave this bug open to address the issue of a new volume and loss of old content, or until a new bug is created for that issue (which hopefully will reference this one to maintain context).
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
>
>

        

[jira] [Issue Comment Edited] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "B Wyatt (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167597#comment-13167597 ] 

B Wyatt edited comment on TS-949 at 12/12/11 5:05 PM:
------------------------------------------------------

Thanks John.  I think the new patch should be more stable.  I apologize for the misread of the previous patch; all of my volumes are matched in size, so I had erroneously tuned out the inclusion of vol->len in the initial value of forvol[i].

All of this digging has brought up a new related issue (that I am pretty sure we cannot address at this level): object loss when adding volumes.  The hash is now consistent; however, when a new volume supersedes an existing volume in the hash, any object that maps to that bucket but is currently stored on the old volume will become inaccessible.  I will probably create a new issue for that as this one is solved in my book.

[Edit: removed statement about comments... they are there... my coffee is not, apparently]
                
      was (Author: wanderingbort):
    Thanks John.  I think the new patch should be more stable.  I apologize for the misread of the previous patch; all of my volumes are matched in size, so I had erroneously tuned out the inclusion of vol->len in the initial value of forvol[i].

While I am not an enforcer of code quality, I think the particulars of this method should at the very least be documented in the patched code.  I'll let someone else decide whether it is worth the effort to "pretty" it up.

All of this digging has brought up a new related issue (that I am pretty sure we cannot address at this level): object loss when adding volumes.  The hash is now consistent; however, when a new volume supersedes an existing volume in the hash, any object that maps to that bucket but is currently stored on the old volume will become inaccessible.  I will probably create a new issue for that as this one is solved in my book.
                  
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch
>
>

        

[jira] [Updated] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "B Wyatt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

B Wyatt updated TS-949:
-----------------------

    Attachment: TS949-BW-p1.patch

Attaching a modified version of JP's patch that removes the unnecessary(?) multiplication and division.  Additionally, it uses the volume index mapping.
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS949-BW-p1.patch
>
>

        

[jira] [Reopened] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon reopened TS-949:
-----------------------------

      Assignee: Brian Geffon  (was: John Plevyak)
    
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
>
>

        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "B Wyatt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163641#comment-13163641 ] 

B Wyatt commented on TS-949:
----------------------------

Thanks John, this scheme certainly solves disks failing/being added to the cache in a deterministic way.  I tend to agree with you that the extra effort to guarantee an equal distribution of hash buckets is of questionable value.  

It does look like there is some cruft in the patch.  Score is multiplied by a value which is almost constant across the volumes and divided by an integer constant.  The comments indicate that this may have been an attempt to even out the distribution, but as it would cause the same type of inconsistency on disk loss as the previous scheme, I assume it was disabled on purpose (by never decrementing forvol[*]).  Either way, the result of the comparison will currently be the same as the un-multiplied, un-divided comparison if the integer truncation is not important.

Also, I think "ttable[i] = top;" should be "ttable[i] = mapping[top];" as the range of valid volume indices has holes when one or more disks have been declared bad.
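
The index problem can be illustrated as follows. This is a sketch only: it assumes that `mapping` translates a dense index over the good volumes back to the real volume index, which is one reading of the comment above, and the `pick` callback is a hypothetical placeholder for whatever selects the winning volume for a slot.

```python
def build_ttable(volumes, table_size, pick):
    """Fill ttable with real volume indices, skipping failed volumes."""
    # Failed volumes are None, so the valid indices have holes.
    mapping = [i for i, v in enumerate(volumes) if v is not None]
    ttable = []
    for slot in range(table_size):
        top = pick(slot, len(mapping))  # index into the dense list of good volumes
        ttable.append(mapping[top])     # translate back to the real volume index
    return ttable


volumes = ["vol0", None, "vol2"]  # volume 1 has been declared bad
ttable = build_ttable(volumes, 4, lambda slot, n: slot % n)
print(ttable)
```

Storing `top` directly would write dense indices 0 and 1 into the table, and index 1 refers to the dead volume; going through `mapping` stores 0 and 2 instead.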
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch
>
>

        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "B Wyatt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169622#comment-13169622 ] 

B Wyatt commented on TS-949:
----------------------------

Opened a new bug, TS-1050, that refers to this bug and addresses the problem of data loss on volume addition.
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
>
>

        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167680#comment-13167680 ] 

John Plevyak commented on TS-949:
---------------------------------

I agree that this code is too raw.  I wanted to get the bones of a solution out there, but I am definitely not wedded to the implementation.

RE: when a new volume is added; one solution is to probe back into previous configurations (rather than, say, just the second most likely location).  This is the approach that the clustering code takes (see cluster/ClusterConfig.cc configuration_add_machine, cluster_machine_depth_list).

I think that this code and that code should be merged.   The new hash table generator from this code combined with the history mechanism from that code.
The alternative in both cases is to just return the first N most likely locations.  This is probably OK for the cache because it would be a local in-memory probe 99.9% of the time, but would be more expensive for clustering as it would require going off-node 100% of the time.
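
One possible reading of "the first N most likely locations", as a rough sketch only: rank the volumes for each bucket by a deterministic score and probe the top N. The SHA-256 based `score` and the helper `likely_locations` are hypothetical names, not anything in the patches or in ClusterConfig.cc, and this scoring ignores volume size.

```python
import hashlib

def score(volume, bucket):
    """Deterministic pseudo-random score for a (volume, bucket) pair."""
    digest = hashlib.sha256(f"{volume}:{bucket}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def likely_locations(volumes, bucket, n):
    """Return the n volumes most likely to hold the bucket, best first."""
    ranked = sorted(volumes, key=lambda v: score(v, bucket), reverse=True)
    return ranked[:n]

old = ["A", "B", "C"]
new = old + ["D"]
# After adding D, the previous owner of every bucket is still among the top two.
still_found = all(
    likely_locations(old, b, 1)[0] in likely_locations(new, b, 2)
    for b in range(1024)
)
print(still_found)  # True
```

With a single added volume the old owner can drop at most one rank, so probing two locations is enough in this model; each further configuration change would need one more probe, which is where a history mechanism becomes attractive.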
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
>
>


        

[jira] [Resolved] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon resolved TS-949.
-----------------------------

    Resolution: Fixed
    
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
>
>


        

[jira] [Updated] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Plevyak updated TS-949:
----------------------------

    Attachment: TS-949-jp2.patch

This patch uses a table of random numbers, selected based on the size of the disk partition, and picks the one closest to the center of each "bucket" to determine the bucket owner.  This is stable for inserts and removes, and never switches a bucket between disks which remain present.  This should address the issue.
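
A minimal sketch of the "closest random number" idea, assuming each volume gets a count of pseudo-random points proportional to its size and that the points are seeded by the volume name alone. This is not the code in TS-949-jp2.patch; `volume_points`, `build_table`, the bucket count and `points_per_unit` are all illustrative choices.

```python
import random

def volume_points(name, size, points_per_unit=8):
    """Pseudo-random points in [0, 1) for one volume, seeded only by its name."""
    rng = random.Random(name)
    return [rng.random() for _ in range(size * points_per_unit)]

def build_table(volumes, buckets=64):
    """Assign each bucket to the volume owning the point nearest its center."""
    points = [(p, name) for name, size in volumes.items()
              for p in volume_points(name, size)]
    table = []
    for i in range(buckets):
        center = (i + 0.5) / buckets
        # Exact ties are extremely unlikely; if one occurs the names decide.
        _, owner = min((abs(p - center), name) for p, name in points)
        table.append(owner)
    return table

volumes = {"A": 10, "B": 2, "C": 5}   # relative sizes only
full = build_table(volumes)
without_b = build_table({k: v for k, v in volumes.items() if k != "B"})
# Buckets that changed owner although their original owner is still present.
disturbed = [i for i in range(len(full)) if full[i] != "B" and full[i] != without_b[i]]
print(disturbed)  # []
```

Because a volume's points do not depend on which other volumes exist, removing B can only reassign buckets that B itself owned; the nearest point for every other bucket is unchanged.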
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch
>
>


        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164579#comment-13164579 ] 

John Plevyak commented on TS-949:
---------------------------------

I admit I haven't looked at this code in a long time, but isn't the vol->len the length of the volume?  What we want is a hash function H(h) which distributes the keys proportionally to the size of the volumes.  Let's say we have disk A: 1TB, disk B: 200GB and disk C: 500GB.  With your changes they would all get the same number of keys, so that disk B would quickly fill up and start to lose documents while disk A was still mostly empty.

In order to handle this, the random numbers need to be scaled down so that they allocate the right proportions.  This will not cause the earlier problem because it preserves the pairwise order between any two disks, that is, if B drops but A and C are still present and A won the first time, it will win again (because the random values will be scaled with the same proportion, and if x > y then x * C > y * C for all C > 0).

That said, the proportionality multiplier I was using wasn't right.  I'll send out a new patch with the right multiplier.  Size needs to be proportional to (total*total)/size rather than just size.

Thanx for the feedback, please check out the new patch and let's work out some examples to make sure it does what we want.

The new multiplier should allocate proportional to the size of each vol and not have any inconsistencies.
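
To put numbers on the disk A/B/C example above, here is a small calculation of each volume's proportional share and of how overloaded it would be under an equal split of keys. It assumes 1TB = 1000GB and is only arithmetic, not part of any patch.

```python
sizes_gb = {"A": 1000, "B": 200, "C": 500}   # assumes 1TB = 1000GB
total = sum(sizes_gb.values())
equal_share = 1 / len(sizes_gb)

shares = {name: size / total for name, size in sizes_gb.items()}
# Load > 1 means the volume receives more keys than its share of the capacity.
loads = {name: equal_share / share for name, share in shares.items()}

for name in sizes_gb:
    print(f"{name}: proportional share {shares[name]:.1%}, equal-split load {loads[name]:.2f}x")
# A: proportional share 58.8%, equal-split load 0.57x
# B: proportional share 11.8%, equal-split load 2.83x
# C: proportional share 29.4%, equal-split load 1.13x
```

With an equal split, B is asked to hold almost three times what its capacity justifies, while A is a little over half used.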
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS949-BW-p1.patch
>
>


        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166362#comment-13166362 ] 

John Plevyak commented on TS-949:
---------------------------------

I asked around and it turns out that there is no simple proportion which does not depend on all smaller proportions.  The new patch will instead use the standard "search for the closest random number" method.
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS949-BW-p1.patch
>
>


        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "B Wyatt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167597#comment-13167597 ] 

B Wyatt commented on TS-949:
----------------------------

Thanks John.  I think the new patch should be more stable.  I apologize for misreading the previous patch; all of my volumes are matched in size, so I had erroneously tuned out the inclusion of vol->len in the initial value of forvol[i].

While I am not an enforcer of code quality, I think the particulars of this method should at the very least be documented in the patched code.  I'll let someone else decide whether it is worth the effort to "pretty" it up.

All of this digging has brought up a new related issue (that I am pretty sure we cannot address at this level): object loss when adding volumes.  The hash is now consistent; however, when a new volume supersedes an existing volume in the hash, any object that maps to that bucket but is currently stored on the old volume will become inaccessible.  I will probably create a new issue for that, as this one is solved in my book.
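
The effect of adding a volume can be sketched with the same kind of "closest random number" table. As before, this is an illustrative model and not the committed code; `build_table`, the relative sizes and the name-seeded points are assumptions.

```python
import random

def build_table(volumes, buckets=64, points_per_unit=8):
    """Bucket owner = volume holding the pseudo-random point nearest the bucket center."""
    points = []
    for name, size in volumes.items():
        rng = random.Random(name)  # seeded by name only, so points never depend on peers
        points += [(rng.random(), name) for _ in range(size * points_per_unit)]
    return [min((abs(p - (i + 0.5) / buckets), name) for p, name in points)[1]
            for i in range(buckets)]

before = build_table({"A": 10, "C": 5})
after = build_table({"A": 10, "C": 5, "D": 5})
# Buckets whose owner changed: objects cached under the old owner are now unreachable.
orphaned = [(i, before[i], after[i]) for i in range(len(before)) if before[i] != after[i]]
print(len(orphaned), "of", len(before), "buckets changed owner")
print(all(new == "D" for _, _, new in orphaned))  # True
```

Every bucket that moves goes to the new volume D, so the table stays consistent for A and C, but anything already written to A or C under a moved bucket can no longer be found through the table.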
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch
>
>


        

[jira] [Commented] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "John Plevyak (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169589#comment-13169589 ] 

John Plevyak commented on TS-949:
---------------------------------

Committed revision 1214409.

                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
>
>


        

[jira] [Updated] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "Leif Hedstrom (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-949:
-----------------------------

    Fix Version/s:     (was: 3.1.1)
                   3.1.2

Moving all unassigned bugs out to 3.1.2
                
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>             Fix For: 3.1.2
>
>


        

[jira] [Updated] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "Igor Galić (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor Galić updated TS-949:
--------------------------

    Backport to Version: 3.0.5
    
> key->volume hash table is not consistent when a disk is marked as bad or removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS-949-jp2.patch, TS949-BW-p1.patch, explicit-pair.patch
>
>


[jira] [Assigned] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "Leif Hedstrom (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom reassigned TS-949:
--------------------------------

    Assignee: John Plevyak
    


[jira] [Updated] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "B Wyatt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

B Wyatt updated TS-949:
-----------------------

    Attachment: explicit-pair.patch

I made a quick patch that converts the implied pairing of elements in the rtable array into an explicit pair (it applies on top of TS-949-jp2.patch).

It is a non-functional change; however, I thought it might make future review/modification a little easier.

Feel free to toss it in the circular file; it won't hurt my feelings.
                


[jira] [Assigned] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "Brian Geffon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Geffon reassigned TS-949:
-------------------------------

    Assignee: John Plevyak  (was: Brian Geffon)

Sorry, grabbed the wrong ticket.
                


[jira] [Updated] (TS-949) key->volume hash table is not consistent when a disk is marked as bad or removed due to failure

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-949:
-----------------------------

    Fix Version/s: 3.1.1

