Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2018/06/21 18:08:51 UTC

[GitHub] keith-turner opened a new issue #537: Recovery of WAL may see an incomplete set of logs

URL: https://github.com/apache/accumulo/issues/537
 
 
   Tablet servers track the active set of WALs (write-ahead logs) in ZooKeeper.  When a tablet server dies, all WALs listed in ZooKeeper are used for recovery.  Tablet servers determine which WALs are active based on which tablets reference them.  If a tablet server allocates three WALs over time (W1, W2, and W3), then it's possible that tablets only reference W1 and W3.  If that tablet server dies, then only W1 and W3 would be used for recovery.  However, W2 may contain information that is important to some tablets.  Consider the following data.
   
    * Data in W1:
      * Mutation for tablet T1 setting rowX:colY=valZ
    * Data in W2:
      * Mutation for tablet T1 deleting rowX:colY
      * Start Minor Compaction event for T1
      * Finish Minor Compaction event for T1
    * Data in W3:
      * Other data unrelated to T1
   
   So if the tablet server dies and only W1 and W3 are used for recovery, tablet T1 will bring back the deleted rowX:colY, because it never sees the data in W2 during recovery.  If the data in W2 were seen during recovery, the tablet would know it had minor compacted and that no data needed to be recovered.
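
The scenario can be sketched with a toy model. This is not Accumulo's recovery code: the event tuples and the `recover` and `replay` helpers are hypothetical names made up for illustration, and the sketch assumes that the finish event in W2 belongs to T1 and that a finished minor compaction makes everything logged before its start event unnecessary to replay.

```python
# Toy model of the scenario above; NOT Accumulo's actual recovery code.
# Event shapes and function names are hypothetical, chosen for illustration.

W1 = [("mutation", "T1", "rowX:colY", "valZ")]
W2 = [
    ("delete", "T1", "rowX:colY"),
    ("compaction_start", "T1"),
    ("compaction_finish", "T1"),  # assumed to be for T1
]
W3 = [("mutation", "T9", "rowA:colB", "valC")]  # unrelated to T1


def recover(wals, tablet):
    """Return the mutation/delete events that would be replayed for `tablet`."""
    # Flatten the given WALs in allocation order, keeping this tablet's events.
    events = [e for wal in wals for e in wal if e[1] == tablet]
    replay_from = 0       # index of the first event that still needs replaying
    pending_start = None  # index of a compaction start not yet finished
    for i, event in enumerate(events):
        if event[0] == "compaction_start":
            pending_start = i
        elif event[0] == "compaction_finish" and pending_start is not None:
            # Assumption: data logged before the start event is now in a file.
            replay_from = pending_start + 1
            pending_start = None
    return [e for e in events[replay_from:] if e[0] in ("mutation", "delete")]


def replay(events):
    """Apply replayed events to an empty in-memory map and return it."""
    state = {}
    for e in events:
        if e[0] == "mutation":
            state[e[2]] = e[3]
        else:  # delete
            state.pop(e[2], None)
    return state


incomplete = replay(recover([W1, W3], "T1"))    # W2 missing from recovery
complete = replay(recover([W1, W2, W3], "T1"))  # all three WALs seen

print(incomplete)  # {'rowX:colY': 'valZ'}  (deleted value comes back)
print(complete)    # {}                     (nothing to recover)
```

With W2 left out, the only T1 event recovery sees is the original write, so it is replayed and the deleted value reappears. With W2 included, the finished minor compaction tells recovery that nothing needs to be replayed for T1.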
   
   Discovered this issue while looking into and discussing #535 with @ctubbsii.  This bug only impacts Accumulo 1.8.0 and later.  The bug is a result of the change in 1.8.0 to track WALs per tablet server instead of per tablet.  Before 1.8.0, the tablet T1 would have had no WALs associated with it after minor compacting.
   
   
