Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/04/26 14:47:48 UTC

[GitHub] [accumulo] keith-turner opened a new issue #2031: Periodically sanity check tablet metadata

keith-turner opened a new issue #2031:
URL: https://github.com/apache/accumulo/issues/2031


   **Is your feature request related to a problem? Please describe.**
   When a tablet is loaded on a tablet server, it loads everything for the tablet from the metadata table into memory.  The tablet assumes it's the only thing writing to the metadata table from that point on and only writes out updates to the tablet metadata.  When a tablet closes, it does a sanity check where it reads the tablet metadata and compares it to what it has in memory.  The expectation is that they are the same.  A long-assigned tablet could be loaded on a tserver for months.  If the metadata table and what is in memory were to get out of sync for any reason, it could go undetected for a long time.
   
   **Describe the solution you'd like**
   Before writing to the metadata table, tablets could periodically do this sanity check.  For example, they could do the sanity check prior to a write if the last sanity check was more than 10 minutes ago.
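
   The "check before a write if the last check was more than 10 minutes ago" idea could look roughly like the sketch below. This is only an illustration: `SanityCheckGate` and `checkDue` are hypothetical names, not existing Accumulo classes, and it assumes the time the tablet was loaded counts as the first check, since the in-memory state was just read from the metadata table.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not an Accumulo class): decides whether a tablet
 * should run its sanity check before the next metadata write.
 */
final class SanityCheckGate {
  private final long intervalNanos;
  private final LongSupplier nanoClock; // injectable clock, e.g. System::nanoTime
  private long lastCheckNanos;

  SanityCheckGate(Duration interval, LongSupplier nanoClock) {
    this.intervalNanos = interval.toNanos();
    this.nanoClock = nanoClock;
    // Assumption: loading the tablet counts as the initial check.
    this.lastCheckNanos = nanoClock.getAsLong();
  }

  /**
   * Returns true if more than the configured interval has elapsed since the
   * last check, and records the current time as the new last-check time.
   */
  synchronized boolean checkDue() {
    long now = nanoClock.getAsLong();
    if (now - lastCheckNanos > intervalNanos) {
      lastCheckNanos = now;
      return true;
    }
    return false;
  }
}
```

   A tablet could create one gate with `new SanityCheckGate(Duration.ofMinutes(10), System::nanoTime)` and call `checkDue()` just before each metadata write. Tablets that rarely write would rarely be checked, which is the trade-off of tying the check to writes.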
   
   **Describe alternatives you've considered**
   A tablet server could use a batch scanner to get metadata for all tablets it's hosting every X minutes and then use this for a sanity check. This could use #1974.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on issue #2031: Periodically sanity check tablet metadata

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2031:
URL: https://github.com/apache/accumulo/issues/2031#issuecomment-826973456


   One way to do this sanity check and handle race conditions is to have a one-up update counter for each tablet that is used to detect race conditions.  Each tablet could update this counter prior to writing to the metadata table.  Tablet servers could do the following when doing the sanity check.
   
    1. For each tablet on the tserver get its current update counter, storing this in a map like `Map<KeyExtent, Long>`
    2. For each tablet get its metadata from the metadata table, storing this in a map like `Map<KeyExtent, TabletMetadata>`
    3. For each tablet call a method to check consistency passing in the previously acquired update counter and TabletMetadata.  The tablet could then do the following.
         1. Get the tablet lock
         2. Check if the passed-in update counter matches the current update counter.  If not, return something to the tserver code indicating a retry is needed for the tablet.
         3. Check if the passed in TabletMetadata matches what the tablet currently has in memory.
   





[GitHub] [accumulo] keith-turner edited a comment on issue #2031: Periodically sanity check tablet metadata

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2031:
URL: https://github.com/apache/accumulo/issues/2031#issuecomment-826900567


   For the concurrency aspect, thinking it would be nice to not try to prevent race conditions but just be able to detect them and retry for the subset of tablets where a race condition happened.  However, not exactly sure how to do the detection; need something that actually converges to avoid infinite retries.
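
   To make "retry only the subset where a race happened" concrete, here is a minimal sketch. It does not answer the detection question; it assumes some per-tablet check already exists that reports whether it completed without a race. `RetrySubset`, `checkWithRetries`, and the `maxRounds` cap are hypothetical. The cap is an assumption that guarantees termination rather than true convergence, so tablets still pending after the last round would simply wait for the next periodic pass.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical helper (not an Accumulo class) for retrying only raced tablets. */
final class RetrySubset {
  /**
   * Runs checkOnce over the pending items for at most maxRounds rounds.
   * checkOnce returns true when the check completed without a race and
   * false when that item needs another attempt.
   *
   * @return the items that still needed a retry after the final round
   */
  static <T> Set<T> checkWithRetries(Set<T> items, Predicate<T> checkOnce, int maxRounds) {
    Set<T> pending = new HashSet<>(items); // copy so the caller's set is untouched
    for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
      // Drop every item whose check completed; raced items stay for the next round.
      pending.removeIf(checkOnce);
    }
    return pending;
  }
}
```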





[GitHub] [accumulo] DomGarguilo closed issue #2031: Periodically sanity check tablet metadata

Posted by GitBox <gi...@apache.org>.
DomGarguilo closed issue #2031:
URL: https://github.com/apache/accumulo/issues/2031


   





[GitHub] [accumulo] keith-turner edited a comment on issue #2031: Periodically sanity check tablet metadata

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2031:
URL: https://github.com/apache/accumulo/issues/2031#issuecomment-826973456


   One way to do this sanity check and handle race conditions is to have a one-up update counter for each tablet that is used to detect race conditions.  Each tablet could update this counter prior to writing to the metadata table.  Tablet servers could do the following when doing the sanity check.
   
    1. For each tablet on the tserver get its current update counter, storing this in a map like `Map<KeyExtent, Long>`
    2. For each tablet get its metadata from the metadata table, storing this in a map like `Map<KeyExtent, TabletMetadata>`
    3. For each tablet call a method to check consistency passing in the previously acquired update counter and TabletMetadata.  The tablet could then do the following.
         1. Get the tablet lock
         2. Check if the passed-in update counter matches the current update counter.  If not, return something to the tserver code indicating a retry is needed for the tablet.
         3. Check if the passed in TabletMetadata matches what the tablet currently has in memory.
   





[GitHub] [accumulo] keith-turner commented on issue #2031: Periodically sanity check tablet metadata

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2031:
URL: https://github.com/apache/accumulo/issues/2031#issuecomment-826900567


   For the concurrency aspect, thinking it would be nice to not try to prevent race conditions but just be able to detect them and retry for the subset of tablets where a race condition happened.  However, not exactly sure how to do the detection; need something that actually converges to avoid livelock.





[GitHub] [accumulo] keith-turner edited a comment on issue #2031: Periodically sanity check tablet metadata

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2031:
URL: https://github.com/apache/accumulo/issues/2031#issuecomment-826973456


   One way to do this sanity check and handle race conditions is to have a one-up update counter for each tablet that is used to detect race conditions.  Each tablet could update this counter, while holding the tablet lock, prior to writing to the metadata table.  Tablet servers could do the following when doing the sanity check.
   
    1. For each tablet on the tserver get its current update counter, storing this in a map like `Map<KeyExtent, Long>`
    2. For each tablet get its metadata from the metadata table, storing this in a map like `Map<KeyExtent, TabletMetadata>`
    3. For each tablet call a method to check consistency passing in the previously acquired update counter and TabletMetadata.  The tablet could then do the following.
         1. Get the tablet lock
         2. Check if the passed-in update counter matches the current update counter.  If not, return something to the tserver code indicating a retry is needed for the tablet.
         3. Check if the passed in TabletMetadata matches what the tablet currently has in memory.
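
   The tablet side of the steps above might be sketched as follows. This is a simplified model, not Accumulo code: `TabletSketch` and `CheckResult` are hypothetical, the type parameter `M` stands in for `TabletMetadata`, and a `ReentrantLock` stands in for the tablet lock. It assumes the counter bump and the metadata write happen under one hold of that lock, so a counter read under the lock never observes a half-finished update.

```java
import java.util.Objects;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical outcome of checking one tablet. */
enum CheckResult { CONSISTENT, RETRY, INCONSISTENT }

/** Hypothetical stand-in for a tablet; M stands in for TabletMetadata. */
final class TabletSketch<M> {
  private final ReentrantLock tabletLock = new ReentrantLock();
  private long updateCounter = 0;
  private M inMemory;

  TabletSketch(M initial) {
    this.inMemory = initial;
  }

  /** Step 1: the tserver snapshots the counter before reading the metadata table. */
  long getUpdateCounter() {
    tabletLock.lock();
    try {
      return updateCounter;
    } finally {
      tabletLock.unlock();
    }
  }

  /** Bumps the counter under the tablet lock before the metadata write. */
  void updateMetadata(M newMetadata) {
    tabletLock.lock();
    try {
      updateCounter++;
      inMemory = newMetadata;
      // A real tablet would write the update to the metadata table here.
    } finally {
      tabletLock.unlock();
    }
  }

  /** Step 3: compare the earlier snapshot against current state under the lock. */
  CheckResult checkConsistency(long observedCounter, M observedMetadata) {
    tabletLock.lock();
    try {
      if (observedCounter != updateCounter) {
        return CheckResult.RETRY; // the tablet changed after the snapshot was taken
      }
      return Objects.equals(observedMetadata, inMemory)
          ? CheckResult.CONSISTENT
          : CheckResult.INCONSISTENT;
    } finally {
      tabletLock.unlock();
    }
  }
}
```

   The tserver would collect `getUpdateCounter()` for every tablet first, then read the metadata table, then call `checkConsistency` per tablet. A `RETRY` result only means the snapshot is stale, so only those tablets need to be re-read.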
   

