Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/12/04 19:18:34 UTC

[GitHub] [accumulo] joshelser commented on issue #1447: Perform lease recovery for wasb filesystem

joshelser commented on issue #1447: Perform lease recovery for wasb filesystem
URL: https://github.com/apache/accumulo/pull/1447#issuecomment-561798128
 
 
   One thing I'm not 100% sure about is whether AzureBlobStore and HDFS have the same semantics. Both have things we call "leases", but do those leases actually behave the same way?
   
   I can see that for HBase, lease recovery is only done for directories set up for "atomic rename" (aka configured with Page Blobs): https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/AzureNativeFileSystemStore.java#L445-L459. However, this seems to be just an implementation detail for HBase, not an indication that lease recovery requires Page Blobs. I base this on the fact that https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.storage.blob._cloud_blob.acquirelease?view=azure-java-legacy says nothing about page vs. block blobs.
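To illustrate the kind of path-based gating described above, here is a hypothetical helper that only attempts lease recovery when a path sits under one of a set of "atomic rename" directories. This is not the hadoop-azure code; the class and method names are invented, and the comma-separated input is an assumption modeled on hadoop-azure's `fs.azure.atomic.rename.dir` setting (believed to default to `/hbase`). The point is that the check is purely about where a file lives, which is why it looks like an HBase deployment detail rather than a property of Azure leases.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not hadoop-azure code): decide whether to attempt lease recovery
 * based on whether a path sits under one of the configured "atomic rename" directories.
 */
public class AtomicRenameGate {
  private final List<String> atomicRenameDirs = new ArrayList<>();

  /** @param commaSeparatedDirs e.g. "/hbase" or "/hbase,/other" (assumed config format) */
  public AtomicRenameGate(String commaSeparatedDirs) {
    for (String raw : commaSeparatedDirs.split(",")) {
      String dir = raw.trim();
      if (!dir.isEmpty()) {
        atomicRenameDirs.add(stripTrailingSlash(dir));
      }
    }
  }

  private static String stripTrailingSlash(String p) {
    return (p.length() > 1 && p.endsWith("/")) ? p.substring(0, p.length() - 1) : p;
  }

  /** True if the path is a configured directory or is nested somewhere beneath one. */
  public boolean shouldRecoverLease(String path) {
    String p = stripTrailingSlash(path);
    for (String dir : atomicRenameDirs) {
      // "/" covers everything; otherwise require a path-component boundary so that
      // "/hbase" matches "/hbase/WALs/..." but not "/hbase2/...".
      if (dir.equals("/") || p.equals(dir) || p.startsWith(dir + "/")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    AtomicRenameGate gate = new AtomicRenameGate("/hbase");
    System.out.println(gate.shouldRecoverLease("/hbase/WALs/rs1/wal.1"));      // true
    System.out.println(gate.shouldRecoverLease("/accumulo/wal/tserver1/abc")); // false
    System.out.println(gate.shouldRecoverLease("/hbase2/data"));               // false
  }
}
```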
   
   From the presentation I put together the other week, I don't recall Accumulo doing any WAL renames that would need to be atomic (in contrast to HBase, which moves the WAL once a RegionServer starts working on it).
   
   That aside, we do want to make sure that fencing the WALs still works for Accumulo, to prevent zombie tservers from causing a ruckus. One thing I haven't been able to figure out is whether the following scenario behaves the way we want:
   * A tserver is in a half-dead state: it isn't talking to the Master, but it is still able to hold a lease with ABFS (renewed every 40s by default)
   * The Master deletes that tserver's ZNode to try to kill it (normally within 60s at most, but possibly longer before we notice if things are really messed up) and starts reassigning its tablets
   * The tserver hasn't yet observed the ZK change and is still renewing its lease with ABFS
   * A new tserver calls `acquireLease()` on this WAL, whose lease is still being renewed by the half-dead tserver
   
   I can't figure out from the docs/code what the expected outcome is here. Does it work the way HDFS does (the tserver making the `acquireLease()` call "overrides" the old lease that the zombie tserver is holding)? Maybe you can find someone on the Azure storage team to ask about the semantics of https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.storage.blob._cloud_blob.acquirelease?view=azure-java-legacy. If not, maybe you can write a test that simulates the above (making sure that an old client that once held the lease can no longer append to a file after another client has called `acquireLease`).
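A minimal sketch of such a simulation, assuming nothing about the real Azure SDK: a toy in-memory WAL with a logical clock replays the tserver timeline above under two hypothetical lease policies. `PREEMPT` mimics the HDFS-style behavior described (a new acquirer overrides the old holder); `EXCLUSIVE` refuses the acquire while another client's lease is unexpired. The 60s lease duration and all class/method names are assumptions for illustration. Which policy Azure's `acquireLease` actually follows is exactly the open question, so a real test would run the same steps against a WASB-backed file instead of `FakeWal`.

```java
/**
 * Toy in-memory model of the zombie-tserver scenario (NOT Azure or HDFS code).
 * Time is a logical clock in seconds. Two hypothetical lease policies are compared.
 */
public class ZombieLeaseModel {

  public enum Policy {
    /** A new acquirer overrides the current holder (HDFS recoverLease-style, as described). */
    PREEMPT,
    /** Acquire is refused while another client's lease is still unexpired. */
    EXCLUSIVE
  }

  static final class FakeWal {
    private final Policy policy;
    private final long leaseSecs;
    private String holder;   // null means nobody holds a lease
    private long expiresAt;  // logical time at which the current lease lapses

    FakeWal(Policy policy, long leaseSecs) {
      this.policy = policy;
      this.leaseSecs = leaseSecs;
    }

    boolean acquire(String client, long now) {
      boolean heldByOther = holder != null && !holder.equals(client) && now < expiresAt;
      if (heldByOther && policy == Policy.EXCLUSIVE) {
        return false;
      }
      holder = client; // under PREEMPT the old holder silently loses its lease here
      expiresAt = now + leaseSecs;
      return true;
    }

    boolean renew(String client, long now) {
      if (!client.equals(holder) || now >= expiresAt) {
        return false;
      }
      expiresAt = now + leaseSecs;
      return true;
    }

    /** Appends succeed only for the current, unexpired lease holder (i.e. fencing). */
    boolean append(String client, long now) {
      return client.equals(holder) && now < expiresAt;
    }
  }

  /** Replays the scenario; returns true if the zombie can no longer append. */
  public static boolean zombieFenced(Policy policy) {
    FakeWal wal = new FakeWal(policy, 60);              // assumed 60s lease duration
    wal.acquire("zombie", 0);                           // tserver opens its WAL
    wal.renew("zombie", 40);                            // renewed every 40s
    boolean recovered = wal.acquire("new-tserver", 60); // Master reassigned; recovery starts
    wal.renew("zombie", 80);                            // zombie hasn't seen the ZK change yet
    boolean zombieCanAppend = wal.append("zombie", 81);
    return recovered && !zombieCanAppend;
  }

  public static void main(String[] args) {
    for (Policy p : Policy.values()) {
      System.out.println(p + " -> zombie fenced: " + zombieFenced(p));
    }
  }
}
```

Running `main` prints `PREEMPT -> zombie fenced: true` and `EXCLUSIVE -> zombie fenced: false`. In the model, exclusive semantics mean the new tserver's acquire keeps failing for as long as the zombie keeps renewing, which is why the real Azure behavior matters for fencing.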
   
   Assuming the semantics of `acquireLease` in ABFS match those of `recoverLease` in HDFS, I think your change is fine. I do lament the "ugliness" of the if/else-if/else block in LogCloser, but it's not the end of the world.
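On the if/else-if/else point: one hypothetical way to keep LogCloser flat as more filesystems are added is a registry keyed by URI scheme. This is a sketch only; `SchemeDispatchLogCloser`, `LeaseRecovery`, and the return-value convention (milliseconds to wait before rechecking) are invented for illustration and are not Accumulo's actual API. The registered lambdas are placeholders for the real HDFS and WASB recovery calls.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch (not Accumulo's LogCloser): dispatch on the WAL's URI scheme via a
 * registry instead of an if/else-if/else chain. Each strategy returns how many milliseconds
 * the caller should wait before checking again (0 means the WAL is closed/recovered).
 */
public class SchemeDispatchLogCloser {

  @FunctionalInterface
  public interface LeaseRecovery {
    // Real implementations would wrap IOExceptions in UncheckedIOException.
    long recover(URI wal);
  }

  private final Map<String, LeaseRecovery> byScheme = new HashMap<>();

  public SchemeDispatchLogCloser register(LeaseRecovery recovery, String... schemes) {
    for (String scheme : schemes) {
      byScheme.put(scheme.toLowerCase(Locale.ROOT), recovery);
    }
    return this;
  }

  public long close(URI wal) {
    // Assumption: treat a scheme-less path as local.
    String scheme = wal.getScheme() == null ? "file" : wal.getScheme().toLowerCase(Locale.ROOT);
    LeaseRecovery recovery = byScheme.get(scheme);
    if (recovery == null) {
      throw new IllegalStateException("Don't know how to recover a lease for scheme " + scheme);
    }
    return recovery.recover(wal);
  }

  public static void main(String[] args) {
    SchemeDispatchLogCloser closer = new SchemeDispatchLogCloser()
        // Placeholder strategies; real ones would call into HDFS / WASB client code.
        .register(wal -> 1000L, "hdfs")        // e.g. lease recovery still in progress, retry in 1s
        .register(wal -> 0L, "file")           // local FS: nothing to recover
        .register(wal -> 0L, "wasb", "wasbs"); // the WASB lease step discussed in this PR
    System.out.println(closer.close(URI.create("hdfs://nn:8020/accumulo/wal/ts1/abc")));
    System.out.println(closer.close(URI.create("wasbs://c@acct.blob.core.windows.net/wal/x")));
  }
}
```

The trade-off is one extra level of indirection in exchange for adding a filesystem by registering a single entry rather than growing the conditional.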
