You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by mjwall <gi...@git.apache.org> on 2016/06/02 15:52:21 UTC

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

GitHub user mjwall opened a pull request:

    https://github.com/apache/accumulo/pull/107

    ACCUMULO-4157 Bug fix for removing WALs to quickly

    Keep track of first time a tserver is seen down and only remove WALs for that server if past configurated threshhold
    
    Much work left to be done here, but trying to keep the changes to the GarbageCollectWriteAheadLogs class small to get the bug fixed.  I'll create another ticket to refactor and cleanup.
    
    Includes an end to end test calling the collect method simulating a dead tserver.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mjwall/accumulo ACCUMULO-4157

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #107
    
----
commit 8f30dbcd243e5db2b1d17c7620183f9e855ce65a
Author: Michael Wall <mj...@apache.org>
Date:   2016-05-17T20:06:04Z

    ACCUMULO-4157 Bug fix for removing WALs to quickly
    
    Keep track of first time a tserver is seen down and only remove WALs for that server if past configurated threshhold
    
    Much work left to be done here, but trying to keep the changes small to fix the bug.  I'll create another ticket to refactor and cleanup
    
    Includes an end to end test calling the collect method simulating a dead tserver.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65567462
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -170,76 +175,117 @@ boolean holdsLock(HostAndPort addr) {
         }
       }
     
    -  private int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    -    AccumuloConfiguration conf = ServerConfiguration.getSystemConfiguration(instance);
    +  private AccumuloConfiguration getConfig() {
    +    return ServerConfiguration.getSystemConfiguration(instance);
    +  }
    +
    +  @VisibleForTesting
    +  int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    +    // TODO: remove nameToFileMap from method signature, not used here I don't think
    +    AccumuloConfiguration conf = getConfig();
         for (Entry<String,ArrayList<Path>> entry : serverToFileMap.entrySet()) {
           if (entry.getKey().isEmpty()) {
    -        // old-style log entry, just remove it
    -        for (Path path : entry.getValue()) {
    -          log.debug("Removing old-style WAL " + path);
    -          try {
    -            if (!useTrash || !fs.moveToTrash(path))
    -              fs.deleteRecursively(path);
    -            status.currentLog.deleted++;
    -          } catch (FileNotFoundException ex) {
    -            // ignored
    -          } catch (IOException ex) {
    -            log.error("Unable to delete wal " + path + ": " + ex);
    -          }
    -        }
    +        removeOldStyleWAL(entry, status);
           } else {
    -        HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    -        if (!holdsLock(address)) {
    -          for (Path path : entry.getValue()) {
    -            log.debug("Removing WAL for offline server " + path);
    -            try {
    -              if (!useTrash || !fs.moveToTrash(path))
    -                fs.deleteRecursively(path);
    -              status.currentLog.deleted++;
    -            } catch (FileNotFoundException ex) {
    -              // ignored
    -            } catch (IOException ex) {
    -              log.error("Unable to delete wal " + path + ": " + ex);
    -            }
    -          }
    -          continue;
    -        } else {
    -          Client tserver = null;
    -          try {
    -            tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    -            tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    -            log.debug("deleted " + entry.getValue() + " from " + entry.getKey());
    -            status.currentLog.deleted += entry.getValue().size();
    -          } catch (TException e) {
    -            log.warn("Error talking to " + address + ": " + e);
    -          } finally {
    -            if (tserver != null)
    -              ThriftUtil.returnClient(tserver);
    -          }
    -        }
    +        removeWALFile(entry, conf, status);
           }
         }
    -
         for (Path swalog : sortedWALogs.values()) {
    -      log.debug("Removing sorted WAL " + swalog);
    +      removeSortedWAL(swalog);
    +    }
    +    return 0;
    +  }
    +
    +  @VisibleForTesting
    +  void removeSortedWAL(Path swalog) {
    +    log.debug("Removing sorted WAL " + swalog);
    +    try {
    +      if (!useTrash || !fs.moveToTrash(swalog)) {
    --- End diff --
    
    Maybe lift this into a method? The general "move to trash or delete" is repeated a couple of places.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #107: ACCUMULO-4157 Bug fix for removing WALs to quickly

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/107
  
    @mjwall I am finished looking at this +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65651340
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    +      ttd = gcwal.timeToDelete(address, wait);
    +      t2 = System.nanoTime();
    +    } while (t2 - t1 > (wait / 2) * 1000000); // as long as it took less than half of the configured wait
    --- End diff --
    
    I saw you used TimeUnit earlier in the patch to convert time.  Could use TimeUnit earlier in this test method to convert `wait` to nanos.  IMO would be slightly more readable than multiplying by 1000000.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65575725
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -170,76 +175,117 @@ boolean holdsLock(HostAndPort addr) {
         }
       }
     
    -  private int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    -    AccumuloConfiguration conf = ServerConfiguration.getSystemConfiguration(instance);
    +  private AccumuloConfiguration getConfig() {
    +    return ServerConfiguration.getSystemConfiguration(instance);
    +  }
    +
    +  @VisibleForTesting
    +  int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    +    // TODO: remove nameToFileMap from method signature, not used here I don't think
    +    AccumuloConfiguration conf = getConfig();
         for (Entry<String,ArrayList<Path>> entry : serverToFileMap.entrySet()) {
           if (entry.getKey().isEmpty()) {
    -        // old-style log entry, just remove it
    -        for (Path path : entry.getValue()) {
    -          log.debug("Removing old-style WAL " + path);
    -          try {
    -            if (!useTrash || !fs.moveToTrash(path))
    -              fs.deleteRecursively(path);
    -            status.currentLog.deleted++;
    -          } catch (FileNotFoundException ex) {
    -            // ignored
    -          } catch (IOException ex) {
    -            log.error("Unable to delete wal " + path + ": " + ex);
    -          }
    -        }
    +        removeOldStyleWAL(entry, status);
           } else {
    -        HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    -        if (!holdsLock(address)) {
    -          for (Path path : entry.getValue()) {
    -            log.debug("Removing WAL for offline server " + path);
    -            try {
    -              if (!useTrash || !fs.moveToTrash(path))
    -                fs.deleteRecursively(path);
    -              status.currentLog.deleted++;
    -            } catch (FileNotFoundException ex) {
    -              // ignored
    -            } catch (IOException ex) {
    -              log.error("Unable to delete wal " + path + ": " + ex);
    -            }
    -          }
    -          continue;
    -        } else {
    -          Client tserver = null;
    -          try {
    -            tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    -            tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    -            log.debug("deleted " + entry.getValue() + " from " + entry.getKey());
    -            status.currentLog.deleted += entry.getValue().size();
    -          } catch (TException e) {
    -            log.warn("Error talking to " + address + ": " + e);
    -          } finally {
    -            if (tserver != null)
    -              ThriftUtil.returnClient(tserver);
    -          }
    -        }
    +        removeWALFile(entry, conf, status);
           }
         }
    -
         for (Path swalog : sortedWALogs.values()) {
    -      log.debug("Removing sorted WAL " + swalog);
    +      removeSortedWAL(swalog);
    +    }
    +    return 0;
    +  }
    +
    +  @VisibleForTesting
    +  void removeSortedWAL(Path swalog) {
    +    log.debug("Removing sorted WAL " + swalog);
    +    try {
    +      if (!useTrash || !fs.moveToTrash(swalog)) {
    --- End diff --
    
    I'll add a note to the follow on ticket to really cleanup this class if that is ok?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65568599
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -55,12 +56,16 @@
     import org.apache.zookeeper.KeeperException;
     
     import com.google.common.net.HostAndPort;
    +import java.util.concurrent.TimeUnit;
    +import org.apache.accumulo.core.conf.Property;
     
     public class GarbageCollectWriteAheadLogs {
       private static final Logger log = Logger.getLogger(GarbageCollectWriteAheadLogs.class);
     
       private final Instance instance;
       private final VolumeManager fs;
    +  private final Map<HostAndPort,Long> firstSeenDead = new HashMap<HostAndPort,Long>();
    --- End diff --
    
    The safety of this not being a concurrent/synchronized map is likely due to the use of GCWALogs by the SimpleGarbageCollector? Maybe we just need to document that this class is not threadsafe now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65703924
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -170,76 +184,203 @@ boolean holdsLock(HostAndPort addr) {
         }
       }
     
    -  private int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    -    AccumuloConfiguration conf = ServerConfiguration.getSystemConfiguration(instance);
    +  private AccumuloConfiguration getConfig() {
    +    return ServerConfiguration.getSystemConfiguration(instance);
    +  }
    +
    +  /**
    +   * Top level method for removing WAL files.
    +   * <p>
    +   * Loops over all the gathered WAL and sortedWAL entries and calls the appropriate methods for removal
    +   *
    +   * @param nameToFileMap
    +   *          Map of filename to Path
    +   * @param serverToFileMap
    +   *          Map of HostAndPort string to a list of Paths
    +   * @param sortedWALogs
    +   *          Map of sorted WAL names to Path
    +   * @param status
    +   *          GCStatus object for tracking what is done
    +   * @return 0 always
    +   */
    +  @VisibleForTesting
    +  int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    +    // TODO: remove nameToFileMap from method signature, not used here I don't think
    +    AccumuloConfiguration conf = getConfig();
         for (Entry<String,ArrayList<Path>> entry : serverToFileMap.entrySet()) {
           if (entry.getKey().isEmpty()) {
    -        // old-style log entry, just remove it
    -        for (Path path : entry.getValue()) {
    -          log.debug("Removing old-style WAL " + path);
    -          try {
    -            if (!useTrash || !fs.moveToTrash(path))
    -              fs.deleteRecursively(path);
    -            status.currentLog.deleted++;
    -          } catch (FileNotFoundException ex) {
    -            // ignored
    -          } catch (IOException ex) {
    -            log.error("Unable to delete wal " + path + ": " + ex);
    -          }
    -        }
    +        removeOldStyleWAL(entry, status);
           } else {
    -        HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    -        if (!holdsLock(address)) {
    -          for (Path path : entry.getValue()) {
    -            log.debug("Removing WAL for offline server " + path);
    -            try {
    -              if (!useTrash || !fs.moveToTrash(path))
    -                fs.deleteRecursively(path);
    -              status.currentLog.deleted++;
    -            } catch (FileNotFoundException ex) {
    -              // ignored
    -            } catch (IOException ex) {
    -              log.error("Unable to delete wal " + path + ": " + ex);
    -            }
    -          }
    -          continue;
    -        } else {
    -          Client tserver = null;
    -          try {
    -            tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    -            tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    -            log.debug("deleted " + entry.getValue() + " from " + entry.getKey());
    -            status.currentLog.deleted += entry.getValue().size();
    -          } catch (TException e) {
    -            log.warn("Error talking to " + address + ": " + e);
    -          } finally {
    -            if (tserver != null)
    -              ThriftUtil.returnClient(tserver);
    -          }
    -        }
    +        removeWALFile(entry, conf, status);
           }
         }
    -
         for (Path swalog : sortedWALogs.values()) {
    -      log.debug("Removing sorted WAL " + swalog);
    +      removeSortedWAL(swalog);
    +    }
    +    return 0;
    +  }
    +
    +  /**
    +   * Removes sortedWALs.
    +   * <p>
    +   * Sorted WALs are WALs that are in the recovery directory and have already been used.
    +   *
    +   * @param swalog
    +   *          Path to the WAL
    +   */
    +  @VisibleForTesting
    +  void removeSortedWAL(Path swalog) {
    +    log.debug("Removing sorted WAL " + swalog);
    +    try {
    +      if (!useTrash || !fs.moveToTrash(swalog)) {
    +        fs.deleteRecursively(swalog);
    +      }
    +    } catch (FileNotFoundException ex) {
    +      // ignored
    +    } catch (IOException ioe) {
           try {
    -        if (!useTrash || !fs.moveToTrash(swalog)) {
    -          fs.deleteRecursively(swalog);
    +        if (fs.exists(swalog)) {
    +          log.error("Unable to delete sorted walog " + swalog + ": " + ioe);
             }
    -      } catch (FileNotFoundException ex) {
    -        // ignored
    -      } catch (IOException ioe) {
    +      } catch (IOException ex) {
    +        log.error("Unable to check for the existence of " + swalog, ex);
    +      }
    +    }
    +  }
    +
    +  /**
    +   * A wrapper method to check if the tserver using the WAL is still alive
    +   * <p>
    +   * Delegates to the deletion to #removeWALfromDownTserver if the ZK lock is gone or #askTserverToRemoveWAL if the server is known to still be alive
    +   *
    +   * @param entry
    +   *          WAL information gathered
    +   * @param conf
    +   *          AccumuloConfiguration object
    +   * @param status
    +   *          GCStatus object
    +   */
    +  @VisibleForTesting
    +  void removeWALFile(Entry<String,ArrayList<Path>> entry, AccumuloConfiguration conf, final GCStatus status) {
    --- End diff --
    
    Yep, good catch.  I'll remove the annotation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65704590
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    +      ttd = gcwal.timeToDelete(address, wait);
    +      t2 = System.nanoTime();
    +    } while (t2 - t1 > (wait / 2) * 1000000); // as long as it took less than half of the configured wait
    --- End diff --
    
    yep, will change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65704137
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -410,4 +556,41 @@ static boolean isUUID(String name) {
         }
       }
     
    +  /**
    +   * Determine if TServer has been dead long enough to remove associated WALs.
    +   * <p>
    +   * Uses a map where the key is the address and the value is the time first seen dead. If the address is not in the map, it is added with the current system
    +   * nanoTime. When the passed in wait time has elapsed, this method returns true and removes the key and value from the map.
    +   *
    +   * @param address
    +   *          HostAndPort of dead tserver
    +   * @param wait
    --- End diff --
    
    Will add


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65704867
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    +      ttd = gcwal.timeToDelete(address, wait);
    +      t2 = System.nanoTime();
    +    } while (t2 - t1 > (wait / 2) * 1000000); // as long as it took less than half of the configured wait
    +
    +    assertFalse(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteWithNullAddress() {
    +    assertFalse(gcwal.timeToDelete(null, 123l));
    +  }
    +
    +  /**
    +   * Wrapper class with some helper methods
    +   * <p>
    +   * Just a wrapper around a LinkedHashMap that store method name and argument information. Also includes some convenience methods to make usage cleaner.
    +   */
    +  class MethodCalls {
    +
    +    private LinkedHashMap<String,List<Object>> mapWrapper;
    +
    +    public MethodCalls() {
    +      mapWrapper = new LinkedHashMap<String,List<Object>>();
    +    }
    +
    +    public void put(String methodName, Object... args) {
    +      mapWrapper.put(methodName, Arrays.asList(args));
    +    }
    +
    +    public int size() {
    +      return mapWrapper.size();
    +    }
    +
    +    public boolean hasOneEntry() {
    +      return size() == 1;
    +    }
    +
    +    public Map.Entry<String,List<Object>> getFirstEntry() {
    +      return mapWrapper.entrySet().iterator().next();
    +    }
    +
    +    public String getFirstEntryMethod() {
    +      return getFirstEntry().getKey();
    +    }
    +
    +    public List<Object> getFirstEntryArgs() {
    +      return getFirstEntry().getValue();
    +    }
    +
    +    public Object getFirstEntryArg(int number) {
    +      return getFirstEntryArgs().get(number);
    +    }
    +  }
    +
    +  /**
    +   * Partial mock of the GarbageCollectWriteAheadLogs for testing the removeFile method
    +   * <p>
    +   * There is a map named methodCalls that can be used to assert parameters on methods called inside the removeFile method
    +   */
    +  class GCWALPartialMock extends GarbageCollectWriteAheadLogs {
    +
    +    private boolean holdsLockBool = false;
    +
    +    public GCWALPartialMock(Instance i, VolumeManager vm, boolean useTrash, boolean holdLock) throws IOException {
    +      super(i, vm, useTrash);
    +      this.holdsLockBool = holdLock;
    +    }
    +
    +    public MethodCalls methodCalls = new MethodCalls();
    +
    +    @Override
    +    boolean holdsLock(HostAndPort addr) {
    +      return holdsLockBool;
    +    }
    +
    +    @Override
    +    void removeWALfromDownTserver(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("removeWALFromDownTserver", address, conf, entry, status);
    +    }
    +
    +    @Override
    +    void askTserverToRemoveWAL(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("askTserverToRemoveWAL", address, conf, entry, status);
    +    }
    +
    +    @Override
    +    void removeOldStyleWAL(Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("removeOldStyleWAL", entry, status);
    +    }
    +
    +    @Override
    +    void removeSortedWAL(Path swalog) {
    +      methodCalls.put("removeSortedWAL", swalog);
    +    }
    +  }
    +
    +  private GCWALPartialMock getGCWALForRemoveFileTest(GCStatus s, final boolean locked) throws IOException {
    +    return new GCWALPartialMock(new MockInstance("accumulo"), VolumeManagerImpl.get(), false, locked);
    +  }
    +
    +  private Map<String,Path> getEmptyMap() {
    +    return new HashMap<String,Path>();
    +  }
    +
    +  private Map<String,ArrayList<Path>> getServerToFileMap1(String key, Path singlePath) {
    +    Map<String,ArrayList<Path>> serverToFileMap = new HashMap<String,ArrayList<Path>>();
    +    serverToFileMap.put(key, new ArrayList<Path>(Arrays.asList(singlePath)));
    +    return serverToFileMap;
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithOldStyle() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/tserver1+9997/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1("", p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeOldStyleWAL", "removeOldStyleWAL", calls.getFirstEntryMethod());
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("First param should be empty", firstServerToFileMap, calls.getFirstEntryArg(0));
    +    assertEquals("Second param should be the status", status, calls.getFirstEntryArg(1));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithDeadTservers() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, false);
    +    String server = "tserver1+9997";
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/" + server + "/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1(server, p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeWALfromDownTserver", "removeWALFromDownTserver", calls.getFirstEntryMethod());
    +    assertEquals("First param should be address", HostAndPort.fromString(server.replaceAll("[+]", ":")), calls.getFirstEntryArg(0));
    +    assertTrue("Second param should be an AccumuloConfiguration", calls.getFirstEntryArg(1) instanceof AccumuloConfiguration);
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("Third param should be the entry", firstServerToFileMap, calls.getFirstEntryArg(2));
    +    assertEquals("Forth param should be the status", status, calls.getFirstEntryArg(3));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithLiveTservers() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    String server = "tserver1+9997";
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/" + server + "/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1(server, p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be askTserverToRemoveWAL", "askTserverToRemoveWAL", calls.getFirstEntryMethod());
    +    assertEquals("First param should be address", HostAndPort.fromString(server.replaceAll("[+]", ":")), calls.getFirstEntryArg(0));
    +    assertTrue("Second param should be an AccumuloConfiguration", calls.getFirstEntryArg(1) instanceof AccumuloConfiguration);
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("Third param should be the entry", firstServerToFileMap, calls.getFirstEntryArg(2));
    +    assertEquals("Forth param should be the status", status, calls.getFirstEntryArg(3));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesRemovesSortedWALs() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    Map<String,ArrayList<Path>> serverToFileMap = new HashMap<String,ArrayList<Path>>();
    +    Map<String,Path> sortedWALogs = new HashMap<String,Path>();
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/tserver1+9997/" + UUID.randomUUID().toString());
    +    sortedWALogs.put("junk", p1); // TODO: see if this key is actually used here, maybe can be removed
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, sortedWALogs, status);
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeSortedWAL", "removeSortedWAL", calls.getFirstEntryMethod());
    +    assertEquals("First param should be the Path", p1, calls.getFirstEntryArg(0));
    +
    +  }
    +
    +  static String GCWAL_DEAD_DIR = "gcwal-collect-deadtserver";
    +  static String GCWAL_DEAD_TSERVER = "tserver1";
    +  static String GCWAL_DEAD_TSERVER_PORT = "9995";
    +  static String GCWAL_DEAD_TSERVER_COLLECT_FILE = UUID.randomUUID().toString();
    +
    +  class GCWALDeadTserverCollectMock extends GarbageCollectWriteAheadLogs {
    +
    +    public GCWALDeadTserverCollectMock(Instance i, VolumeManager vm, boolean useTrash) throws IOException {
    +      super(i, vm, useTrash);
    +    }
    +
    +    @Override
    +    boolean holdsLock(HostAndPort addr) {
    +      // tries use zookeeper
    +      return false;
    +    }
    +
    +    @Override
    +    Map<String,Path> getSortedWALogs() {
    +      return new HashMap<String,Path>();
    +    }
    +
    +    @Override
    +    int scanServers(Map<Path,String> fileToServerMap, Map<String,Path> nameToFileMap) throws Exception {
    +      String sep = File.separator;
    +      Path p = new Path(System.getProperty("java.io.tmpdir") + sep + GCWAL_DEAD_DIR + sep + GCWAL_DEAD_TSERVER + "+" + GCWAL_DEAD_TSERVER_PORT + sep
    +          + GCWAL_DEAD_TSERVER_COLLECT_FILE);
    +      fileToServerMap.put(p, GCWAL_DEAD_TSERVER + ":" + GCWAL_DEAD_TSERVER_PORT);
    +      nameToFileMap.put(GCWAL_DEAD_TSERVER_COLLECT_FILE, p);
    +      return 1;
    +    }
    +
    +    @Override
    +    int removeMetadataEntries(Map<String,Path> nameToFileMap, Map<String,Path> sortedWALogs, GCStatus status) throws IOException, KeeperException,
    +        InterruptedException {
    +      return 0;
    +    }
    +
    +    long getGCWALDeadServerWaitTime(AccumuloConfiguration conf) {
    +      // tries to use zookeeper
    +      return 1000l;
    +    }
    +  }
    +
    +  @Test
    +  public void testCollectWithDeadTserver() throws IOException, InterruptedException {
    +    Instance i = new MockInstance();
    +    File walDir = new File(System.getProperty("java.io.tmpdir") + File.separator + GCWAL_DEAD_DIR);
    --- End diff --
    
    will update, thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #107: ACCUMULO-4157 Bug fix for removing WALs to quickly

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/107
  
    >  I am ready to merge
    
    :+1: Great detective work here!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65652662
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    +      ttd = gcwal.timeToDelete(address, wait);
    +      t2 = System.nanoTime();
    +    } while (t2 - t1 > (wait / 2) * 1000000); // as long as it took less than half of the configured wait
    +
    +    assertFalse(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteWithNullAddress() {
    +    assertFalse(gcwal.timeToDelete(null, 123l));
    +  }
    +
    +  /**
    +   * Wrapper class with some helper methods
    +   * <p>
    +   * Just a wrapper around a LinkedHashMap that store method name and argument information. Also includes some convenience methods to make usage cleaner.
    +   */
    +  class MethodCalls {
    +
    +    private LinkedHashMap<String,List<Object>> mapWrapper;
    +
    +    public MethodCalls() {
    +      mapWrapper = new LinkedHashMap<String,List<Object>>();
    +    }
    +
    +    public void put(String methodName, Object... args) {
    +      mapWrapper.put(methodName, Arrays.asList(args));
    +    }
    +
    +    public int size() {
    +      return mapWrapper.size();
    +    }
    +
    +    public boolean hasOneEntry() {
    +      return size() == 1;
    +    }
    +
    +    public Map.Entry<String,List<Object>> getFirstEntry() {
    +      return mapWrapper.entrySet().iterator().next();
    +    }
    +
    +    public String getFirstEntryMethod() {
    +      return getFirstEntry().getKey();
    +    }
    +
    +    public List<Object> getFirstEntryArgs() {
    +      return getFirstEntry().getValue();
    +    }
    +
    +    public Object getFirstEntryArg(int number) {
    +      return getFirstEntryArgs().get(number);
    +    }
    +  }
    +
    +  /**
    +   * Partial mock of the GarbageCollectWriteAheadLogs for testing the removeFile method
    +   * <p>
    +   * There is a map named methodCalls that can be used to assert parameters on methods called inside the removeFile method
    +   */
    +  class GCWALPartialMock extends GarbageCollectWriteAheadLogs {
    +
    +    private boolean holdsLockBool = false;
    +
    +    public GCWALPartialMock(Instance i, VolumeManager vm, boolean useTrash, boolean holdLock) throws IOException {
    +      super(i, vm, useTrash);
    +      this.holdsLockBool = holdLock;
    +    }
    +
    +    public MethodCalls methodCalls = new MethodCalls();
    +
    +    @Override
    +    boolean holdsLock(HostAndPort addr) {
    +      return holdsLockBool;
    +    }
    +
    +    @Override
    +    void removeWALfromDownTserver(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("removeWALFromDownTserver", address, conf, entry, status);
    +    }
    +
    +    @Override
    +    void askTserverToRemoveWAL(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("askTserverToRemoveWAL", address, conf, entry, status);
    +    }
    +
    +    @Override
    +    void removeOldStyleWAL(Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("removeOldStyleWAL", entry, status);
    +    }
    +
    +    @Override
    +    void removeSortedWAL(Path swalog) {
    +      methodCalls.put("removeSortedWAL", swalog);
    +    }
    +  }
    +
    +  private GCWALPartialMock getGCWALForRemoveFileTest(GCStatus s, final boolean locked) throws IOException {
    +    return new GCWALPartialMock(new MockInstance("accumulo"), VolumeManagerImpl.get(), false, locked);
    +  }
    +
    +  private Map<String,Path> getEmptyMap() {
    +    return new HashMap<String,Path>();
    +  }
    +
    +  private Map<String,ArrayList<Path>> getServerToFileMap1(String key, Path singlePath) {
    +    Map<String,ArrayList<Path>> serverToFileMap = new HashMap<String,ArrayList<Path>>();
    +    serverToFileMap.put(key, new ArrayList<Path>(Arrays.asList(singlePath)));
    +    return serverToFileMap;
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithOldStyle() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/tserver1+9997/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1("", p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeOldStyleWAL", "removeOldStyleWAL", calls.getFirstEntryMethod());
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("First param should be empty", firstServerToFileMap, calls.getFirstEntryArg(0));
    +    assertEquals("Second param should be the status", status, calls.getFirstEntryArg(1));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithDeadTservers() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, false);
    +    String server = "tserver1+9997";
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/" + server + "/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1(server, p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeWALfromDownTserver", "removeWALFromDownTserver", calls.getFirstEntryMethod());
    +    assertEquals("First param should be address", HostAndPort.fromString(server.replaceAll("[+]", ":")), calls.getFirstEntryArg(0));
    +    assertTrue("Second param should be an AccumuloConfiguration", calls.getFirstEntryArg(1) instanceof AccumuloConfiguration);
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("Third param should be the entry", firstServerToFileMap, calls.getFirstEntryArg(2));
    +    assertEquals("Forth param should be the status", status, calls.getFirstEntryArg(3));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithLiveTservers() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    String server = "tserver1+9997";
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/" + server + "/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1(server, p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be askTserverToRemoveWAL", "askTserverToRemoveWAL", calls.getFirstEntryMethod());
    +    assertEquals("First param should be address", HostAndPort.fromString(server.replaceAll("[+]", ":")), calls.getFirstEntryArg(0));
    +    assertTrue("Second param should be an AccumuloConfiguration", calls.getFirstEntryArg(1) instanceof AccumuloConfiguration);
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("Third param should be the entry", firstServerToFileMap, calls.getFirstEntryArg(2));
    +    assertEquals("Forth param should be the status", status, calls.getFirstEntryArg(3));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesRemovesSortedWALs() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    Map<String,ArrayList<Path>> serverToFileMap = new HashMap<String,ArrayList<Path>>();
    +    Map<String,Path> sortedWALogs = new HashMap<String,Path>();
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/tserver1+9997/" + UUID.randomUUID().toString());
    +    sortedWALogs.put("junk", p1); // TODO: see if this key is actually used here, maybe can be removed
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, sortedWALogs, status);
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeSortedWAL", "removeSortedWAL", calls.getFirstEntryMethod());
    +    assertEquals("First param should be the Path", p1, calls.getFirstEntryArg(0));
    +
    +  }
    +
    +  static String GCWAL_DEAD_DIR = "gcwal-collect-deadtserver";
    +  static String GCWAL_DEAD_TSERVER = "tserver1";
    +  static String GCWAL_DEAD_TSERVER_PORT = "9995";
    +  static String GCWAL_DEAD_TSERVER_COLLECT_FILE = UUID.randomUUID().toString();
    +
    +  class GCWALDeadTserverCollectMock extends GarbageCollectWriteAheadLogs {
    +
    +    public GCWALDeadTserverCollectMock(Instance i, VolumeManager vm, boolean useTrash) throws IOException {
    +      super(i, vm, useTrash);
    +    }
    +
    +    @Override
    +    boolean holdsLock(HostAndPort addr) {
    +      // tries use zookeeper
    +      return false;
    +    }
    +
    +    @Override
    +    Map<String,Path> getSortedWALogs() {
    +      return new HashMap<String,Path>();
    +    }
    +
    +    @Override
    +    int scanServers(Map<Path,String> fileToServerMap, Map<String,Path> nameToFileMap) throws Exception {
    +      String sep = File.separator;
    +      Path p = new Path(System.getProperty("java.io.tmpdir") + sep + GCWAL_DEAD_DIR + sep + GCWAL_DEAD_TSERVER + "+" + GCWAL_DEAD_TSERVER_PORT + sep
    +          + GCWAL_DEAD_TSERVER_COLLECT_FILE);
    +      fileToServerMap.put(p, GCWAL_DEAD_TSERVER + ":" + GCWAL_DEAD_TSERVER_PORT);
    +      nameToFileMap.put(GCWAL_DEAD_TSERVER_COLLECT_FILE, p);
    +      return 1;
    +    }
    +
    +    @Override
    +    int removeMetadataEntries(Map<String,Path> nameToFileMap, Map<String,Path> sortedWALogs, GCStatus status) throws IOException, KeeperException,
    +        InterruptedException {
    +      return 0;
    +    }
    +
    +    long getGCWALDeadServerWaitTime(AccumuloConfiguration conf) {
    +      // tries to use zookeeper
    +      return 1000l;
    +    }
    +  }
    +
    +  @Test
    +  public void testCollectWithDeadTserver() throws IOException, InterruptedException {
    +    Instance i = new MockInstance();
    +    File walDir = new File(System.getProperty("java.io.tmpdir") + File.separator + GCWAL_DEAD_DIR);
    +    File walFileDir = new File(walDir + File.separator + GCWAL_DEAD_TSERVER + "+" + GCWAL_DEAD_TSERVER_PORT);
    +    File walFile = new File(walFileDir + File.separator + GCWAL_DEAD_TSERVER_COLLECT_FILE);
    +    if (!walFileDir.exists()) {
    +      walFileDir.mkdirs();
    +      new FileOutputStream(walFile).close();
    +    }
    +
    +    try {
    +      VolumeManager vm = VolumeManagerImpl.getLocal(walDir.toString());
    +      GarbageCollectWriteAheadLogs gcwal2 = new GCWALDeadTserverCollectMock(i, vm, false);
    +      GCStatus status = new GCStatus(new GcCycleStats(), new GcCycleStats(), new GcCycleStats(), new GcCycleStats());
    +
    +      gcwal2.collect(status);
    +
    +      assertTrue("File should not be deleted", walFile.exists());
    +      assertEquals("Should have one candidate", 1, status.lastLog.getCandidates());
    +      assertEquals("Should not have deleted that file", 0, status.lastLog.getDeleted());
    +
    +      sleep(2000);
    +      gcwal2.collect(status);
    --- End diff --
    
    nice test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by madrob <gi...@git.apache.org>.
Github user madrob commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65654141
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -410,4 +556,41 @@ static boolean isUUID(String name) {
         }
       }
     
    +  /**
    +   * Determine if TServer has been dead long enough to remove associated WALs.
    +   * <p>
    +   * Uses a map where the key is the address and the value is the time first seen dead. If the address is not in the map, it is added with the current system
    +   * nanoTime. When the passed in wait time has elapsed, this method returns true and removes the key and value from the map.
    +   *
    +   * @param address
    +   *          HostAndPort of dead tserver
    +   * @param wait
    +   *          long value of elapsed time to wait
    +   * @return boolean whether enough time elapsed since the server was first seen as dead.
    +   */
    +  @VisibleForTesting
    +  protected boolean timeToDelete(HostAndPort address, long wait) {
    +    // check whether the tserver has been dead long enough
    +    Long firstSeen = firstSeenDead.get(address);
    +    if (firstSeen != null) {
    +      long elapsedTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - firstSeen);
    --- End diff --
    
    Log message claims seconds but code prints millis.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65749645
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    --- End diff --
    
    Added, rebased on 1.6 and and force pushed my branch.  I am ready to merge this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by madrob <gi...@git.apache.org>.
Github user madrob commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65654201
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -410,4 +556,41 @@ static boolean isUUID(String name) {
         }
       }
     
    +  /**
    +   * Determine if TServer has been dead long enough to remove associated WALs.
    +   * <p>
    +   * Uses a map where the key is the address and the value is the time first seen dead. If the address is not in the map, it is added with the current system
    +   * nanoTime. When the passed in wait time has elapsed, this method returns true and removes the key and value from the map.
    +   *
    +   * @param address
    +   *          HostAndPort of dead tserver
    +   * @param wait
    --- End diff --
    
    Java doc units


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65649944
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -170,76 +184,203 @@ boolean holdsLock(HostAndPort addr) {
         }
       }
     
    -  private int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    -    AccumuloConfiguration conf = ServerConfiguration.getSystemConfiguration(instance);
    +  private AccumuloConfiguration getConfig() {
    +    return ServerConfiguration.getSystemConfiguration(instance);
    +  }
    +
    +  /**
    +   * Top level method for removing WAL files.
    +   * <p>
    +   * Loops over all the gathered WAL and sortedWAL entries and calls the appropriate methods for removal
    +   *
    +   * @param nameToFileMap
    +   *          Map of filename to Path
    +   * @param serverToFileMap
    +   *          Map of HostAndPort string to a list of Paths
    +   * @param sortedWALogs
    +   *          Map of sorted WAL names to Path
    +   * @param status
    +   *          GCStatus object for tracking what is done
    +   * @return 0 always
    +   */
    +  @VisibleForTesting
    +  int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    +    // TODO: remove nameToFileMap from method signature, not used here I don't think
    +    AccumuloConfiguration conf = getConfig();
         for (Entry<String,ArrayList<Path>> entry : serverToFileMap.entrySet()) {
           if (entry.getKey().isEmpty()) {
    -        // old-style log entry, just remove it
    -        for (Path path : entry.getValue()) {
    -          log.debug("Removing old-style WAL " + path);
    -          try {
    -            if (!useTrash || !fs.moveToTrash(path))
    -              fs.deleteRecursively(path);
    -            status.currentLog.deleted++;
    -          } catch (FileNotFoundException ex) {
    -            // ignored
    -          } catch (IOException ex) {
    -            log.error("Unable to delete wal " + path + ": " + ex);
    -          }
    -        }
    +        removeOldStyleWAL(entry, status);
           } else {
    -        HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    -        if (!holdsLock(address)) {
    -          for (Path path : entry.getValue()) {
    -            log.debug("Removing WAL for offline server " + path);
    -            try {
    -              if (!useTrash || !fs.moveToTrash(path))
    -                fs.deleteRecursively(path);
    -              status.currentLog.deleted++;
    -            } catch (FileNotFoundException ex) {
    -              // ignored
    -            } catch (IOException ex) {
    -              log.error("Unable to delete wal " + path + ": " + ex);
    -            }
    -          }
    -          continue;
    -        } else {
    -          Client tserver = null;
    -          try {
    -            tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    -            tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    -            log.debug("deleted " + entry.getValue() + " from " + entry.getKey());
    -            status.currentLog.deleted += entry.getValue().size();
    -          } catch (TException e) {
    -            log.warn("Error talking to " + address + ": " + e);
    -          } finally {
    -            if (tserver != null)
    -              ThriftUtil.returnClient(tserver);
    -          }
    -        }
    +        removeWALFile(entry, conf, status);
           }
         }
    -
         for (Path swalog : sortedWALogs.values()) {
    -      log.debug("Removing sorted WAL " + swalog);
    +      removeSortedWAL(swalog);
    +    }
    +    return 0;
    +  }
    +
    +  /**
    +   * Removes sortedWALs.
    +   * <p>
    +   * Sorted WALs are WALs that are in the recovery directory and have already been used.
    +   *
    +   * @param swalog
    +   *          Path to the WAL
    +   */
    +  @VisibleForTesting
    +  void removeSortedWAL(Path swalog) {
    +    log.debug("Removing sorted WAL " + swalog);
    +    try {
    +      if (!useTrash || !fs.moveToTrash(swalog)) {
    +        fs.deleteRecursively(swalog);
    +      }
    +    } catch (FileNotFoundException ex) {
    +      // ignored
    +    } catch (IOException ioe) {
           try {
    -        if (!useTrash || !fs.moveToTrash(swalog)) {
    -          fs.deleteRecursively(swalog);
    +        if (fs.exists(swalog)) {
    +          log.error("Unable to delete sorted walog " + swalog + ": " + ioe);
             }
    -      } catch (FileNotFoundException ex) {
    -        // ignored
    -      } catch (IOException ioe) {
    +      } catch (IOException ex) {
    +        log.error("Unable to check for the existence of " + swalog, ex);
    +      }
    +    }
    +  }
    +
    +  /**
    +   * A wrapper method to check if the tserver using the WAL is still alive
    +   * <p>
    +   * Delegates to the deletion to #removeWALfromDownTserver if the ZK lock is gone or #askTserverToRemoveWAL if the server is known to still be alive
    +   *
    +   * @param entry
    +   *          WAL information gathered
    +   * @param conf
    +   *          AccumuloConfiguration object
    +   * @param status
    +   *          GCStatus object
    +   */
    +  @VisibleForTesting
    +  void removeWALFile(Entry<String,ArrayList<Path>> entry, AccumuloConfiguration conf, final GCStatus status) {
    --- End diff --
    
    This method is marked as visible for testing, but I don't see the unit test calling it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65704291
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    +      ttd = gcwal.timeToDelete(address, wait);
    +      t2 = System.nanoTime();
    +    } while (t2 - t1 > (wait / 2) * 1000000); // as long as it took less than half of the configured wait
    +
    +    assertFalse(gcwal.timeToDelete(address, wait));
    --- End diff --
    
    It should, good catch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65577463
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -55,12 +56,16 @@
     import org.apache.zookeeper.KeeperException;
     
     import com.google.common.net.HostAndPort;
    +import java.util.concurrent.TimeUnit;
    +import org.apache.accumulo.core.conf.Property;
     
     public class GarbageCollectWriteAheadLogs {
       private static final Logger log = Logger.getLogger(GarbageCollectWriteAheadLogs.class);
     
       private final Instance instance;
       private final VolumeManager fs;
    +  private final Map<HostAndPort,Long> firstSeenDead = new HashMap<HostAndPort,Long>();
    --- End diff --
    
    Ok, great.  I'll add a javadoc to the collect method, the only public method in the class currently.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65711567
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    --- End diff --
    
    should this first call to timeToDelete always return false?  If so could wrap it w/ `assertFalse()`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #107: ACCUMULO-4157 Bug fix for removing WALs to quickly

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/107
  
    The general approach seems to be pretty solid. The edge cases I could think about were correctly handled (e.g. if the GC dies, the "time to wait" would be reset and we woudn't delete the WAL too quickly). The cleanup of the GCWAL class is nice as are the new test cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65703992
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -170,76 +184,203 @@ boolean holdsLock(HostAndPort addr) {
         }
       }
     
    -  private int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    -    AccumuloConfiguration conf = ServerConfiguration.getSystemConfiguration(instance);
    +  private AccumuloConfiguration getConfig() {
    +    return ServerConfiguration.getSystemConfiguration(instance);
    +  }
    +
    +  /**
    +   * Top level method for removing WAL files.
    +   * <p>
    +   * Loops over all the gathered WAL and sortedWAL entries and calls the appropriate methods for removal
    +   *
    +   * @param nameToFileMap
    +   *          Map of filename to Path
    +   * @param serverToFileMap
    +   *          Map of HostAndPort string to a list of Paths
    +   * @param sortedWALogs
    +   *          Map of sorted WAL names to Path
    +   * @param status
    +   *          GCStatus object for tracking what is done
    +   * @return 0 always
    +   */
    +  @VisibleForTesting
    +  int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    +    // TODO: remove nameToFileMap from method signature, not used here I don't think
    +    AccumuloConfiguration conf = getConfig();
         for (Entry<String,ArrayList<Path>> entry : serverToFileMap.entrySet()) {
           if (entry.getKey().isEmpty()) {
    -        // old-style log entry, just remove it
    -        for (Path path : entry.getValue()) {
    -          log.debug("Removing old-style WAL " + path);
    -          try {
    -            if (!useTrash || !fs.moveToTrash(path))
    -              fs.deleteRecursively(path);
    -            status.currentLog.deleted++;
    -          } catch (FileNotFoundException ex) {
    -            // ignored
    -          } catch (IOException ex) {
    -            log.error("Unable to delete wal " + path + ": " + ex);
    -          }
    -        }
    +        removeOldStyleWAL(entry, status);
           } else {
    -        HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    -        if (!holdsLock(address)) {
    -          for (Path path : entry.getValue()) {
    -            log.debug("Removing WAL for offline server " + path);
    -            try {
    -              if (!useTrash || !fs.moveToTrash(path))
    -                fs.deleteRecursively(path);
    -              status.currentLog.deleted++;
    -            } catch (FileNotFoundException ex) {
    -              // ignored
    -            } catch (IOException ex) {
    -              log.error("Unable to delete wal " + path + ": " + ex);
    -            }
    -          }
    -          continue;
    -        } else {
    -          Client tserver = null;
    -          try {
    -            tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    -            tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    -            log.debug("deleted " + entry.getValue() + " from " + entry.getKey());
    -            status.currentLog.deleted += entry.getValue().size();
    -          } catch (TException e) {
    -            log.warn("Error talking to " + address + ": " + e);
    -          } finally {
    -            if (tserver != null)
    -              ThriftUtil.returnClient(tserver);
    -          }
    -        }
    +        removeWALFile(entry, conf, status);
           }
         }
    -
         for (Path swalog : sortedWALogs.values()) {
    -      log.debug("Removing sorted WAL " + swalog);
    +      removeSortedWAL(swalog);
    +    }
    +    return 0;
    +  }
    +
    +  /**
    +   * Removes sortedWALs.
    +   * <p>
    +   * Sorted WALs are WALs that are in the recovery directory and have already been used.
    +   *
    +   * @param swalog
    +   *          Path to the WAL
    +   */
    +  @VisibleForTesting
    +  void removeSortedWAL(Path swalog) {
    +    log.debug("Removing sorted WAL " + swalog);
    +    try {
    +      if (!useTrash || !fs.moveToTrash(swalog)) {
    +        fs.deleteRecursively(swalog);
    +      }
    +    } catch (FileNotFoundException ex) {
    +      // ignored
    +    } catch (IOException ioe) {
           try {
    -        if (!useTrash || !fs.moveToTrash(swalog)) {
    -          fs.deleteRecursively(swalog);
    +        if (fs.exists(swalog)) {
    +          log.error("Unable to delete sorted walog " + swalog + ": " + ioe);
             }
    -      } catch (FileNotFoundException ex) {
    -        // ignored
    -      } catch (IOException ioe) {
    +      } catch (IOException ex) {
    +        log.error("Unable to check for the existence of " + swalog, ex);
    +      }
    +    }
    +  }
    +
    +  /**
    +   * A wrapper method to check if the tserver using the WAL is still alive
    +   * <p>
    +   * Delegates to the deletion to #removeWALfromDownTserver if the ZK lock is gone or #askTserverToRemoveWAL if the server is known to still be alive
    +   *
    +   * @param entry
    +   *          WAL information gathered
    +   * @param conf
    +   *          AccumuloConfiguration object
    +   * @param status
    +   *          GCStatus object
    +   */
    +  @VisibleForTesting
    +  void removeWALFile(Entry<String,ArrayList<Path>> entry, AccumuloConfiguration conf, final GCStatus status) {
    +    HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    +    if (!holdsLock(address)) {
    +      removeWALfromDownTserver(address, conf, entry, status);
    +    } else {
    +      askTserverToRemoveWAL(address, conf, entry, status);
    +    }
    +  }
    +
    +  /**
    +   * Asks a currently running tserver to remove it's WALs.
    +   * <p>
    +   * A tserver has more information about whether a WAL is still being used for current mutations. It is safer to ask the tserver to remove the file instead of
    +   * just relying on information in the metadata table.
    +   *
    +   * @param address
    +   *          HostAndPort of the tserver
    +   * @param conf
    +   *          AccumuloConfiguration entry
    +   * @param entry
    +   *          WAL information gathered
    +   * @param status
    +   *          GCStatus object
    +   */
    +  @VisibleForTesting
    +  void askTserverToRemoveWAL(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +    firstSeenDead.remove(address);
    +    Client tserver = null;
    +    try {
    +      tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    +      tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    +      log.debug("asking tserver to delete " + entry.getValue() + " from " + entry.getKey());
    --- End diff --
    
    Yep, will change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #107: ACCUMULO-4157 Bug fix for removing WALs to quickly

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on the issue:

    https://github.com/apache/accumulo/pull/107
  
    Thanks for reviewing @joshelser 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65651454
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    +      ttd = gcwal.timeToDelete(address, wait);
    +      t2 = System.nanoTime();
    +    } while (t2 - t1 > (wait / 2) * 1000000); // as long as it took less than half of the configured wait
    +
    +    assertFalse(gcwal.timeToDelete(address, wait));
    --- End diff --
    
    should this be `assertFalse(ttd)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65604304
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -55,12 +56,16 @@
     import org.apache.zookeeper.KeeperException;
     
     import com.google.common.net.HostAndPort;
    +import java.util.concurrent.TimeUnit;
    +import org.apache.accumulo.core.conf.Property;
     
     public class GarbageCollectWriteAheadLogs {
       private static final Logger log = Logger.getLogger(GarbageCollectWriteAheadLogs.class);
     
       private final Instance instance;
       private final VolumeManager fs;
    +  private final Map<HostAndPort,Long> firstSeenDead = new HashMap<HostAndPort,Long>();
    --- End diff --
    
    Javadocs added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65652304
  
    --- Diff: server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java ---
    @@ -234,4 +257,288 @@ public void testIsUUID() {
         assertFalse(GarbageCollectWriteAheadLogs.isUUID("0" + UUID.randomUUID().toString()));
         assertFalse(GarbageCollectWriteAheadLogs.isUUID(null));
       }
    +
    +  @Test
    +  public void testTimeToDeleteTrue() throws InterruptedException {
    +    gcwal.clearFirstSeenDead();
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1s");
    +    gcwal.timeToDelete(address, wait); // to store first seen dead
    +    sleep(wait * 2);
    +    assertTrue(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteFalse() {
    +    HostAndPort address = HostAndPort.fromString("tserver1:9998");
    +    long wait = AccumuloConfiguration.getTimeInMillis("1h");
    +    long t1, t2;
    +    boolean ttd;
    +    do {
    +      t1 = System.nanoTime();
    +      gcwal.clearFirstSeenDead();
    +      gcwal.timeToDelete(address, wait);
    +      ttd = gcwal.timeToDelete(address, wait);
    +      t2 = System.nanoTime();
    +    } while (t2 - t1 > (wait / 2) * 1000000); // as long as it took less than half of the configured wait
    +
    +    assertFalse(gcwal.timeToDelete(address, wait));
    +  }
    +
    +  @Test
    +  public void testTimeToDeleteWithNullAddress() {
    +    assertFalse(gcwal.timeToDelete(null, 123l));
    +  }
    +
    +  /**
    +   * Wrapper class with some helper methods
    +   * <p>
    +   * Just a wrapper around a LinkedHashMap that store method name and argument information. Also includes some convenience methods to make usage cleaner.
    +   */
    +  class MethodCalls {
    +
    +    private LinkedHashMap<String,List<Object>> mapWrapper;
    +
    +    public MethodCalls() {
    +      mapWrapper = new LinkedHashMap<String,List<Object>>();
    +    }
    +
    +    public void put(String methodName, Object... args) {
    +      mapWrapper.put(methodName, Arrays.asList(args));
    +    }
    +
    +    public int size() {
    +      return mapWrapper.size();
    +    }
    +
    +    public boolean hasOneEntry() {
    +      return size() == 1;
    +    }
    +
    +    public Map.Entry<String,List<Object>> getFirstEntry() {
    +      return mapWrapper.entrySet().iterator().next();
    +    }
    +
    +    public String getFirstEntryMethod() {
    +      return getFirstEntry().getKey();
    +    }
    +
    +    public List<Object> getFirstEntryArgs() {
    +      return getFirstEntry().getValue();
    +    }
    +
    +    public Object getFirstEntryArg(int number) {
    +      return getFirstEntryArgs().get(number);
    +    }
    +  }
    +
    +  /**
    +   * Partial mock of the GarbageCollectWriteAheadLogs for testing the removeFile method
    +   * <p>
    +   * There is a map named methodCalls that can be used to assert parameters on methods called inside the removeFile method
    +   */
    +  class GCWALPartialMock extends GarbageCollectWriteAheadLogs {
    +
    +    private boolean holdsLockBool = false;
    +
    +    public GCWALPartialMock(Instance i, VolumeManager vm, boolean useTrash, boolean holdLock) throws IOException {
    +      super(i, vm, useTrash);
    +      this.holdsLockBool = holdLock;
    +    }
    +
    +    public MethodCalls methodCalls = new MethodCalls();
    +
    +    @Override
    +    boolean holdsLock(HostAndPort addr) {
    +      return holdsLockBool;
    +    }
    +
    +    @Override
    +    void removeWALfromDownTserver(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("removeWALFromDownTserver", address, conf, entry, status);
    +    }
    +
    +    @Override
    +    void askTserverToRemoveWAL(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("askTserverToRemoveWAL", address, conf, entry, status);
    +    }
    +
    +    @Override
    +    void removeOldStyleWAL(Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +      methodCalls.put("removeOldStyleWAL", entry, status);
    +    }
    +
    +    @Override
    +    void removeSortedWAL(Path swalog) {
    +      methodCalls.put("removeSortedWAL", swalog);
    +    }
    +  }
    +
    +  private GCWALPartialMock getGCWALForRemoveFileTest(GCStatus s, final boolean locked) throws IOException {
    +    return new GCWALPartialMock(new MockInstance("accumulo"), VolumeManagerImpl.get(), false, locked);
    +  }
    +
    +  private Map<String,Path> getEmptyMap() {
    +    return new HashMap<String,Path>();
    +  }
    +
    +  private Map<String,ArrayList<Path>> getServerToFileMap1(String key, Path singlePath) {
    +    Map<String,ArrayList<Path>> serverToFileMap = new HashMap<String,ArrayList<Path>>();
    +    serverToFileMap.put(key, new ArrayList<Path>(Arrays.asList(singlePath)));
    +    return serverToFileMap;
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithOldStyle() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/tserver1+9997/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1("", p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeOldStyleWAL", "removeOldStyleWAL", calls.getFirstEntryMethod());
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("First param should be empty", firstServerToFileMap, calls.getFirstEntryArg(0));
    +    assertEquals("Second param should be the status", status, calls.getFirstEntryArg(1));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithDeadTservers() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, false);
    +    String server = "tserver1+9997";
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/" + server + "/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1(server, p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeWALfromDownTserver", "removeWALFromDownTserver", calls.getFirstEntryMethod());
    +    assertEquals("First param should be address", HostAndPort.fromString(server.replaceAll("[+]", ":")), calls.getFirstEntryArg(0));
    +    assertTrue("Second param should be an AccumuloConfiguration", calls.getFirstEntryArg(1) instanceof AccumuloConfiguration);
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("Third param should be the entry", firstServerToFileMap, calls.getFirstEntryArg(2));
    +    assertEquals("Forth param should be the status", status, calls.getFirstEntryArg(3));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesWithLiveTservers() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    String server = "tserver1+9997";
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/" + server + "/" + UUID.randomUUID().toString());
    +    Map<String,ArrayList<Path>> serverToFileMap = getServerToFileMap1(server, p1);
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, getEmptyMap(), status);
    +
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be askTserverToRemoveWAL", "askTserverToRemoveWAL", calls.getFirstEntryMethod());
    +    assertEquals("First param should be address", HostAndPort.fromString(server.replaceAll("[+]", ":")), calls.getFirstEntryArg(0));
    +    assertTrue("Second param should be an AccumuloConfiguration", calls.getFirstEntryArg(1) instanceof AccumuloConfiguration);
    +    Entry<String,ArrayList<Path>> firstServerToFileMap = serverToFileMap.entrySet().iterator().next();
    +    assertEquals("Third param should be the entry", firstServerToFileMap, calls.getFirstEntryArg(2));
    +    assertEquals("Forth param should be the status", status, calls.getFirstEntryArg(3));
    +  }
    +
    +  @Test
    +  public void testRemoveFilesRemovesSortedWALs() throws IOException {
    +    GCStatus status = new GCStatus();
    +    GarbageCollectWriteAheadLogs realGCWAL = getGCWALForRemoveFileTest(status, true);
    +    Map<String,ArrayList<Path>> serverToFileMap = new HashMap<String,ArrayList<Path>>();
    +    Map<String,Path> sortedWALogs = new HashMap<String,Path>();
    +    Path p1 = new Path("hdfs://localhost:9000/accumulo/wal/tserver1+9997/" + UUID.randomUUID().toString());
    +    sortedWALogs.put("junk", p1); // TODO: see if this key is actually used here, maybe can be removed
    +
    +    realGCWAL.removeFiles(getEmptyMap(), serverToFileMap, sortedWALogs, status);
    +    MethodCalls calls = ((GCWALPartialMock) realGCWAL).methodCalls;
    +    assertEquals("Only one method should have been called", 1, calls.size());
    +    assertEquals("Method should be removeSortedWAL", "removeSortedWAL", calls.getFirstEntryMethod());
    +    assertEquals("First param should be the Path", p1, calls.getFirstEntryArg(0));
    +
    +  }
    +
    +  static String GCWAL_DEAD_DIR = "gcwal-collect-deadtserver";
    +  static String GCWAL_DEAD_TSERVER = "tserver1";
    +  static String GCWAL_DEAD_TSERVER_PORT = "9995";
    +  static String GCWAL_DEAD_TSERVER_COLLECT_FILE = UUID.randomUUID().toString();
    +
    +  class GCWALDeadTserverCollectMock extends GarbageCollectWriteAheadLogs {
    +
    +    public GCWALDeadTserverCollectMock(Instance i, VolumeManager vm, boolean useTrash) throws IOException {
    +      super(i, vm, useTrash);
    +    }
    +
    +    @Override
    +    boolean holdsLock(HostAndPort addr) {
    +      // tries use zookeeper
    +      return false;
    +    }
    +
    +    @Override
    +    Map<String,Path> getSortedWALogs() {
    +      return new HashMap<String,Path>();
    +    }
    +
    +    @Override
    +    int scanServers(Map<Path,String> fileToServerMap, Map<String,Path> nameToFileMap) throws Exception {
    +      String sep = File.separator;
    +      Path p = new Path(System.getProperty("java.io.tmpdir") + sep + GCWAL_DEAD_DIR + sep + GCWAL_DEAD_TSERVER + "+" + GCWAL_DEAD_TSERVER_PORT + sep
    +          + GCWAL_DEAD_TSERVER_COLLECT_FILE);
    +      fileToServerMap.put(p, GCWAL_DEAD_TSERVER + ":" + GCWAL_DEAD_TSERVER_PORT);
    +      nameToFileMap.put(GCWAL_DEAD_TSERVER_COLLECT_FILE, p);
    +      return 1;
    +    }
    +
    +    @Override
    +    int removeMetadataEntries(Map<String,Path> nameToFileMap, Map<String,Path> sortedWALogs, GCStatus status) throws IOException, KeeperException,
    +        InterruptedException {
    +      return 0;
    +    }
    +
    +    long getGCWALDeadServerWaitTime(AccumuloConfiguration conf) {
    +      // tries to use zookeeper
    +      return 1000l;
    +    }
    +  }
    +
    +  @Test
    +  public void testCollectWithDeadTserver() throws IOException, InterruptedException {
    +    Instance i = new MockInstance();
    +    File walDir = new File(System.getProperty("java.io.tmpdir") + File.separator + GCWAL_DEAD_DIR);
    --- End diff --
    
    Most Accumulo test that create files put them in `target`.  Can use something like this do that. `System.getProperty("user.dir") + "/target/"+GCWAL_DEAD_DIR`.  The prop `user.dir` is the CWD.  Maven sets the CWD to the dir where the pom is.  Its nice having the test write to target on jenkins and environments like that.  Although maybe jenkins sets java.io.tmpdir to something reasonable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65650715
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -170,76 +184,203 @@ boolean holdsLock(HostAndPort addr) {
         }
       }
     
    -  private int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    -    AccumuloConfiguration conf = ServerConfiguration.getSystemConfiguration(instance);
    +  private AccumuloConfiguration getConfig() {
    +    return ServerConfiguration.getSystemConfiguration(instance);
    +  }
    +
    +  /**
    +   * Top level method for removing WAL files.
    +   * <p>
    +   * Loops over all the gathered WAL and sortedWAL entries and calls the appropriate methods for removal
    +   *
    +   * @param nameToFileMap
    +   *          Map of filename to Path
    +   * @param serverToFileMap
    +   *          Map of HostAndPort string to a list of Paths
    +   * @param sortedWALogs
    +   *          Map of sorted WAL names to Path
    +   * @param status
    +   *          GCStatus object for tracking what is done
    +   * @return 0 always
    +   */
    +  @VisibleForTesting
    +  int removeFiles(Map<String,Path> nameToFileMap, Map<String,ArrayList<Path>> serverToFileMap, Map<String,Path> sortedWALogs, final GCStatus status) {
    +    // TODO: remove nameToFileMap from method signature, not used here I don't think
    +    AccumuloConfiguration conf = getConfig();
         for (Entry<String,ArrayList<Path>> entry : serverToFileMap.entrySet()) {
           if (entry.getKey().isEmpty()) {
    -        // old-style log entry, just remove it
    -        for (Path path : entry.getValue()) {
    -          log.debug("Removing old-style WAL " + path);
    -          try {
    -            if (!useTrash || !fs.moveToTrash(path))
    -              fs.deleteRecursively(path);
    -            status.currentLog.deleted++;
    -          } catch (FileNotFoundException ex) {
    -            // ignored
    -          } catch (IOException ex) {
    -            log.error("Unable to delete wal " + path + ": " + ex);
    -          }
    -        }
    +        removeOldStyleWAL(entry, status);
           } else {
    -        HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    -        if (!holdsLock(address)) {
    -          for (Path path : entry.getValue()) {
    -            log.debug("Removing WAL for offline server " + path);
    -            try {
    -              if (!useTrash || !fs.moveToTrash(path))
    -                fs.deleteRecursively(path);
    -              status.currentLog.deleted++;
    -            } catch (FileNotFoundException ex) {
    -              // ignored
    -            } catch (IOException ex) {
    -              log.error("Unable to delete wal " + path + ": " + ex);
    -            }
    -          }
    -          continue;
    -        } else {
    -          Client tserver = null;
    -          try {
    -            tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    -            tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    -            log.debug("deleted " + entry.getValue() + " from " + entry.getKey());
    -            status.currentLog.deleted += entry.getValue().size();
    -          } catch (TException e) {
    -            log.warn("Error talking to " + address + ": " + e);
    -          } finally {
    -            if (tserver != null)
    -              ThriftUtil.returnClient(tserver);
    -          }
    -        }
    +        removeWALFile(entry, conf, status);
           }
         }
    -
         for (Path swalog : sortedWALogs.values()) {
    -      log.debug("Removing sorted WAL " + swalog);
    +      removeSortedWAL(swalog);
    +    }
    +    return 0;
    +  }
    +
    +  /**
    +   * Removes sortedWALs.
    +   * <p>
    +   * Sorted WALs are WALs that are in the recovery directory and have already been used.
    +   *
    +   * @param swalog
    +   *          Path to the WAL
    +   */
    +  @VisibleForTesting
    +  void removeSortedWAL(Path swalog) {
    +    log.debug("Removing sorted WAL " + swalog);
    +    try {
    +      if (!useTrash || !fs.moveToTrash(swalog)) {
    +        fs.deleteRecursively(swalog);
    +      }
    +    } catch (FileNotFoundException ex) {
    +      // ignored
    +    } catch (IOException ioe) {
           try {
    -        if (!useTrash || !fs.moveToTrash(swalog)) {
    -          fs.deleteRecursively(swalog);
    +        if (fs.exists(swalog)) {
    +          log.error("Unable to delete sorted walog " + swalog + ": " + ioe);
             }
    -      } catch (FileNotFoundException ex) {
    -        // ignored
    -      } catch (IOException ioe) {
    +      } catch (IOException ex) {
    +        log.error("Unable to check for the existence of " + swalog, ex);
    +      }
    +    }
    +  }
    +
    +  /**
    +   * A wrapper method to check if the tserver using the WAL is still alive
    +   * <p>
    +   * Delegates to the deletion to #removeWALfromDownTserver if the ZK lock is gone or #askTserverToRemoveWAL if the server is known to still be alive
    +   *
    +   * @param entry
    +   *          WAL information gathered
    +   * @param conf
    +   *          AccumuloConfiguration object
    +   * @param status
    +   *          GCStatus object
    +   */
    +  @VisibleForTesting
    +  void removeWALFile(Entry<String,ArrayList<Path>> entry, AccumuloConfiguration conf, final GCStatus status) {
    +    HostAndPort address = AddressUtil.parseAddress(entry.getKey(), false);
    +    if (!holdsLock(address)) {
    +      removeWALfromDownTserver(address, conf, entry, status);
    +    } else {
    +      askTserverToRemoveWAL(address, conf, entry, status);
    +    }
    +  }
    +
    +  /**
    +   * Asks a currently running tserver to remove it's WALs.
    +   * <p>
    +   * A tserver has more information about whether a WAL is still being used for current mutations. It is safer to ask the tserver to remove the file instead of
    +   * just relying on information in the metadata table.
    +   *
    +   * @param address
    +   *          HostAndPort of the tserver
    +   * @param conf
    +   *          AccumuloConfiguration entry
    +   * @param entry
    +   *          WAL information gathered
    +   * @param status
    +   *          GCStatus object
    +   */
    +  @VisibleForTesting
    +  void askTserverToRemoveWAL(HostAndPort address, AccumuloConfiguration conf, Entry<String,ArrayList<Path>> entry, final GCStatus status) {
    +    firstSeenDead.remove(address);
    +    Client tserver = null;
    +    try {
    +      tserver = ThriftUtil.getClient(new TabletClientService.Client.Factory(), address, conf);
    +      tserver.removeLogs(Tracer.traceInfo(), SystemCredentials.get().toThrift(instance), paths2strings(entry.getValue()));
    +      log.debug("asking tserver to delete " + entry.getValue() + " from " + entry.getKey());
    --- End diff --
    
    I think it should say `asked` instead of `asking` because the log message is after the thrift call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65576260
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -55,12 +56,16 @@
     import org.apache.zookeeper.KeeperException;
     
     import com.google.common.net.HostAndPort;
    +import java.util.concurrent.TimeUnit;
    +import org.apache.accumulo.core.conf.Property;
     
     public class GarbageCollectWriteAheadLogs {
       private static final Logger log = Logger.getLogger(GarbageCollectWriteAheadLogs.class);
     
       private final Instance instance;
       private final VolumeManager fs;
    +  private final Map<HostAndPort,Long> firstSeenDead = new HashMap<HostAndPort,Long>();
    --- End diff --
    
    No, I don't actually think it's necessary (I haven't looked closely enough), but, at first glance, it looks like it be necessary. I believe the caller of GCWALs does so in such a way that it does not have any race conditions.
    
    Re-worded, I think this is probably fine as-is, but some documentation saying "This is not threadsafe, but the caller is not invoking it in a concurrent manner" would be good for future devs/readers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65575618
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -55,12 +56,16 @@
     import org.apache.zookeeper.KeeperException;
     
     import com.google.common.net.HostAndPort;
    +import java.util.concurrent.TimeUnit;
    +import org.apache.accumulo.core.conf.Property;
     
     public class GarbageCollectWriteAheadLogs {
       private static final Logger log = Logger.getLogger(GarbageCollectWriteAheadLogs.class);
     
       private final Instance instance;
       private final VolumeManager fs;
    +  private final Map<HostAndPort,Long> firstSeenDead = new HashMap<HostAndPort,Long>();
    --- End diff --
    
    Are you suggesting I use a ConcurrentHashMap?  I thought SimpleGarbageCollector was designed to be one thread out of the [run](https://github.com/apache/accumulo/blob/1.6/server/gc/src/main/java/org/apache/accumulo/gc/SimpleGarbageCollector.java#L513) method. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/107#discussion_r65704213
  
    --- Diff: server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogs.java ---
    @@ -410,4 +556,41 @@ static boolean isUUID(String name) {
         }
       }
     
    +  /**
    +   * Determine if TServer has been dead long enough to remove associated WALs.
    +   * <p>
    +   * Uses a map where the key is the address and the value is the time first seen dead. If the address is not in the map, it is added with the current system
    +   * nanoTime. When the passed in wait time has elapsed, this method returns true and removes the key and value from the map.
    +   *
    +   * @param address
    +   *          HostAndPort of dead tserver
    +   * @param wait
    +   *          long value of elapsed time to wait
    +   * @return boolean whether enough time elapsed since the server was first seen as dead.
    +   */
    +  @VisibleForTesting
    +  protected boolean timeToDelete(HostAndPort address, long wait) {
    +    // check whether the tserver has been dead long enough
    +    Long firstSeen = firstSeenDead.get(address);
    +    if (firstSeen != null) {
    +      long elapsedTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - firstSeen);
    --- End diff --
    
    Good catch, I change it and didn't update the log message.  Will fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #107: ACCUMULO-4157 Bug fix for removing WALs to quickly

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall commented on the issue:

    https://github.com/apache/accumulo/pull/107
  
    Thanks @joshelser @keith-turner and @madrob for looking at this.  I am ready to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #107: ACCUMULO-4157 Bug fix for removing WALs to quick...

Posted by mjwall <gi...@git.apache.org>.
Github user mjwall closed the pull request at:

    https://github.com/apache/accumulo/pull/107


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---