You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/11/30 20:06:10 UTC

[GitHub] [accumulo] dlmarion opened a new pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

dlmarion opened a new pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818


   Modified the AccumuloUncaughtExceptionHandler to log Exception and
   Error and Halt the VM if the system property HaltVMOnThreadError
   is set to true. Modified all Executors to use the NamingThreadFactory
   which uses AccumuloUncaughtExceptionHandler. When modifying
   SimpleTimer I found that some critical tasks could fail and only
   be logged. Created SimpleCriticalTimer for the critical tasks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r532883229



##########
File path: core/src/main/java/org/apache/accumulo/core/util/AccumuloUncaughtExceptionHandler.java
##########
@@ -26,10 +26,20 @@
 public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
   private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final String HALT_PROPERTY = "HaltVMOnThreadError";
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      log.error(String.format("Caught an exception in %s. Thread is dead.", t), e);

Review comment:
       ```suggestion
         log.error("Caught an exception in {}. Thread is dead.", t, e);
   ```

##########
File path: server/manager/src/main/java/org/apache/accumulo/master/Master.java
##########
@@ -884,8 +886,9 @@ private long balanceTablets() {
       gatherTableInformation(Set<TServerInstance> currentServers) {
     final long rpcTimeout = getConfiguration().getTimeInMillis(Property.GENERAL_RPC_TIMEOUT);
     int threads = getConfiguration().getCount(Property.MASTER_STATUS_THREAD_POOL_SIZE);
-    ExecutorService tp =
-        threads == 0 ? Executors.newCachedThreadPool() : Executors.newFixedThreadPool(threads);
+    ExecutorService tp = threads == 0
+        ? Executors.newCachedThreadPool(new NamingThreadFactory("GatherTableInformation"))
+        : Executors.newFixedThreadPool(threads, new NamingThreadFactory("GatherTableInformation"));

Review comment:
       This is probably a little easier to read if you assign the ThreadFactory to a variable first, like the following (formatting probably isn't right):
   
   ```suggestion
       var threadFactory = new NamingThreadFactory("GatherTableInformation");
       ExecutorService tp = threads == 0
           ? Executors.newCachedThreadPool(threadFactory)
           : Executors.newFixedThreadPool(threads, threadFactory);
   ```

##########
File path: core/src/main/java/org/apache/accumulo/core/util/AccumuloUncaughtExceptionHandler.java
##########
@@ -26,10 +26,20 @@
 public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
   private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final String HALT_PROPERTY = "HaltVMOnThreadError";
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      log.error(String.format("Caught an exception in %s. Thread is dead.", t), e);
+    } else {
+      if (System.getProperty(HALT_PROPERTY, "false").equals("true")) {
+        log.error(String.format("Caught an exception in %s.", t), e);

Review comment:
       We don't need String.format here. The slf4j logger supports substitution with `{}`
   
   ```suggestion
           log.error("Caught an exception in {}.", t, e);
   ```

##########
File path: assemble/conf/accumulo-env.sh
##########
@@ -95,7 +95,7 @@ JAVA_OPTS=("${JAVA_OPTS[@]}"
 
 case "$cmd" in
   monitor|gc|master|tserver|tracer)
-    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties")
+    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties" "-DHaltVMOnThreadError=true")

Review comment:
       Interesting strategy. I wonder if the way you're handling this bypasses our setting for `'-XX:OnOutOfMemoryError=kill -9 %p'` in these threads.

##########
File path: server/base/src/main/java/org/apache/accumulo/server/util/time/SimpleCriticalTimer.java
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.server.util.time;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.util.AccumuloUncaughtExceptionHandler;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Generic singleton timer for critical tasks.
+ */
+public class SimpleCriticalTimer extends SimpleTimer {

Review comment:
       What differentiates this from `SimpleTimer`? Could improve the javadoc here with a high-level explanation.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/AccumuloUncaughtExceptionHandler.java
##########
@@ -26,10 +26,20 @@
 public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
   private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final String HALT_PROPERTY = "HaltVMOnThreadError";
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      log.error(String.format("Caught an exception in %s. Thread is dead.", t), e);
+    } else {
+      if (System.getProperty(HALT_PROPERTY, "false").equals("true")) {
+        log.error(String.format("Caught an exception in %s.", t), e);
+        Halt.halt(String.format("Caught an exception in %s. Halting VM, check the logs.", t));
+      } else {
+        log.error(String.format("Caught an exception in %s. Thread is dead.", t), e);

Review comment:
       ```suggestion
           log.error("Caught an exception in {}. Thread is dead.", t, e);
   ```

##########
File path: server/base/src/main/java/org/apache/accumulo/server/util/time/SimpleTimer.java
##########
@@ -100,10 +93,19 @@ static int getInstanceThreadPoolSize() {
     return instanceThreadPoolSize;
   }
 
-  private SimpleTimer(int threadPoolSize) {
+  protected SimpleTimer(int threadPoolSize) {
     executor = Executors.newScheduledThreadPool(threadPoolSize,
         new ThreadFactoryBuilder().setNameFormat("SimpleTimer-%d").setDaemon(true)
-            .setUncaughtExceptionHandler(new ExceptionHandler()).build());
+            .setUncaughtExceptionHandler(getUncaughtExceptionHandler()).build());
+  }
+
+  protected Thread.UncaughtExceptionHandler getUncaughtExceptionHandler() {
+    return new Thread.UncaughtExceptionHandler() {
+      @Override
+      public void uncaughtException(Thread t, Throwable e) {
+        log.warn("SimpleTimer task failed", e);
+      }
+    };

Review comment:
       I'm curious what non-critical use cases we still have for this, that would warrant continuing to swallow OOM errors and such.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
brianloss commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555171268



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -198,6 +197,8 @@ public long getTimeOut() {
 
   public TabletServerBatchWriter(ClientContext context, BatchWriterConfig config) {
     this.context = context;
+    this.executor = ThreadPools.getGeneralScheduledExecutorService(this.context.getConfiguration());
+    this.failedMutations = new FailedMutations();

Review comment:
       No worries--I wasn't suggesting a change, but rather just trying to figure out if I missed something special about that one variable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747575246


   So you are running with Java 8 and the jar file was compile with Java 11, for example. That jar will need to be re-compiled with Java 8.
   
   https://stackoverflow.com/questions/9170832/list-of-java-class-file-format-major-version-numbers


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556541601



##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -50,10 +51,9 @@ public void run() {
   public static void halt(final int status, Runnable runnable) {
     try {
       // give ourselves a little time to try and do something
-      new Thread() {
+      new Daemon() {

Review comment:
       We should avoid using Hadoop's Daemon. I doubt that is stable public API. Even if it is, we don't want to increase our dependency on Hadoop library code for this. It should be simple to construct the object assigned to a local variable, set it to daemon, and start it in 3 statements, rather than rely on a the Hadoop library.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747567220


   Do you have a stack trace?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747603412


   Accumulo 2.1.0-SNAPSHOT actually requires Java 11. You shouldn't need to recompile with Java 8 (accumulo-testing already compiles as Java 8 using the `-release 8` flag). You should make sure Java 11 is available on your runtime. Make sure JAVA_HOME points to your Java 11 installation on whatever node you're running the verify code.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-738309134


   I consolidated most of the Thread and ThreadPool creation down to a consistent code path through new classes called Threads and ThreadPools. All threads created are Daemon and have an UncaughtExceptionHandler set on them. The UncaughtExceptionHandler implementation will halt the JVM when it runs into serious errors.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] FineAndDandy commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
FineAndDandy commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556632737



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       I would argue that there are potential subclasses of Error, that when appearing in a Scan thread should simply log and fail the scan, not halt the entire tserver. It's easy to agree that an OutOfMemoryError warrants halting, but what about a miss-configured context that leads to a UnsupportedClassVersionError or an IOError on a particular file? Particularly the scan threads need to have more flexibility since they are not necessarily running core code. One bad iterator/edge case should not be able to take a cluster down. If this is asking too much than having a configurable list of classes/subclasses that result in a halt would feel more comfortable. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-745386995


   Ok, I need to fixup some code around that method. It should not affect the test you are running though. You might just see some of those log messages.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556578502



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,23 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy threadPoolProperties =
+        new ConfigurationCopy(contextConfigSupplier.get());
+    String size = threadPoolProperties.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      threadPoolProperties.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }

Review comment:
       Reading properties other than those that start with the VFS_CONTEXT_CLASSPATH_PROPERTY won't work here, because the contextConfigSupplier already filters out all other properties. So, you can't read the SIMPLETIMER properties from this map.

##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,23 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy threadPoolProperties =
+        new ConfigurationCopy(contextConfigSupplier.get());
+    String size = threadPoolProperties.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      threadPoolProperties.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }
+    ThreadPools.createGeneralScheduledExecutorService(threadPoolProperties)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          ConfigurationCopy contextCleanerProperties =
+              new ConfigurationCopy(contextConfigSupplier.get());
+          LOG.trace("{}-cleanup thread, properties: {}", className, threadPoolProperties);
+          Set<String> contextsInUse = contextCleanerProperties
+              .getAllPropertiesWithPrefixStripped(Property.VFS_CONTEXT_CLASSPATH_PROPERTY).keySet();

Review comment:
       Placing it in a new ConfigurationCopy to use the method that strips the property prefix out seems a bit more work than the old code did. The contextConfigSupplier already filters matching items, so all we need to do is strip out the prefix. We can avoid creating a new ConfigurationCopy object for this.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -49,15 +49,16 @@ public void run() {
   }
 
   public static void halt(final int status, Runnable runnable) {
+
     try {
       // give ourselves a little time to try and do something
-      new Daemon() {
+      Threads.createThread("Halt Thread", new Runnable() {
         @Override
         public void run() {
           sleepUninterruptibly(100, TimeUnit.MILLISECONDS);
           Runtime.getRuntime().halt(status);
         }
-      }.start();
+      }).start();

Review comment:
       With a lambda, this would be slightly shorter:
   
   ```java
         Threads.createThread("Halt Thread", () -> {
           sleepUninterruptibly(100, TimeUnit.MILLISECONDS);
           Runtime.getRuntime().halt(status);
         }).start();
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556523090



##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -51,9 +50,10 @@ public void run() {
   public static void halt(final int status, Runnable runnable) {
     try {
       // give ourselves a little time to try and do something
-      new Daemon() {
+      new Thread() {
         @Override
         public void run() {
+          setDaemon(true);

Review comment:
       Great catch. This won't work, I will fix.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r536897371



##########
File path: core/src/main/java/org/apache/accumulo/core/util/ThreadPools.java
##########
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.OptionalInt;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ScheduledThreadPoolExecutor;
+import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.ThreadFactory;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.util.Threads.NamedRunnable;
+import org.apache.htrace.wrappers.TraceCallable;
+import org.apache.htrace.wrappers.TraceRunnable;
+
+public class ThreadPools {
+
+  /**
+   * ThreadFactory that sets the name and optionally the priority on a newly created Thread.
+   */
+  private static class NamedThreadFactory implements ThreadFactory {
+
+    private static final String FORMAT = "%s-%s-%d";
+
+    private AtomicInteger threadNum = new AtomicInteger(1);
+    private String name;
+    private OptionalInt priority;
+
+    private NamedThreadFactory(String name) {
+      this(name, OptionalInt.empty());
+    }
+
+    private NamedThreadFactory(String name, OptionalInt priority) {
+      this.name = name;
+      this.priority = priority;
+    }
+
+    @Override
+    public Thread newThread(Runnable r) {
+      String threadName = null;
+      if (r instanceof NamedRunnable) {
+        NamedRunnable nr = (NamedRunnable) r;
+        threadName = String.format(FORMAT, name, nr.getName(), threadNum.getAndIncrement());
+      } else {
+        threadName =
+            String.format(FORMAT, name, r.getClass().getSimpleName(), threadNum.getAndIncrement());
+      }
+      return Threads.createThread(threadName, priority, r);
+    }
+  }
+
+  /**
+   * ScheduledThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingScheduledThreadPoolExecutor extends ScheduledThreadPoolExecutor {
+
+    private TracingScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) {
+      super(corePoolSize, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }
+
+  }
+
+  /**
+   * ThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingThreadPoolExecutor extends ThreadPoolExecutor {
+
+    private TracingThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
+        TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory) {
+      super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }

Review comment:
       Yeah, these were leftovers from an earlier implementation idea.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-746122262


   From my CI run, everything seems to work. The only warnings I got were the ones I mentioned prior and things that seem related to hadoop and not this. I'll still paste them below just in case. They're all apart of the `org.apache.hadoop.hdfs.DataStreamer` package. I will update the branch to include your most recent changes and run RW today. 
   
   ```
   java.io.IOException: Bad response ERROR for BP-281662969-10.0.0.1-1608046410373:blk_1073761487_20926 from datanode DatanodeInfoWithStorage[10.0.0.2:9866,DS-bf1d7fff-1c80-48ce-89db-529393752060,DISK]
   	at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1126)
   ```
   ```
     | Error  Recovery for BP-281662969-10.0.0.1-1608046410373:blk_1073761488_20927  in pipeline  [DatanodeInfoWithStorage[10.0.0.5:9866,DS-5b3d88cd-2ced-4bfe-96dc-bacfd5192ac2,DISK],   DatanodeInfoWithStorage[10.0.0.2:9866,DS-bf1d7fff-1c80-48ce-89db-529393752060,DISK]]:  datanode 1(DatanodeInfoWithStorage[10.0.
   ```
   
   ```
   Slow waitForAckedSeqno took 65187ms (threshold=30000ms). File being written: /accumulo/tables/3/t-00005j8/F0000fet.rf_tmp, block: BP-281662969-10.0.0.1-1608046410373:blk_1073761487_21010, Write pipeline datanodes: [DatanodeInfoWithStorage[10.0.0.5:9866,DS-5b3d88cd-2ced-4bfe-96dc-bacfd5192ac2,DISK]].
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-746803138


   Sorry for not providing more info. 7 nodes but only one ingester and without agitation. I will work on running more tests with agitation as well. Everything else is a clean accumulo installation, with no change in settings except for heap sizes. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556680180



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -55,22 +56,21 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
         () -> accConf.getAllPropertiesWithPrefix(Property.VFS_CONTEXT_CLASSPATH_PROPERTY);
     AccumuloVFSClassLoader.setContextConfig(contextConfigSupplier);
     LOG.debug("ContextManager configuration set");
-    startCleanupThread(contextConfigSupplier);
+    startCleanupThread(accConf, contextConfigSupplier);
   }
 
-  private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+  private static void startCleanupThread(final AccumuloConfiguration conf,
+      final Supplier<Map<String,String>> contextConfigSupplier) {
+    ThreadPools.createGeneralScheduledExecutorService(conf)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          LOG.trace("{}-cleanup thread, properties: {}", className, conf);
+          Set<String> contextsInUse = contextConfigSupplier.get().keySet().stream()
+              .filter(k -> k.startsWith(Property.VFS_CONTEXT_CLASSPATH_PROPERTY.name()))

Review comment:
       Two things are wrong with this filter line:
   
   1. This is using the enum `name()`, when it should be filtering on the enum's `getKey()` method, and
   2. It's not even necessary to execute this filter, because the contextConfigSupplier already performs this filter.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r536526096



##########
File path: core/src/main/java/org/apache/accumulo/core/util/ThreadPools.java
##########
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util;

Review comment:
       Let's move this to a subpackage of util. Util is pretty bloated, and in a subpackage, we can break out all the inner classes. Inner classes make it hard to follow the code. package-private classes would be better than private inner classes.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/ThreadPools.java
##########
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.OptionalInt;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ScheduledThreadPoolExecutor;
+import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.ThreadFactory;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.util.Threads.NamedRunnable;
+import org.apache.htrace.wrappers.TraceCallable;
+import org.apache.htrace.wrappers.TraceRunnable;
+
+public class ThreadPools {
+
+  /**
+   * ThreadFactory that sets the name and optionally the priority on a newly created Thread.
+   */
+  private static class NamedThreadFactory implements ThreadFactory {
+
+    private static final String FORMAT = "%s-%s-%d";
+
+    private AtomicInteger threadNum = new AtomicInteger(1);
+    private String name;
+    private OptionalInt priority;
+
+    private NamedThreadFactory(String name) {
+      this(name, OptionalInt.empty());
+    }
+
+    private NamedThreadFactory(String name, OptionalInt priority) {
+      this.name = name;
+      this.priority = priority;
+    }
+
+    @Override
+    public Thread newThread(Runnable r) {
+      String threadName = null;
+      if (r instanceof NamedRunnable) {
+        NamedRunnable nr = (NamedRunnable) r;
+        threadName = String.format(FORMAT, name, nr.getName(), threadNum.getAndIncrement());
+      } else {
+        threadName =
+            String.format(FORMAT, name, r.getClass().getSimpleName(), threadNum.getAndIncrement());
+      }
+      return Threads.createThread(threadName, priority, r);
+    }
+  }
+
+  /**
+   * ScheduledThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingScheduledThreadPoolExecutor extends ScheduledThreadPoolExecutor {
+
+    private TracingScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) {
+      super(corePoolSize, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }
+
+  }
+
+  /**
+   * ThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingThreadPoolExecutor extends ThreadPoolExecutor {
+
+    private TracingThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
+        TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory) {
+      super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }

Review comment:
       There's a few of these methods that don't do anything other than call the superclass implementation. These overriding methods can be removed, so the caller of these methods can call the superclass' implementation directly.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/ThreadPools.java
##########
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.OptionalInt;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ScheduledThreadPoolExecutor;
+import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.ThreadFactory;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.util.Threads.NamedRunnable;
+import org.apache.htrace.wrappers.TraceCallable;
+import org.apache.htrace.wrappers.TraceRunnable;
+
+public class ThreadPools {
+
+  /**
+   * ThreadFactory that sets the name and optionally the priority on a newly created Thread.
+   */
+  private static class NamedThreadFactory implements ThreadFactory {
+
+    private static final String FORMAT = "%s-%s-%d";
+
+    private AtomicInteger threadNum = new AtomicInteger(1);
+    private String name;
+    private OptionalInt priority;
+
+    private NamedThreadFactory(String name) {
+      this(name, OptionalInt.empty());
+    }
+
+    private NamedThreadFactory(String name, OptionalInt priority) {
+      this.name = name;
+      this.priority = priority;
+    }
+
+    @Override
+    public Thread newThread(Runnable r) {
+      String threadName = null;
+      if (r instanceof NamedRunnable) {
+        NamedRunnable nr = (NamedRunnable) r;
+        threadName = String.format(FORMAT, name, nr.getName(), threadNum.getAndIncrement());
+      } else {
+        threadName =
+            String.format(FORMAT, name, r.getClass().getSimpleName(), threadNum.getAndIncrement());
+      }
+      return Threads.createThread(threadName, priority, r);
+    }
+  }
+
+  /**
+   * ScheduledThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingScheduledThreadPoolExecutor extends ScheduledThreadPoolExecutor {
+
+    private TracingScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) {
+      super(corePoolSize, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }
+
+  }
+
+  /**
+   * ThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingThreadPoolExecutor extends ThreadPoolExecutor {
+
+    private TracingThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
+        TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory) {
+      super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }
+
+  }
+
+  public static class CloseableThreadPoolExecutor implements AutoCloseable {
+
+    private final ThreadPoolExecutor tpe;
+
+    public CloseableThreadPoolExecutor(ThreadPoolExecutor tpe) {
+      this.tpe = tpe;
+    }
+
+    @Override
+    public void close() throws Exception {
+      this.tpe.shutdownNow();
+    }
+
+  }

Review comment:
       Rather than create this new type, you can just have a different cleanup method in CleanerUtil that calls shutdownNow instead of close on the argument. It should be a much smaller change... and fewer types is good. I can help with this part, if you're not sure what I mean.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/ThreadPools.java
##########
@@ -0,0 +1,368 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.OptionalInt;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ScheduledThreadPoolExecutor;
+import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.ThreadFactory;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.util.Threads.NamedRunnable;
+import org.apache.htrace.wrappers.TraceCallable;
+import org.apache.htrace.wrappers.TraceRunnable;
+
+public class ThreadPools {
+
+  /**
+   * ThreadFactory that sets the name and optionally the priority on a newly created Thread.
+   */
+  private static class NamedThreadFactory implements ThreadFactory {
+
+    private static final String FORMAT = "%s-%s-%d";
+
+    private AtomicInteger threadNum = new AtomicInteger(1);
+    private String name;
+    private OptionalInt priority;
+
+    private NamedThreadFactory(String name) {
+      this(name, OptionalInt.empty());
+    }
+
+    private NamedThreadFactory(String name, OptionalInt priority) {
+      this.name = name;
+      this.priority = priority;
+    }
+
+    @Override
+    public Thread newThread(Runnable r) {
+      String threadName = null;
+      if (r instanceof NamedRunnable) {
+        NamedRunnable nr = (NamedRunnable) r;
+        threadName = String.format(FORMAT, name, nr.getName(), threadNum.getAndIncrement());
+      } else {
+        threadName =
+            String.format(FORMAT, name, r.getClass().getSimpleName(), threadNum.getAndIncrement());
+      }
+      return Threads.createThread(threadName, priority, r);
+    }
+  }
+
+  /**
+   * ScheduledThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingScheduledThreadPoolExecutor extends ScheduledThreadPoolExecutor {
+
+    private TracingScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) {
+      super(corePoolSize, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }
+
+  }
+
+  /**
+   * ThreadPoolExecutor that traces executed tasks.
+   */
+  public static class TracingThreadPoolExecutor extends ThreadPoolExecutor {
+
+    private TracingThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
+        TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory) {
+      super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory);
+    }
+
+    @Override
+    public void execute(Runnable command) {
+      super.execute(new TraceRunnable(command));
+    }
+
+    @Override
+    public <T> Future<T> submit(Callable<T> task) {
+      return super.submit(new TraceCallable<T>(task));
+    }
+
+    @Override
+    public <T> Future<T> submit(Runnable task, T result) {
+      return super.submit(new TraceRunnable(task), result);
+    }
+
+    @Override
+    public Future<?> submit(Runnable task) {
+      return super.submit(new TraceRunnable(task));
+    }
+
+    private <T> Collection<? extends Callable<T>>
+        wrapCollection(Collection<? extends Callable<T>> tasks) {
+      List<Callable<T>> result = new ArrayList<Callable<T>>(tasks.size());
+      tasks.forEach(t -> result.add(new TraceCallable<T>(t)));
+      return result;
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks, long timeout,
+        TimeUnit unit) throws InterruptedException {
+      return super.invokeAll(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks)
+        throws InterruptedException, ExecutionException {
+      return super.invokeAny(wrapCollection(tasks));
+    }
+
+    @Override
+    public <T> T invokeAny(Collection<? extends Callable<T>> tasks, long timeout, TimeUnit unit)
+        throws InterruptedException, ExecutionException, TimeoutException {
+      return super.invokeAny(wrapCollection(tasks), timeout, unit);
+    }
+
+    @Override
+    public void shutdown() {
+      super.shutdown();
+    }
+
+    @Override
+    public List<Runnable> shutdownNow() {
+      return super.shutdownNow();
+    }
+
+  }
+
+  public static class CloseableThreadPoolExecutor implements AutoCloseable {
+
+    private final ThreadPoolExecutor tpe;
+
+    public CloseableThreadPoolExecutor(ThreadPoolExecutor tpe) {
+      this.tpe = tpe;
+    }
+
+    @Override
+    public void close() throws Exception {
+      this.tpe.shutdownNow();
+    }
+
+  }
+
+  // the number of seconds before we allow a thread to terminate with non-use.
+  public static final long DEFAULT_TIMEOUT_MILLISECS = 180000L;
+
+  /**
+   * Get a thread pool based on a thread pool related property
+   *
+   * @param conf
+   *          accumulo configuration
+   * @param p
+   *          thread pool related property
+   * @return ExecutorService impl
+   * @throws RuntimeException
+   *           if property is not handled
+   */
+  public static ExecutorService getExecutorService(AccumuloConfiguration conf, Property p) {

Review comment:
       This method, and how you based it on the property might be my favorite piece of code this year. :smiley_cat:  This was a great way to bring all the various implementations into one centralized place.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-757979184


   > @ctubbsii - you [mentioned](https://github.com/apache/accumulo/pull/1818#pullrequestreview-545415695) that you needed time to go through this. @Manno15 ran some CI tests and I don't think he found any issues. Do you still need more time?
   
   If you don't mind giving me until the end of the day, I would appreciate the opportunity to take another look today.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556292689



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -571,13 +570,17 @@ private synchronized void addFailedMutations(MutationSet failedMutations) {
     }
   }
 
-  private class FailedMutations extends TimerTask {
+  private class FailedMutations {
 
     private MutationSet recentFailures = null;
     private long initTime;
+    private final Runnable task;
 
     FailedMutations() {
-      jtimer.schedule(this, 0, 500);
+      task = Threads.createNamedRunnable("failed mutationBatchWriterLatencyTimers handler", () -> {
+        run();
+      });

Review comment:
       Would this work?
   
   ```suggestion
         task = Threads.createNamedRunnable("failed mutationBatchWriterLatencyTimers handler", this::run);
   ```

##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }
+    ThreadPools.createGeneralScheduledExecutorService(cc)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          LOG.trace("{}-cleanup thread, properties: {}", className, cc);
+          Set<String> contextsInUse = cc
+              .getAllPropertiesWithPrefixStripped(Property.VFS_CONTEXT_CLASSPATH_PROPERTY).keySet();

Review comment:
       This updates the context using the ConfigurationCopy instead of the contextConfigSupplier, which would provide it on-demand. Using the ConfigurationCopy means it will get the same initial value every time the thread executes, and will never see the updated value of `VFS_CONTEXT_CLASSPATH_PROPERTY` if it changes.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -51,9 +50,10 @@ public void run() {
   public static void halt(final int status, Runnable runnable) {
     try {
       // give ourselves a little time to try and do something
-      new Daemon() {
+      new Thread() {
         @Override
         public void run() {
+          setDaemon(true);

Review comment:
       Is it legal to switch to a Daemon thread after it has already started executing? Will this have any effect?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-757975401


   @ctubbsii - you [mentioned](https://github.com/apache/accumulo/pull/1818#pullrequestreview-545415695) that you needed time to go through this. @Manno15 ran some CI tests and I don't think he found any issues. Do you still need more time?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 edited a comment on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 edited a comment on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747381390


   The test with agitation also went well. 7 server, technically 8 node cluster. Received the same warnings as above and the only other warnings/errors are I think due to the nodes going down from agitation.
   
   I do see some able errors but I think I normally get that when I do CI with agitation. 
   Things like:
   ```  
   Error recovering tablet 2;0985e;0945d from log files
   ```
   
   Running verify now. (EDIT): I am having troubling getting verify to run. Running into a Java Runtime issue with mismatch class file versions. Can't seem to fix it for some reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556689854



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -55,22 +56,21 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
         () -> accConf.getAllPropertiesWithPrefix(Property.VFS_CONTEXT_CLASSPATH_PROPERTY);
     AccumuloVFSClassLoader.setContextConfig(contextConfigSupplier);
     LOG.debug("ContextManager configuration set");
-    startCleanupThread(contextConfigSupplier);
+    startCleanupThread(accConf, contextConfigSupplier);
   }
 
-  private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+  private static void startCleanupThread(final AccumuloConfiguration conf,
+      final Supplier<Map<String,String>> contextConfigSupplier) {
+    ThreadPools.createGeneralScheduledExecutorService(conf)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          LOG.trace("{}-cleanup thread, properties: {}", className, conf);
+          Set<String> contextsInUse = contextConfigSupplier.get().keySet().stream()
+              .filter(k -> k.startsWith(Property.VFS_CONTEXT_CLASSPATH_PROPERTY.name()))

Review comment:
       Thanks for catching that, I went down one path, then another, and left that in by mistake. Resolved in latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-748041508


   Verify completed successfully. I can post the results if you would want to see them. 
   
   Based on my CI testing (no RW due to that failing, see [accuimulo-testing#132](https://github.com/apache/accumulo-testing/issues/132)) everything seems to be working as expected. I have not run into any fatal issue as of yet even with agitation. I will move on to testing the releases and I can come back to this if more work is needed to be done. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543498123



##########
File path: core/src/main/java/org/apache/accumulo/core/util/cleaner/CleanerUtil.java
##########
@@ -107,10 +107,14 @@ public static Cleanable shutdownThreadPoolExecutor(Object tpe, Logger log) {
     requireNonNull(tpe);
     requireNonNull(log);
     return CLEANER.register(tpe, () -> {
+      ThreadPoolExecutor pool = (ThreadPoolExecutor) tpe;

Review comment:
       Why cast here? Why not just make the type more restrictive in the method signature 4 lines above?

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReader.java
##########
@@ -72,16 +72,15 @@ protected TabletServerBatchReader(ClientContext context, Class<?> scopeClass, Ta
 
     queryThreadPool = ThreadPools.getFixedThreadPool(numQueryThreads,
         "batch scanner " + batchReaderInstance + "-", false);
+    // Call shutdown on this thread pool in case the caller does not call close().
     cleanable = CleanerUtil.shutdownThreadPoolExecutor(queryThreadPool, log);
   }
 
   @Override
   public void close() {
     if (closed.compareAndSet(false, true)) {
-      // deregister cleanable, but it won't run because it checks
-      // the value of closed first, which is now true
-      cleanable.clean();
       queryThreadPool.shutdownNow();
+      cleanable.clean();

Review comment:
       We really should deregister the cleanable first, so we don't get duplicate exceptions from that if there's a problem calling the shutdown.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/cleaner/CleanerUtil.java
##########
@@ -107,10 +107,14 @@ public static Cleanable shutdownThreadPoolExecutor(Object tpe, Logger log) {
     requireNonNull(tpe);
     requireNonNull(log);
     return CLEANER.register(tpe, () -> {
+      ThreadPoolExecutor pool = (ThreadPoolExecutor) tpe;
+      if (pool.isShutdown()) {
+        return;
+      }

Review comment:
       The other cleanables pass in the closed atomic boolean, so they can detect whether the resource has already been closed, and the cleanable action isn't needed. I suggest following the same pattern, just for convenience, rather than follow a custom pattern here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556528385



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }
+    ThreadPools.createGeneralScheduledExecutorService(cc)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          LOG.trace("{}-cleanup thread, properties: {}", className, cc);
+          Set<String> contextsInUse = cc
+              .getAllPropertiesWithPrefixStripped(Property.VFS_CONTEXT_CLASSPATH_PROPERTY).keySet();

Review comment:
       The ContextClassLoaderFactory is initialized on server process start (AbstractServer constructor). I don't believe that this property was ever dynamic, it's not the table context classpath property which could change, it's the property to load system jars from VFS.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543514934



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReader.java
##########
@@ -72,16 +72,15 @@ protected TabletServerBatchReader(ClientContext context, Class<?> scopeClass, Ta
 
     queryThreadPool = ThreadPools.getFixedThreadPool(numQueryThreads,
         "batch scanner " + batchReaderInstance + "-", false);
+    // Call shutdown on this thread pool in case the caller does not call close().
     cleanable = CleanerUtil.shutdownThreadPoolExecutor(queryThreadPool, log);
   }
 
   @Override
   public void close() {
     if (closed.compareAndSet(false, true)) {
-      // deregister cleanable, but it won't run because it checks
-      // the value of closed first, which is now true
-      cleanable.clean();
       queryThreadPool.shutdownNow();
+      cleanable.clean();

Review comment:
       So we really only need `cleanable.clean()` here as it deregisters *and* invokes the cleaning action.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556524086



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -571,13 +570,17 @@ private synchronized void addFailedMutations(MutationSet failedMutations) {
     }
   }
 
-  private class FailedMutations extends TimerTask {
+  private class FailedMutations {
 
     private MutationSet recentFailures = null;
     private long initTime;
+    private final Runnable task;
 
     FailedMutations() {
-      jtimer.schedule(this, 0, 500);
+      task = Threads.createNamedRunnable("failed mutationBatchWriterLatencyTimers handler", () -> {
+        run();
+      });

Review comment:
       That should work, I can make that change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543636815



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       That just means that we need to make sure that we are catching Exceptions appropriately in the non-critical threads so that it doesn't take down the server.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555184171



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }

Review comment:
       IIRC this was specific to the issue of the property not being in the Map<String,String> from which the ConfigurationCopy is based upon *and* the fact that ConfigurationCopy does not provide the default value in get(Property) but all of the other implementations do. Putting it here resolved this specific issue. I could move it into `ThreadPools.getGeneralScheduledExecutorService` but it would not be used for the most part. I'm indifferent, I can move it if we have others that think it's a good idea.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-757986642


   No problem, I wasn't sure if you were done or needed more time. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
brianloss commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555188361



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }

Review comment:
       That explanation makes sense to me--I just didn't have enough context. Sorry for the noise.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r532900453



##########
File path: server/base/src/main/java/org/apache/accumulo/server/util/time/SimpleTimer.java
##########
@@ -100,10 +93,19 @@ static int getInstanceThreadPoolSize() {
     return instanceThreadPoolSize;
   }
 
-  private SimpleTimer(int threadPoolSize) {
+  protected SimpleTimer(int threadPoolSize) {
     executor = Executors.newScheduledThreadPool(threadPoolSize,
         new ThreadFactoryBuilder().setNameFormat("SimpleTimer-%d").setDaemon(true)
-            .setUncaughtExceptionHandler(new ExceptionHandler()).build());
+            .setUncaughtExceptionHandler(getUncaughtExceptionHandler()).build());
+  }
+
+  protected Thread.UncaughtExceptionHandler getUncaughtExceptionHandler() {
+    return new Thread.UncaughtExceptionHandler() {
+      @Override
+      public void uncaughtException(Thread t, Throwable e) {
+        log.warn("SimpleTimer task failed", e);
+      }
+    };

Review comment:
       I agree, I'm not sure I have the ability to decide what is critical vs what is not. I came across a reference to new JVM parameters that may take OOME handling out of our hands: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8152669




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion merged pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion merged pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543558569



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReader.java
##########
@@ -72,16 +72,15 @@ protected TabletServerBatchReader(ClientContext context, Class<?> scopeClass, Ta
 
     queryThreadPool = ThreadPools.getFixedThreadPool(numQueryThreads,
         "batch scanner " + batchReaderInstance + "-", false);
+    // Call shutdown on this thread pool in case the caller does not call close().
     cleanable = CleanerUtil.shutdownThreadPoolExecutor(queryThreadPool, log);
   }
 
   @Override
   public void close() {
     if (closed.compareAndSet(false, true)) {
-      // deregister cleanable, but it won't run because it checks
-      // the value of closed first, which is now true
-      cleanable.clean();
       queryThreadPool.shutdownNow();
+      cleanable.clean();

Review comment:
       > In the next commit I left the order of calls the same. If we only use `cleanable.clean()` or put `cleanable.clean()` first, then we will always get the WARN log message: ThreadPoolExecutor found unreferenced without calling shutdown() or shutdownNow().
   
   If shutdown is only called in the close method, then it can be guarded with the closed variable that you are now passing to the cleaner. I guess what I don't know is if we ever want to call shutdown while leaving the containing object unclosed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555190818



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       In the [first](https://github.com/apache/accumulo/pull/1818/commits/d66ba42bc3ac227d85aaa0cbce98cf179dd01be8) commit for this PR I had put in some logic to only halt the server it a property was set, and then I set that property in the JVM arguments for the server processes. I can easily resurrect that code, but to your point, I think there are still differing opinions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-745382089


   That method is only called from https://github.com/apache/accumulo/pull/1818/files#diff-40da4ec0c83a0e97c69ac58f6876f7895a08c52efe18041af42ccb9e6bcdccb2R75


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747616460


   I did and still get the same error. I tried on a different node too and nothing. It does work locally so must be something related to the cluster. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556556623



##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -50,10 +51,9 @@ public void run() {
   public static void halt(final int status, Runnable runnable) {
     try {
       // give ourselves a little time to try and do something
-      new Thread() {
+      new Daemon() {

Review comment:
       We had one of our own, but I think you may have deleted it elsewhere in this PR (which is a good thing, as we don't really need a dedicated class for that, especially with all your thread factory stuff being consolidated into one place).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556645235



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       I would rather resurrect the system property from the [first](https://github.com/apache/accumulo/pull/1818/commits/d66ba42bc3ac227d85aaa0cbce98cf179dd01be8) commit and make it enabled by default instead of making a configurable list of Error's that you want to ignore. If you don't want that functionality, then you don't set that property in your accumulo-env.sh file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r533742044



##########
File path: assemble/conf/accumulo-env.sh
##########
@@ -95,7 +95,7 @@ JAVA_OPTS=("${JAVA_OPTS[@]}"
 
 case "$cmd" in
   monitor|gc|master|tserver|tracer)
-    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties")
+    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties" "-DHaltVMOnThreadError=true")

Review comment:
       > know that it wouldn't necessarily make sense to re-throw from the handler... because the docs say stuff thrown from the handler are ignored... but not sure if that applies to Error types as well as Exceptions.
   
   I'm pretty sure I have seen a StackOverflowError from a Thread that kills the Thread but not the VM. My guess is that an Error re-thrown from the handler on a Thread would do nothing, unless its the OutOfMemoryError and the OnOutOfMemoryError parameter is set.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-745373285


   @dlmarion - is there anything, in particular, I need to set to fully test this? I just set up the cluster and I get the warning below for both master and gc. 
   ```
   ThreadPoolExecutor found unreferenced without calling shutdown() or shutdownNow()
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-745380424


   No, you should not need to do anything special here. That's coming from https://github.com/apache/accumulo/pull/1818/files#diff-5c9ace41ad775cc540b1abbdd1cf70d46aef314576963953835267ff8c18d513R110. It's just a warning with a class name. It's still running, right?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555174845



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -209,20 +210,17 @@ public TabletServerBatchWriter(ClientContext context, BatchWriterConfig config)
     this.writer = new MutationWriter(config.getMaxWriteThreads());
 
     if (this.maxLatency != Long.MAX_VALUE) {
-      jtimer.schedule(new TimerTask() {
-        @Override
-        public void run() {
-          try {
-            synchronized (TabletServerBatchWriter.this) {
-              if ((System.currentTimeMillis() - lastProcessingStartTime)
-                  > TabletServerBatchWriter.this.maxLatency)
-                startProcessing();
-            }
-          } catch (Throwable t) {
-            updateUnknownErrors("Max latency task failed " + t.getMessage(), t);
+      executor.scheduleWithFixedDelay(Threads.createNamedRunnable("latency timer", () -> {

Review comment:
       That's a good catch, I can change the thread name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-744747318


   @dlmarion Because this is such a significant change in behavior, what do you think about running CI and maybe RW a bit with these before this is merged? Would you be able to do that?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543678356



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       > That just means that we need to make sure that we are catching Exceptions appropriately in the non-critical threads so that it doesn't take down the server.
   
   Right, but I would say all threads, regardless of critical or non-critical. The more we handle regular exceptions robustly, and recover robustly on restart, the less it matters that we didn't do a graceful shutdown under emergency (Error) conditions, because 1) it won't happen very often, and 2) we'll handle recovery well when we restart.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       > That just means that we need to make sure that we are catching Exceptions appropriately in the non-critical threads so that it doesn't take down the server.
   
   Right, but I would say all threads, regardless of critical or non-critical. The more we handle regular exceptions robustly, and recover robustly on restart, the less it matters that we didn't do a graceful shutdown under emergency (Error) conditions, because 1) it won't happen very often, and 2) we'll handle recovery well when we restart (or re-assign).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543629566



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       That said, I'm not sure if this particular method of killing the tserver in the face of errors is best or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555195902



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util.threads;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+import java.util.OptionalInt;
+
+public class Threads {
+
+  public static Runnable createNamedRunnable(String name, Runnable r) {
+    return new NamedRunnable(name, r);
+  }
+
+  public static Runnable createNamedRunnable(String name, OptionalInt priority, Runnable r) {
+    return new NamedRunnable(name, priority, r);
+  }
+
+  private static final UncaughtExceptionHandler UEH = new AccumuloUncaughtExceptionHandler();
+
+  public static Thread createThread(String name, Runnable r) {
+    return createThread(name, OptionalInt.empty(), r);
+  }
+
+  public static Thread createThread(String name, OptionalInt priority, Runnable r) {
+    Thread thread = null;
+    if (r instanceof NamedRunnable) {
+      NamedRunnable nr = (NamedRunnable) r;
+      thread = new Thread(r, name);
+      if (nr.getPriority().isPresent()) {
+        thread.setPriority(nr.getPriority().getAsInt());
+      } else if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    } else {
+      thread = new Thread(r, name);
+      if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    }

Review comment:
       Ah, I see it now. My change should be functionally equivalent.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555184723



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/ThreadPools.java
##########
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util.threads;
+
+import java.util.OptionalInt;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ScheduledThreadPoolExecutor;
+import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+
+public class ThreadPools {
+
+  // the number of seconds before we allow a thread to terminate with non-use.
+  public static final long DEFAULT_TIMEOUT_MILLISECS = 180000L;
+
+  /**
+   * Get a thread pool based on a thread pool related property
+   *
+   * @param conf
+   *          accumulo configuration
+   * @param p
+   *          thread pool related property
+   * @return ExecutorService impl
+   * @throws RuntimeException
+   *           if property is not handled
+   */
+  public static ExecutorService getExecutorService(AccumuloConfiguration conf, Property p) {

Review comment:
       I can change the method names to `create*`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556571117



##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -50,10 +51,9 @@ public void run() {
   public static void halt(final int status, Runnable runnable) {
     try {
       // give ourselves a little time to try and do something
-      new Thread() {
+      new Daemon() {

Review comment:
       Resolved in latest commit.

##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }
+    ThreadPools.createGeneralScheduledExecutorService(cc)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          LOG.trace("{}-cleanup thread, properties: {}", className, cc);
+          Set<String> contextsInUse = cc
+              .getAllPropertiesWithPrefixStripped(Property.VFS_CONTEXT_CLASSPATH_PROPERTY).keySet();

Review comment:
       Resolved in latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747638788


   I got it to work. I forgot hadoop has an export JAVA_HOME of its own and that was exporting java 8 still for some reason. Thanks for your help guys. I will update you with the results tomorrow


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-744754240


   I can also help test this on our local cluster if necessary. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543599320



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       In talking with @ivakegg and @drewfarris, this may be too severe as it does not run any shutdown hooks and does not perform a graceful shutdown of the application. For example, in [TabletServer](https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L865) a normal shutdown will close the FileSystem and return leases on the files in HDFS. Or, maybe this remains the default action and we need to better handle the errors in the threads.
   
   There is also the case where we may want to still catch Throwable in non-critical threads (e.g. QueryTask) so that a StackOverflowError does not shutdown the TabletServer, for example.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-745297351


   @Manno15 - I appreciate the offer and if it's not too much trouble I'll take you up on it. This is going to be a short week for me this week and I am off next week. I have been asked to spend some cycles getting these releases out, so any help getting this tested would be much appreciated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555169849



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -198,6 +197,8 @@ public long getTimeOut() {
 
   public TabletServerBatchWriter(ClientContext context, BatchWriterConfig config) {
     this.context = context;
+    this.executor = ThreadPools.getGeneralScheduledExecutorService(this.context.getConfiguration());
+    this.failedMutations = new FailedMutations();

Review comment:
       It's been almost a month since I looked at this. I don't remember the reason why I moved it and it should work if this line was reverted. I'm not sure if there is reason to do it one vs the other. I can easily change it back if necessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747381390


   The test with agitation also went well. 7 server, technically 8 node cluster. Received the same warnings as above and the only other warnings/errors are I think due to the nodes going down from agitation.
   
   I do see some able errors but I think I normally get that when I do CI with agitation. 
   Things like:
   ```  
   Error recovering tablet 2;0985e;0945d from log files
   ```
   
   Running verify now. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543628709



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       I think we can't trust the TServer to shut down gracefully if we get to this state, and adding code to categorize and handle different threads each individually, would add too much (unmaintainable) complexity that would be prone to accruing technical debt and drifting from the original intent over time. I think it's simpler and safer to just stop everything in the TServer. Yes, it prevents some graceful shutdown steps from occurring... but it also prevents it from causing harm that is somewhat unpredictable. Rather than try to write protective shutdown code in uncertain emergency circumstances, our code complexity and development effort would be better spent on ensuring robust recovery, in my opinion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556689527



##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -49,15 +49,16 @@ public void run() {
   }
 
   public static void halt(final int status, Runnable runnable) {
+
     try {
       // give ourselves a little time to try and do something
-      new Daemon() {
+      Threads.createThread("Halt Thread", new Runnable() {
         @Override
         public void run() {
           sleepUninterruptibly(100, TimeUnit.MILLISECONDS);
           Runtime.getRuntime().halt(status);
         }
-      }.start();
+      }).start();

Review comment:
       Resolved in latest commit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556298849



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReader.java
##########
@@ -72,16 +72,15 @@ protected TabletServerBatchReader(ClientContext context, Class<?> scopeClass, Ta
 
     queryThreadPool = ThreadPools.getFixedThreadPool(numQueryThreads,
         "batch scanner " + batchReaderInstance + "-", false);
+    // Call shutdown on this thread pool in case the caller does not call close().
     cleanable = CleanerUtil.shutdownThreadPoolExecutor(queryThreadPool, log);
   }
 
   @Override
   public void close() {
     if (closed.compareAndSet(false, true)) {
-      // deregister cleanable, but it won't run because it checks
-      // the value of closed first, which is now true
-      cleanable.clean();
       queryThreadPool.shutdownNow();
+      cleanable.clean();

Review comment:
       The reason to ensure the cleanable is deregistered first, is because the `shutdownNow()` could throw an exception, causing the cleanable to stick around until the object is garbage collected. In general, we don't want cleanables to stick around if they are going to be NOOPs later (the AtomicBoolean "closed" variable will ensure it's a NOOP, so we don't try to close it both inside the cleanable and outside).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-745319895


   @dlmarion - No trouble at all. I will start the testing as soon as I can. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r557014219



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       > a miss-configured context that leads to a UnsupportedClassVersionError or an IOError on a particular file
   
   I would argue that misconfigured class paths are much more serious of an issue than would be implied by trying to proceed in spite of errors caused by them. We should *not* be trying to proceed when an error occurs as the result of a misconfigured class path. If you can't trust the class path, you can't trust anything, and it is not safe to proceed.
   
   As for IOError, see [Mark Reinhold's explanation for that](http://cafe.elharo.com/blogroll/undocumented-changes-in-java-6-mustang-ioerror/#comment-160) ; he specifically explains that it was added to be used in situations where an *unrecoverable error* (his words) occurs. Suggesting that we should try to recover from situations that are definition-ally unrecoverable seems a bit unreasonable to me.
   
   > One bad iterator/edge case should not be able to take a cluster down.
   
   I understand the motivation for pursuing this. However, thinking about worst case scenarios, I can imagine far worse things than being temporarily offline (data corruption, hijacking by malware, etc.) due to administrator error deploying a bad iterator.
   
   I think that baking in special handling for these (what should be exceedingly rare) edge cases would be a mistake for a number of reasons. However, persuasive arguments in favor of, or against, such a thing can be made in a subsequent effort, if there is a critical demand for such a thing. There's no need to block this PR on that. It would actually be much simpler to discuss that sort of thing on its own, rather than bundled with the other changes in this PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-736057745


   I made the modifications you requested. I also found another case where Threads were being created, in a class called `Daemon`. I'm starting to think I should create a class called `Threads` and consolidate all of the ThreadPoolExecutor, ExecutorService and SimpleTimer code. Also, I think we can replace SimpleTimer with ScheduledThreadPoolExecutor.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r532910387



##########
File path: server/base/src/main/java/org/apache/accumulo/server/util/time/SimpleTimer.java
##########
@@ -100,10 +93,19 @@ static int getInstanceThreadPoolSize() {
     return instanceThreadPoolSize;
   }
 
-  private SimpleTimer(int threadPoolSize) {
+  protected SimpleTimer(int threadPoolSize) {
     executor = Executors.newScheduledThreadPool(threadPoolSize,
         new ThreadFactoryBuilder().setNameFormat("SimpleTimer-%d").setDaemon(true)
-            .setUncaughtExceptionHandler(new ExceptionHandler()).build());
+            .setUncaughtExceptionHandler(getUncaughtExceptionHandler()).build());
+  }
+
+  protected Thread.UncaughtExceptionHandler getUncaughtExceptionHandler() {
+    return new Thread.UncaughtExceptionHandler() {
+      @Override
+      public void uncaughtException(Thread t, Throwable e) {
+        log.warn("SimpleTimer task failed", e);
+      }
+    };

Review comment:
       It looks like this new parameter does not work in all cases. https://bugs.openjdk.java.net/browse/JDK-8155004. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543531842



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchReader.java
##########
@@ -72,16 +72,15 @@ protected TabletServerBatchReader(ClientContext context, Class<?> scopeClass, Ta
 
     queryThreadPool = ThreadPools.getFixedThreadPool(numQueryThreads,
         "batch scanner " + batchReaderInstance + "-", false);
+    // Call shutdown on this thread pool in case the caller does not call close().
     cleanable = CleanerUtil.shutdownThreadPoolExecutor(queryThreadPool, log);
   }
 
   @Override
   public void close() {
     if (closed.compareAndSet(false, true)) {
-      // deregister cleanable, but it won't run because it checks
-      // the value of closed first, which is now true
-      cleanable.clean();
       queryThreadPool.shutdownNow();
+      cleanable.clean();

Review comment:
       In the next commit I left the order of calls the same. If we only use `cleanable.clean()` or put `cleanable.clean()` first, then we will always get the WARN log message: ThreadPoolExecutor found unreferenced without calling shutdown() or shutdownNow().




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r532927323



##########
File path: assemble/conf/accumulo-env.sh
##########
@@ -95,7 +95,7 @@ JAVA_OPTS=("${JAVA_OPTS[@]}"
 
 case "$cmd" in
   monitor|gc|master|tserver|tracer)
-    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties")
+    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties" "-DHaltVMOnThreadError=true")

Review comment:
       I think LoggingRunnable avoided the question of which would win by trying to wrap the Runnable and catching the Throwables in the run method instead of relying on the uncaught handler. However, the fatal flaw with LoggingRunnable is that it wouldn't re-throw things. Your solution to put an uncaught handler on everything makes it so we can (should) delete LoggingRunnable, as it's not needed any more, but it raises questions about what wins. I know that it wouldn't necessarily make sense to re-throw from the handler... because the docs say stuff thrown from the handler are ignored... but not sure if that applies to Error types as well as Exceptions.
   
   I'm finding myself with more questions than answers about how Java actually behaves, so it's hard to have an opinion about how things should go. I think I might want to simulate a few scenarios to test the various JVM behaviors, and use that to inform what to do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555188763



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util.threads;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+import java.util.OptionalInt;
+
+public class Threads {
+
+  public static Runnable createNamedRunnable(String name, Runnable r) {
+    return new NamedRunnable(name, r);
+  }
+
+  public static Runnable createNamedRunnable(String name, OptionalInt priority, Runnable r) {
+    return new NamedRunnable(name, priority, r);
+  }
+
+  private static final UncaughtExceptionHandler UEH = new AccumuloUncaughtExceptionHandler();
+
+  public static Thread createThread(String name, Runnable r) {
+    return createThread(name, OptionalInt.empty(), r);
+  }
+
+  public static Thread createThread(String name, OptionalInt priority, Runnable r) {
+    Thread thread = null;
+    if (r instanceof NamedRunnable) {
+      NamedRunnable nr = (NamedRunnable) r;
+      thread = new Thread(r, name);
+      if (nr.getPriority().isPresent()) {
+        thread.setPriority(nr.getPriority().getAsInt());
+      } else if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    } else {
+      thread = new Thread(r, name);
+      if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    }

Review comment:
       I made this change (in the next commit), but I modified your suggestion such that if the priority is set in the NamedRunnable it is not overridden by the priority parameter.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
brianloss commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555192694



##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util.threads;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+import java.util.OptionalInt;
+
+public class Threads {
+
+  public static Runnable createNamedRunnable(String name, Runnable r) {
+    return new NamedRunnable(name, r);
+  }
+
+  public static Runnable createNamedRunnable(String name, OptionalInt priority, Runnable r) {
+    return new NamedRunnable(name, priority, r);
+  }
+
+  private static final UncaughtExceptionHandler UEH = new AccumuloUncaughtExceptionHandler();
+
+  public static Thread createThread(String name, Runnable r) {
+    return createThread(name, OptionalInt.empty(), r);
+  }
+
+  public static Thread createThread(String name, OptionalInt priority, Runnable r) {
+    Thread thread = null;
+    if (r instanceof NamedRunnable) {
+      NamedRunnable nr = (NamedRunnable) r;
+      thread = new Thread(r, name);
+      if (nr.getPriority().isPresent()) {
+        thread.setPriority(nr.getPriority().getAsInt());
+      } else if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    } else {
+      thread = new Thread(r, name);
+      if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    }

Review comment:
       > I made this change (in the next commit), but I modified your suggestion such that if the priority is set in the NamedRunnable it is not overridden by the priority parameter.
   
   I think my suggestion would have done that too, since it overwrites the priority parameter on line 45, but if you missed that then my suggestion was too subtle so it's best not to use it. :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747579226


   I've tried that but I will try again. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556655282



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,23 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy threadPoolProperties =
+        new ConfigurationCopy(contextConfigSupplier.get());
+    String size = threadPoolProperties.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      threadPoolProperties.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }

Review comment:
       Changed in latest commit

##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,23 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy threadPoolProperties =
+        new ConfigurationCopy(contextConfigSupplier.get());
+    String size = threadPoolProperties.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      threadPoolProperties.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }
+    ThreadPools.createGeneralScheduledExecutorService(threadPoolProperties)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          ConfigurationCopy contextCleanerProperties =
+              new ConfigurationCopy(contextConfigSupplier.get());
+          LOG.trace("{}-cleanup thread, properties: {}", className, threadPoolProperties);
+          Set<String> contextsInUse = contextCleanerProperties
+              .getAllPropertiesWithPrefixStripped(Property.VFS_CONTEXT_CLASSPATH_PROPERTY).keySet();

Review comment:
       Changed in latest commit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-746796338


   > From my CI run, everything seems to work.
   
   How many nodes? How many ingesters? Did you run with or without agitation?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-738793126


   May need to look for instances of java.util.Timer also...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r532895319



##########
File path: assemble/conf/accumulo-env.sh
##########
@@ -95,7 +95,7 @@ JAVA_OPTS=("${JAVA_OPTS[@]}"
 
 case "$cmd" in
   monitor|gc|master|tserver|tracer)
-    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties")
+    JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configurationFile=log4j2-service.properties" "-DHaltVMOnThreadError=true")

Review comment:
       I think it will in this specific case. Although, I could bypass Halt if the Error is an OutOfMemoryError. I believe that is the only error that has special handling in the JVM.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-747570194


   This is what I am running into with trying to verify using accumulo-testing cingest/verify. I understand the error but nothing is working to fix it.
   ```
   Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/accumulo/testing/continuous/ContinuousVerify has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
   	at java.lang.ClassLoader.defineClass1(Native Method)
   	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
   	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
   	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
   	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
   	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
   	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
   	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:176)
   	at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
   	at java.lang.Class.forName0(Native Method)
   	at java.lang.Class.forName(Class.java:348)
   	at org.apache.hadoop.util.RunJar.run(RunJar.java:316)
   	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
   
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
brianloss commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r555101347



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }

Review comment:
       It seems like maybe calling `ThreadPools.getGeneralScheduledExecutorService` depends on the configuration having a value for `Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE`. Should this logic be moved into `ThreadPools.getGeneralScheduledExecutorService`?

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -198,6 +197,8 @@ public long getTimeOut() {
 
   public TabletServerBatchWriter(ClientContext context, BatchWriterConfig config) {
     this.context = context;
+    this.executor = ThreadPools.getGeneralScheduledExecutorService(this.context.getConfiguration());
+    this.failedMutations = new FailedMutations();

Review comment:
       Just curious why the initialization for `failedMutations` was moved into the constructor but the other nearby inline final initializations (`violations`, `authorizationFailures`, `serverSideErrors`) weren't?

##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/ThreadPools.java
##########
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util.threads;
+
+import java.util.OptionalInt;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ScheduledThreadPoolExecutor;
+import java.util.concurrent.SynchronousQueue;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+
+public class ThreadPools {
+
+  // the number of seconds before we allow a thread to terminate with non-use.
+  public static final long DEFAULT_TIMEOUT_MILLISECS = 180000L;
+
+  /**
+   * Get a thread pool based on a thread pool related property
+   *
+   * @param conf
+   *          accumulo configuration
+   * @param p
+   *          thread pool related property
+   * @return ExecutorService impl
+   * @throws RuntimeException
+   *           if property is not handled
+   */
+  public static ExecutorService getExecutorService(AccumuloConfiguration conf, Property p) {

Review comment:
       Not a super strong opinion, but when I first started reading the code I thought maybe these methods could be returning cached/shared executors. I suggest considering changing all of the methods in this class from "get*" to "create*" (or "new*" to match the pattern in the `java.util.concurrent.Executors` class) to make it explicit that they are always creating a resource.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       From reading the discussion here, it sounds like there's some uncertainty of what action to take if we get here. Did Ivan or Drew ever weigh in again? I think Christopher's arguments make sense, but am just wondering if it's worth adding configuration to skip halting the tserver? The main reason I suggest that is if there's some unanticipated case where this is hit and it has a big impact on production systems. At least in such a case it could be disabled with a configuration change vs a re-release and deployment of Accumulo. I only suggest that given the uncertainty of whether or not this is too severe of a response to uncaught errors.

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -209,20 +210,17 @@ public TabletServerBatchWriter(ClientContext context, BatchWriterConfig config)
     this.writer = new MutationWriter(config.getMaxWriteThreads());
 
     if (this.maxLatency != Long.MAX_VALUE) {
-      jtimer.schedule(new TimerTask() {
-        @Override
-        public void run() {
-          try {
-            synchronized (TabletServerBatchWriter.this) {
-              if ((System.currentTimeMillis() - lastProcessingStartTime)
-                  > TabletServerBatchWriter.this.maxLatency)
-                startProcessing();
-            }
-          } catch (Throwable t) {
-            updateUnknownErrors("Max latency task failed " + t.getMessage(), t);
+      executor.scheduleWithFixedDelay(Threads.createNamedRunnable("latency timer", () -> {

Review comment:
       I like the fact that previously the timer thread would have `BatchWriter` in its name. Only `latency timer` makes it just a little harder to figure out when looking at a thread dump, IMO.
   ```suggestion
         executor.scheduleWithFixedDelay(Threads.createNamedRunnable("BatchWriterLatencyTimer", () -> {
   ```

##########
File path: core/src/main/java/org/apache/accumulo/core/util/threads/Threads.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.util.threads;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+import java.util.OptionalInt;
+
+public class Threads {
+
+  public static Runnable createNamedRunnable(String name, Runnable r) {
+    return new NamedRunnable(name, r);
+  }
+
+  public static Runnable createNamedRunnable(String name, OptionalInt priority, Runnable r) {
+    return new NamedRunnable(name, priority, r);
+  }
+
+  private static final UncaughtExceptionHandler UEH = new AccumuloUncaughtExceptionHandler();
+
+  public static Thread createThread(String name, Runnable r) {
+    return createThread(name, OptionalInt.empty(), r);
+  }
+
+  public static Thread createThread(String name, OptionalInt priority, Runnable r) {
+    Thread thread = null;
+    if (r instanceof NamedRunnable) {
+      NamedRunnable nr = (NamedRunnable) r;
+      thread = new Thread(r, name);
+      if (nr.getPriority().isPresent()) {
+        thread.setPriority(nr.getPriority().getAsInt());
+      } else if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    } else {
+      thread = new Thread(r, name);
+      if (priority.isPresent()) {
+        thread.setPriority(priority.getAsInt());
+      }
+    }

Review comment:
       Just a nit, but it doesn't appear that the initialization of `thread` is any different based on the type of Runnable passed in. Why not move the initialization up? Maybe something like:
   ```suggestion
       Thread thread = new Thread(r, name);
       if (r instanceof NamedRunnable) {
         NamedRunnable nr = (NamedRunnable) r;
         if (nr.getPriority().isPresent()) {
           priority = nr.getPriority();
         }
       }
       if (priority.isPresent()) {
         thread.setPriority(priority.getAsInt());
       }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 edited a comment on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 edited a comment on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-746803138


   Sorry for not providing more info. 7 nodes but only one ingester and without agitation. Running a CI with agitation as we speak. Everything else is a clean accumulo installation, with no change in settings except for heap sizes. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556552871



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }
+    ThreadPools.createGeneralScheduledExecutorService(cc)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          LOG.trace("{}-cleanup thread, properties: {}", className, cc);
+          Set<String> contextsInUse = cc
+              .getAllPropertiesWithPrefixStripped(Property.VFS_CONTEXT_CLASSPATH_PROPERTY).keySet();

Review comment:
       I read the code wrong, I think I need more coffee.

##########
File path: core/src/main/java/org/apache/accumulo/core/util/Halt.java
##########
@@ -50,10 +51,9 @@ public void run() {
   public static void halt(final int status, Runnable runnable) {
     try {
       // give ourselves a little time to try and do something
-      new Thread() {
+      new Daemon() {

Review comment:
       I didn't even notice that it was a Hadoop class.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r556544177



##########
File path: core/src/main/java/org/apache/accumulo/core/classloader/DefaultContextClassLoaderFactory.java
##########
@@ -59,18 +60,20 @@ public DefaultContextClassLoaderFactory(final AccumuloConfiguration accConf) {
   }
 
   private static void startCleanupThread(final Supplier<Map<String,String>> contextConfigSupplier) {
-    new Timer(className + "-cleanup", true).scheduleAtFixedRate(new TimerTask() {
-      @Override
-      public void run() {
-        Map<String,String> contextConfigs = contextConfigSupplier.get();
-        LOG.trace("{}-cleanup thread, properties: {}", className, contextConfigs);
-        int prefixlen = Property.VFS_CONTEXT_CLASSPATH_PROPERTY.getKey().length();
-        Set<String> contextsInUse = contextConfigs.keySet().stream()
-            .map(k -> k.substring(prefixlen)).collect(Collectors.toSet());
-        LOG.trace("{}-cleanup thread, contexts in use: {}", className, contextsInUse);
-        AccumuloVFSClassLoader.removeUnusedContexts(contextsInUse);
-      }
-    }, 60_000, 60_000);
+    final ConfigurationCopy cc = new ConfigurationCopy(contextConfigSupplier.get());
+    String size = cc.get(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE);
+    if (null == size || size.isEmpty()) {
+      cc.set(Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE,
+          Property.GENERAL_SIMPLETIMER_THREADPOOL_SIZE.getDefaultValue());
+    }
+    ThreadPools.createGeneralScheduledExecutorService(cc)
+        .scheduleWithFixedDelay(Threads.createNamedRunnable(className + "-cleanup", () -> {
+          LOG.trace("{}-cleanup thread, properties: {}", className, cc);
+          Set<String> contextsInUse = cc
+              .getAllPropertiesWithPrefixStripped(Property.VFS_CONTEXT_CLASSPATH_PROPERTY).keySet();

Review comment:
       Interesting Then what is this thread supposed to be cleaning up?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on pull request #1818: Fixes #1808 - Stop server side VM on Error in Thread

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#issuecomment-745382033


   Yeah, everything is still running. Just wanted to make sure before I start the ingest process. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org