You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/01/04 19:57:44 UTC

[GitHub] [accumulo] dlmarion opened a new pull request #2408: Attempt to verify that MAC processes are up and running in start

dlmarion opened a new pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408


   Closes #1897


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005671230


   I added the following to the bottom of `verifyUp`:
   
   ```
       try (AccumuloClient client = Accumulo.newClient().from(getClientProperties()).build()) {
         client.instanceOperations().waitForBalance();
       }
   ```
   
   and it causes this:
   
   ```
   ...
   Caused by: org.apache.accumulo.fate.zookeeper.ZooSession$ZooSessionShutdownException: The Accumulo singleton that that tracks zookeeper session is disabled.  This is likely caused by all AccumuloClients being closed or garbage collected.
   	at org.apache.accumulo.fate.zookeeper.ZooSession.getSession(ZooSession.java:195)
   	at org.apache.accumulo.fate.zookeeper.ZooSession.getAuthenticatedSession(ZooSession.java:184)
   	at org.apache.accumulo.fate.zookeeper.ZooReaderWriter.getZooKeeper(ZooReaderWriter.java:58)
   	at org.apache.accumulo.fate.zookeeper.ZooReader.retryLoopMutator(ZooReader.java:165)
   	at org.apache.accumulo.fate.zookeeper.ZooReader.retryLoop(ZooReader.java:144)
   	at org.apache.accumulo.fate.zookeeper.ZooReader.retryLoop(ZooReader.java:131)
   	at org.apache.accumulo.fate.zookeeper.ZooReader.getChildren(ZooReader.java:87)
   	at org.apache.accumulo.miniclusterImpl.MiniAccumuloClusterImpl.verifyUp(MiniAccumuloClusterImpl.java:640)
   	at org.apache.accumulo.miniclusterImpl.MiniAccumuloClusterImpl.start(MiniAccumuloClusterImpl.java:606)
   	... 31 more
   ```
   
   I'm not sure I understand why.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778367994



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);
+
+    int tsActualCount = 0;
+    int tryCount = 0;
+    while (tsActualCount != tsExpectedCount) {
+      tryCount++;
+      try {
+        tsActualCount = 0;
+        for (String child : zrw.getChildren(rootPath + Constants.ZTSERVERS)) {
+          tsActualCount++;
+          if (zrw.getChildren(rootPath + Constants.ZTSERVERS + "/" + child).isEmpty())
+            throw new RuntimeException("TServer not present in ZooKeeper");

Review comment:
       maybe
   ```
   throw new RuntimeException("TServer " + child + " not present in ZooKeeper");
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005181436


   Checkout `LateLastContactIT` cause it starts its own `ZombieTServer` so your verify method may not work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1006030898


   FWIW, our multi-module build creates the native maps in a module that the test module depends on. So, the native maps are always built and ready for the ITs that use them in the test module when building from the command-line. However, Eclipse (and likely other IDEs) don't execute all the build tasks in the native map module when building. Specifically, they don't execute the `exec-maven-plugin` that calls `make`, because the IDE doesn't know how to execute the `exec-maven-plugin`, as it doesn't map to an action that the IDE can do. So, you have to do something like `mvn package -DskipTests` to ensure `exec-maven-plugin` builds the native maps in the `server/native` module, and then load the project into Eclipse. At that point, the ITs should be able to find the native maps without any problems. This is just a quirk of how the IDE can't do everything the command-line can.
   
   In any case, checking that the processes are up is a nice fast-fail, so the test doesn't hang in the IDE because of this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005859037


   Ran `mvn clean` on command line, then attempted to run `ExternalCompaction1_IT.testExternalCompaction()` in Eclipse. It failed with the following error in 18s:
   
   ```
   java.lang.RuntimeException: Timed out waiting for TServer information in ZooKeeper
   	at org.apache.accumulo.miniclusterImpl.MiniAccumuloClusterImpl.verifyUp(MiniAccumuloClusterImpl.java:668)
   	at org.apache.accumulo.miniclusterImpl.MiniAccumuloClusterImpl.start(MiniAccumuloClusterImpl.java:605)
   	at org.apache.accumulo.harness.AccumuloClusterHarness.setupCluster(AccumuloClusterHarness.java:167)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
   	at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
   	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
   	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
   	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	at java.base/java.lang.Thread.run(Thread.java:829)
   ```
   
   Both TServers logs showed that they failed to load the native map and exited:
   
   ```
   2022-01-05T16:02:50,957 [tserver.NativeMap] ERROR: FATAL! Accumulo native libraries were requested but could not be be loaded. Either set 'tserver.memory.maps.native.enabled' to false in accumulo.properties or make sure native libraries are created in directories set by the JVM system property 'accumulo.native.lib.path' in accumulo-env.sh!
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778365311



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");

Review comment:
       Maybe differentiate between the two error conditions?  Guessing on how the difference between the two messages should read.
   ```
       Objects.requireNonNull(getClusterControl().managerProcess, "Error starting manager - no process");
   '''
   and
   ```
   Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
           "Error starting Manager - instance not started");
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005648699


   I ran all of the ITs yesterday using GitHub Actions. All succeeded except one in ExternalCompaction3_IT that timed out during a restart of the CompactionCoordinator. The test passes locally.
   
   > Another check that you may be able to use is InstanceOperations.ping()
   
   I'm wondering if I should use `InstanceOperations.waitForBalance()` instead. This would mean that the Manager and TabletServers are  up, communicating with each other, and the `root` and `metadata` tables are hosted.
   
   Also, I looked at the ITs that you mentioned. In most cases MAC is started and then these tests are stopping and starting single processes (Monitor, Manager, etc). Calling `verifyUp` after each one may be pretty costly in terms of time and may not provide much value given that the tests are currently succeeding. I think the current implementation meets the initial ask of failing if MAC fails to start at the beginning of the test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778380274



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {

Review comment:
       Do you know which tests?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005676069


   > That is a good one too but would definitely add overhead time to the tests that we might not want.
   
   Ok, then I don't think I have any other changes for this PR at this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778378923



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);
+
+    int tsActualCount = 0;
+    int tryCount = 0;
+    while (tsActualCount != tsExpectedCount) {
+      tryCount++;
+      try {
+        tsActualCount = 0;
+        for (String child : zrw.getChildren(rootPath + Constants.ZTSERVERS)) {
+          tsActualCount++;
+          if (zrw.getChildren(rootPath + Constants.ZTSERVERS + "/" + child).isEmpty())
+            throw new RuntimeException("TServer not present in ZooKeeper");
+        }
+      } catch (KeeperException e) {
+        throw new RuntimeException("Unable to read TServer information from zookeeper.", e);
+      }
+      if (tryCount == 10) {

Review comment:
       This (and other places) - could use a relational operator > or >= rather that equality - makes it harder to make a mistake, but likely a style preference. Some checkers would flag it in a for loop termination - which is essential what this is doing here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005184202


   Another check that you may be able to use is `InstanceOperations.ping()`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] DomGarguilo commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778373905



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);
+
+    int tsActualCount = 0;
+    int tryCount = 0;
+    while (tsActualCount != tsExpectedCount) {
+      tryCount++;
+      try {
+        tsActualCount = 0;
+        for (String child : zrw.getChildren(rootPath + Constants.ZTSERVERS)) {
+          tsActualCount++;
+          if (zrw.getChildren(rootPath + Constants.ZTSERVERS + "/" + child).isEmpty())
+            throw new RuntimeException("TServer not present in ZooKeeper");
+        }
+      } catch (KeeperException e) {
+        throw new RuntimeException("Unable to read TServer information from zookeeper.", e);
+      }
+      if (tryCount == 10) {
+        throw new RuntimeException("Timed out waiting for TServer information in ZooKeeper");
+      }
+      Thread.sleep(1000);
+    }
+
+    Preconditions.checkArgument(tsExpectedCount == tsActualCount,

Review comment:
       I think this will always be true according to the while loop condition above.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] DomGarguilo commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778369899



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);

Review comment:
       Is `instanceID` only expected to be found once? If so, can we break out of the loop after it is found? 
   
   Also, is it possible that it is never found and `instanceID` stays null? If so, would that cause issues with `ZooUtil.getRoot(instanceId)`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion merged pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion merged pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778366481



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");

Review comment:
       Should this read
   ``` 
   "Error starting tserver"
   ```
   and maybe add "instance not started" ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778370416



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {

Review comment:
       It might be useful to make this public (and available outside of the impl) so tests can use it as well. I am pretty sure there are a few tests doing this already, maybe this could replace duplicate code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r779097891



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +601,101 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess,

Review comment:
       `requireNonNull` could be a static import, so these lines are shorter and slightly easier to read everywhere.

##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +601,101 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess,
+        "Error starting Manager - no process");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager - instance not started");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC - no process");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC - instance not started");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp,
+          "Error starting TabletServer " + tsExpectedCount + " - no process");
+      Objects.requireNonNull(tsp.info().startInstant().get(),
+          "Error starting TabletServer " + tsExpectedCount + "- instance not started");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+          break;
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);

Review comment:
       all these occurrences of `throw new RuntimeException` could use a more specific RTE that is applicable, rather than the generic parent of all RTEs. `IllegalStateException` might be appropriate.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778801722



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {

Review comment:
       And this could be a follow on task since its probably a lot of work.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1006371786


   @dlmarion This change broke a bunch of ITs, and the numerous crashing ITs:
   
   ```
   139 Failed Tests:
       AccumuloFileOutputFormatIT.testRealWrite after 7.9 sec
       AccumuloFileOutputFormatIT.writeBadVisibility after 8.1 sec
       AccumuloInputFormatIT.testCorrectRangeInputSplits after 8 sec
       AccumuloInputFormatIT.testSample after 8 sec
       AccumuloOutputFormatIT.testMapred after 6.8 sec
       AccumuloFileOutputFormatIT.testRealWrite after 7.9 sec
       AccumuloFileOutputFormatIT.writeBadVisibility after 7.9 sec
       AccumuloInputFormatIT.testCorrectRangeInputSplits after 7 sec
       AccumuloInputFormatIT.testMapWithBatchScanner after 6.8 sec
       AccumuloInputFormatIT.testGetSplits after 6.9 sec
       AccumuloInputFormatIT.testSample after 7 sec
       AccumuloInputFormatIT.testPartialInputSplitDelegationToConfiguration after 6.8 sec
       AuditMessageIT.testDataOperationsAudits after 6.9 sec
       AuditMessageIT.testImportExportOperationsAudits after 7 sec
       AuditMessageIT.testDeniedAudits after 6.8 sec
       AuditMessageIT.testTableOperationsAudits after 6.9 sec
       AuditMessageIT.testUserOperationsAudits after 6.8 sec
       BatchWriterInTabletServerIT.testNormalWrite after 8.1 sec
       ClientSideIteratorIT.testIntersect after 7.9 sec
       CloneIT.testClonedMarker after 7.9 sec
       CloneIT.testFilesChange after 7.9 sec
       CloneIT.testNoFiles after 8.1 sec
       CloneIT.testSplit1 after 7.9 sec
       CloneIT.testSplit2 after 7.9 sec
       CloneIT.testSplit3 after 8.1 sec
       ExistingMacIT.testExistingInstance after 34 sec
       LargeSplitRowIT.automaticSplitLater after 6.9 sec
       LargeSplitRowIT.automaticSplitWithGaps after 6.8 sec
       LargeSplitRowIT.automaticSplitWith250Same after 6.9 sec
       LargeSplitRowIT.automaticSplitWithoutGaps after 6.8 sec
       MetaSplitIT.testRootTableSplit after 7.9 sec
       MetaSplitIT.testMetadataTableSplit after 7.8 sec
       MissingWalHeaderCompletesRecoveryIT.testPartialHeaderWalRecoveryCompletes after 7.7 sec
       MultiTableBatchWriterIT.testOfflineTable after 8.1 sec
       MultiTableBatchWriterIT.testTableRenameDataValidation after 8 sec
       MultiT…
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1006637416


   @ctubbsii - That's interesting. A few of the tests passed locally and all but two passed when run using GitHub Actions. I wonder what the difference is/was. Anyway, thanks for the assist. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778407206



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {

Review comment:
       I couldn't think of any off the top of my head but I did a quick search of ITs that look for ZK locks and found: RestartIT, ReadWriteIT, ThriftServerBindsBeforeZooKeeperLockIT, TabletStateChangeIteratorIT




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778388493



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);
+
+    int tsActualCount = 0;
+    int tryCount = 0;
+    while (tsActualCount != tsExpectedCount) {
+      tryCount++;
+      try {
+        tsActualCount = 0;
+        for (String child : zrw.getChildren(rootPath + Constants.ZTSERVERS)) {
+          tsActualCount++;
+          if (zrw.getChildren(rootPath + Constants.ZTSERVERS + "/" + child).isEmpty())
+            throw new RuntimeException("TServer not present in ZooKeeper");
+        }
+      } catch (KeeperException e) {
+        throw new RuntimeException("Unable to read TServer information from zookeeper.", e);
+      }
+      if (tryCount == 10) {

Review comment:
       Addressed in 35efd1c

##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);
+
+    int tsActualCount = 0;
+    int tryCount = 0;
+    while (tsActualCount != tsExpectedCount) {
+      tryCount++;
+      try {
+        tsActualCount = 0;
+        for (String child : zrw.getChildren(rootPath + Constants.ZTSERVERS)) {
+          tsActualCount++;
+          if (zrw.getChildren(rootPath + Constants.ZTSERVERS + "/" + child).isEmpty())
+            throw new RuntimeException("TServer not present in ZooKeeper");
+        }
+      } catch (KeeperException e) {
+        throw new RuntimeException("Unable to read TServer information from zookeeper.", e);
+      }
+      if (tryCount == 10) {
+        throw new RuntimeException("Timed out waiting for TServer information in ZooKeeper");
+      }
+      Thread.sleep(1000);
+    }
+
+    Preconditions.checkArgument(tsExpectedCount == tsActualCount,

Review comment:
       Addressed in 35efd1c

##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);

Review comment:
       Addressed in 35efd1c

##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");
+    }
+
+    String zk = config.getExistingZooKeepers() != null ? config.getExistingZooKeepers()
+        : config.getZooKeepers();
+
+    ZooReaderWriter zrw =
+        new ZooReaderWriter(zk, 30000, Property.INSTANCE_SECRET.getDefaultValue());
+
+    String instanceId = null;
+    try {
+      for (String name : zrw.getChildren(Constants.ZROOT + Constants.ZINSTANCES)) {
+        if (name.equals(config.getInstanceName())) {
+          String instanceNamePath = Constants.ZROOT + Constants.ZINSTANCES + "/" + name;
+          byte[] bytes = zrw.getData(instanceNamePath);
+          instanceId = new String(bytes, UTF_8);
+        }
+      }
+    } catch (KeeperException e) {
+      throw new RuntimeException("Unable to read instance name from zookeeper.", e);
+    }
+    String rootPath = ZooUtil.getRoot(instanceId);
+
+    int tsActualCount = 0;
+    int tryCount = 0;
+    while (tsActualCount != tsExpectedCount) {
+      tryCount++;
+      try {
+        tsActualCount = 0;
+        for (String child : zrw.getChildren(rootPath + Constants.ZTSERVERS)) {
+          tsActualCount++;
+          if (zrw.getChildren(rootPath + Constants.ZTSERVERS + "/" + child).isEmpty())
+            throw new RuntimeException("TServer not present in ZooKeeper");

Review comment:
       Addressed in 35efd1c

##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");
+    Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
+        "Error starting Manager");
+
+    Objects.requireNonNull(getClusterControl().gcProcess, "Error starting GC");
+    Objects.requireNonNull(getClusterControl().gcProcess.info().startInstant().get(),
+        "Error starting GC");
+
+    int tsExpectedCount = 0;
+    for (Process tsp : getClusterControl().tabletServerProcesses) {
+      tsExpectedCount++;
+      Objects.requireNonNull(tsp, "Error starting TabletServer");
+      Objects.requireNonNull(tsp.info().startInstant().get(), "Error starting Manager");

Review comment:
       Addressed in 35efd1c

##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");

Review comment:
       Addressed in 35efd1c




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1006436032


   Fixed the issue identified above in #2410 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on a change in pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#discussion_r778365311



##########
File path: minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java
##########
@@ -600,6 +602,97 @@ public synchronized void start() throws IOException, InterruptedException {
     if (executor == null) {
       executor = Executors.newSingleThreadExecutor();
     }
+
+    verifyUp();
+
+  }
+
+  private void verifyUp() throws InterruptedException, IOException {
+
+    Objects.requireNonNull(getClusterControl().managerProcess, "Error starting Manager");

Review comment:
       Maybe differentiate between the two error conditions?  Guessing on how the difference between the two messages should read.
   ```
       Objects.requireNonNull(getClusterControl().managerProcess, "Error starting manager - no process");
   ```
   and
   ```
   Objects.requireNonNull(getClusterControl().managerProcess.info().startInstant().get(),
           "Error starting Manager - instance not started");
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005681576


   Were you able to reproduce an IT failure due to missing native map?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005672375


   > I ran all of the ITs yesterday using GitHub Actions. All succeeded except one in ExternalCompaction3_IT that timed out during a restart of the CompactionCoordinator. The test passes locally.
   > 
   > > Another check that you may be able to use is InstanceOperations.ping()
   > 
   > I'm wondering if I should use `InstanceOperations.waitForBalance()` instead. This would mean that the Manager and TabletServers are up, communicating with each other, and the `root` and `metadata` tables are hosted.
   
   That is a good one too but would definitely add overhead time to the tests that we might not want.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
EdColeman commented on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1005687820


   Re missing native map - not sure if it applies in this case, but if you have a build fail and then try to run mini it will fail because the native map is missing,  It has to do with the build order / dependencies and  the desire to make sure mini is using native maps during testing.
   
   If I recall, to reproduce - run a build with mvn that fails because of  something like a check-style or formatting error. Run mini in the ide - it will build and run, but not have the native maps and the error shows up in a tserver log, 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii edited a comment on pull request #2408: Attempt to verify that MAC processes are up and running in start

Posted by GitBox <gi...@apache.org>.
ctubbsii edited a comment on pull request #2408:
URL: https://github.com/apache/accumulo/pull/2408#issuecomment-1006371786


   @dlmarion This change broke a bunch of ITs, and the numerous crashing ITs caused my Jenkins server to fall over from OOM due to zombie background MAC processes:
   
   ```
   139 Failed Tests:
       AccumuloFileOutputFormatIT.testRealWrite after 7.9 sec
       AccumuloFileOutputFormatIT.writeBadVisibility after 8.1 sec
       AccumuloInputFormatIT.testCorrectRangeInputSplits after 8 sec
       AccumuloInputFormatIT.testSample after 8 sec
       AccumuloOutputFormatIT.testMapred after 6.8 sec
       AccumuloFileOutputFormatIT.testRealWrite after 7.9 sec
       AccumuloFileOutputFormatIT.writeBadVisibility after 7.9 sec
       AccumuloInputFormatIT.testCorrectRangeInputSplits after 7 sec
       AccumuloInputFormatIT.testMapWithBatchScanner after 6.8 sec
       AccumuloInputFormatIT.testGetSplits after 6.9 sec
       AccumuloInputFormatIT.testSample after 7 sec
       AccumuloInputFormatIT.testPartialInputSplitDelegationToConfiguration after 6.8 sec
       AuditMessageIT.testDataOperationsAudits after 6.9 sec
       AuditMessageIT.testImportExportOperationsAudits after 7 sec
       AuditMessageIT.testDeniedAudits after 6.8 sec
       AuditMessageIT.testTableOperationsAudits after 6.9 sec
       AuditMessageIT.testUserOperationsAudits after 6.8 sec
       BatchWriterInTabletServerIT.testNormalWrite after 8.1 sec
       ClientSideIteratorIT.testIntersect after 7.9 sec
       CloneIT.testClonedMarker after 7.9 sec
       CloneIT.testFilesChange after 7.9 sec
       CloneIT.testNoFiles after 8.1 sec
       CloneIT.testSplit1 after 7.9 sec
       CloneIT.testSplit2 after 7.9 sec
       CloneIT.testSplit3 after 8.1 sec
       ExistingMacIT.testExistingInstance after 34 sec
       LargeSplitRowIT.automaticSplitLater after 6.9 sec
       LargeSplitRowIT.automaticSplitWithGaps after 6.8 sec
       LargeSplitRowIT.automaticSplitWith250Same after 6.9 sec
       LargeSplitRowIT.automaticSplitWithoutGaps after 6.8 sec
       MetaSplitIT.testRootTableSplit after 7.9 sec
       MetaSplitIT.testMetadataTableSplit after 7.8 sec
       MissingWalHeaderCompletesRecoveryIT.testPartialHeaderWalRecoveryCompletes after 7.7 sec
       MultiTableBatchWriterIT.testOfflineTable after 8.1 sec
       MultiTableBatchWriterIT.testTableRenameDataValidation after 8 sec
       MultiT…
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org