You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "dlmarion (via GitHub)" <gi...@apache.org> on 2023/07/06 16:11:13 UTC

[GitHub] [accumulo] dlmarion opened a new pull request, #3571: 3512 tserver not shutting down

dlmarion opened a new pull request, #3571:
URL: https://github.com/apache/accumulo/pull/3571

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#issuecomment-1625623767

   > The TabletStateChangeIterator will return all tablets if the ManagerState is not normal [ref](https://github.com/apache/accumulo/blob/main/server/base/src/main/java/org/apache/accumulo/server/manager/state/TabletStateChangeIterator.java#L170). In this test, the Manager is likely normal as we are not shutting everything down, I'm just shutting down the tablet servers with everything else up.
   
   When the user uses this functionality, will the manager state be adjusted?  Wondering if when the user does this if the TabletStateChangeIterator will switch to a mode of returning all tablets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254931178


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   > I am wondering if the isEmpty() check could cause things to hang in the case when there are zero user tablets. Can not distinguish that case from not having scanned everything. Should always have a non-zero tablets at the root and metadata level, however can not assume the user level has any tablets.
   
   If you look at how they are [constructed](https://github.com/dlmarion/accumulo/blob/3512-tserver-not-shutting-down/server/manager/src/main/java/org/apache/accumulo/manager/Manager.java#L1140), the user TGW does not have a dependent watcher.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254907588


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   I don't quite understand the `everything is fine` is comment.  Looking at your changes I interpret the added isEmpty to rule out the case when the dependent watcher has no data about what is hosted or not.  
   
   ```
   If the stats object in the dependentWatcher is empty, then it currently does not have data about what is hosted or not.  In that case host these tablets until the dependent watcher can gather some data.
   ```



##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets
+            // are being reported as needing to be hosted (everything is fine). If
+            // everything is fine, or there are still assigned or hosted tablets,
+            // then we cannot suspend the tablets for this watcher yet. Continue
+            // to host these tablets.
+            if (dependentWatcher != null && (dependentWatcher.getStats().isEmpty()
+                || dependentWatcher.assignedOrHosted() > 0)) {

Review Comment:
   It would be less racy if both isEmpty and assignedOrHosted operated on the same stats map.  Being called at different times they could get different maps.  Could make the assignedOrHosted method a static util method that takes a map as an argument.
   
   ```suggestion
               // to host these tablets.
               var stats = dependentWatcher.getStats();
               if (dependentWatcher != null && (stats.isEmpty()
                   || assignedOrHosted(stats) > 0)) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254936827


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   Root tablets will continue to be hosted while the metadata tablets are hosted/assigned or no updated information.
   Metadata tables will continue to be hosted while the user tablets are hosted/assigned or no updated information.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254922179


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   I am wondering if the isEmpty() check could cause things to hang in the case when there are zero user tablets.  Can not distinguish that case from not having scanned everything.  Should always have a non-zero tablets at the root and metadata level, however can not assume the user level has any tablets.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1255714020


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets
+            // are being reported as needing to be hosted (everything is fine). If
+            // everything is fine, or there are still assigned or hosted tablets,
+            // then we cannot suspend the tablets for this watcher yet. Continue
+            // to host these tablets.
+            if (dependentWatcher != null && (dependentWatcher.getStats().isEmpty()
+                || dependentWatcher.assignedOrHosted() > 0)) {

Review Comment:
   Modified to use the same stats object in c0b9f14



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254917391


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets
+            // are being reported as needing to be hosted (everything is fine). If
+            // everything is fine, or there are still assigned or hosted tablets,
+            // then we cannot suspend the tablets for this watcher yet. Continue
+            // to host these tablets.
+            if (dependentWatcher != null && (dependentWatcher.getStats().isEmpty()
+                || dependentWatcher.assignedOrHosted() > 0)) {

Review Comment:
   If dependentWatcher is null, then the following in the suggestion I made will blow up.
   
   ```java
   var stats = dependentWatcher.getStats();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1255713772


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   Updated comment in c0b9f14



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion merged pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion merged PR #3571:
URL: https://github.com/apache/accumulo/pull/3571


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254914673


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets
+            // are being reported as needing to be hosted (everything is fine). If
+            // everything is fine, or there are still assigned or hosted tablets,
+            // then we cannot suspend the tablets for this watcher yet. Continue
+            // to host these tablets.
+            if (dependentWatcher != null && (dependentWatcher.getStats().isEmpty()
+                || dependentWatcher.assignedOrHosted() > 0)) {

Review Comment:
   I just noticed the code suggestion I made could cause an NPE



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254924464


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   everything is fine == the tablet state change iterator didn't return anything == nothing needs to be done.
   
   So, in this case nothing was returned in the previous iterator of the metadata table TGW, so `stats` was empty. In this iteration of the root table TGW dependentWatcher == the metadata table TGW and `assignedOrHosted` == 0. So, this conditional is skipped and the root tablet is unloaded.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1256146003


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   Not sure if we can do this with the stats.  Wondering if we need something new, something that indicates the TGW examined all tablets in its last pass and none were hosted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#issuecomment-1625759501

   Wondering if it make sense for TGW to have an `unhostAndShutdown()` and `isShutdown()` method, kinda like an Executor.  Once a TGW is asked to unhost  and shutdown it gets everything unhosted and then sets isShutdown to true and stops operating.  When a TGW has a dependent it will not start trying to shutdown until its dependent is shutdown.  So we could call unhostAndShutdown() on the root TGW (it would call unhostAndShutdown() on its dependent and wait for that before it started shutting down) and then wait for that to complete. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1256146003


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   Not sure if we can do this with the stats.  Wondering if we need something new, something that indicates the TGW examined all tablets in its last pass and none were hosted or assigned.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1256153145


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   Wondering if we should consider time.  Like should we inspect the time of the dependent TGW last full pass relative to the current TGW time of a last pass.  Not sure exactly what needs to be done with the comparison though, just seems like we want to make sure the stats from the dependent TGW are fresh.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: 3512 tserver not shutting down

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1254660607


##########
test/src/main/java/org/apache/accumulo/test/CloseScannerIT.java:
##########
@@ -71,6 +93,104 @@ public void testManyScans() throws Exception {
     }
   }
 
+  @Test
+  public void testIsolatedScannerPreventsTServerShutdown() throws Exception {
+
+    // 2 TabletServers started for this test, shut them down so we only have 1.
+    getClusterControl().stopAllServers(ServerType.TABLET_SERVER);
+    ((MiniAccumuloClusterControl) getClusterControl()).start(ServerType.TABLET_SERVER,
+        Collections.emptyMap(), 1);
+
+    String tableName = getUniqueNames(1)[0];
+
+    try (AccumuloClient client = Accumulo.newClient().from(getClientProps()).build()) {
+
+      Wait.waitFor(() -> client.instanceOperations().getTabletServers().size() == 1);
+
+      // Thread.sleep(120_000);

Review Comment:
   remove



##########
test/src/main/java/org/apache/accumulo/test/CloseScannerIT.java:
##########
@@ -71,6 +93,104 @@ public void testManyScans() throws Exception {
     }
   }
 
+  @Test
+  public void testIsolatedScannerPreventsTServerShutdown() throws Exception {

Review Comment:
   I think this test is likely poorly named and in the wrong location now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#issuecomment-1624322099

   > do you know if the TabletStateChangeIterator will return everything when the manager is shutting down tservers? Wondering if we should be expect the TGW stats to cover all tablets in this case
   
   The TabletStateChangeIterator will return all tablets if the ManagerState is not normal [ref](https://github.com/apache/accumulo/blob/main/server/base/src/main/java/org/apache/accumulo/server/manager/state/TabletStateChangeIterator.java#L170). In this test, the Manager is likely normal as we are not shutting everything down, I'm just shutting down the tablet servers with everything else up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1256281424


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   I have a solution for the no user tables situation, will commit shortly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1256307872


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   Added in 7aef709



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#issuecomment-1624310985

    > How long does the new IT take to run?
    
    Just a few minutes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1256048099


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   > If you look at how they are [constructed](https://github.com/dlmarion/accumulo/blob/3512-tserver-not-shutting-down/server/manager/src/main/java/org/apache/accumulo/manager/Manager.java#L1140), the user TGW does not have a dependent watcher.
   
   I was thinking about the metadata tablets watcher which has user as a dependent.  I was wondering if the user watcher has zero tablets if its stats would always be empty meaning the metadata tablets would never unhost.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1256054706


##########
server/manager/src/main/java/org/apache/accumulo/manager/TabletGroupWatcher.java:
##########
@@ -257,11 +257,24 @@ public void run() {
           if (state == TabletState.ASSIGNED) {
             goal = TabletGoalState.HOSTED;
           }
+          if (Manager.log.isTraceEnabled()) {
+            Manager.log.trace(
+                "[{}] Shutting down all Tservers: {}, dependentCount: {} Extent: {}, state: {}, goal: {}",
+                store.name(), manager.serversToShutdown.equals(currentTServers.keySet()),
+                dependentWatcher == null ? "null" : dependentWatcher.assignedOrHosted(), tls.extent,
+                state, goal);
+          }
 
           // if we are shutting down all the tabletservers, we have to do it in order
           if ((goal == TabletGoalState.SUSPENDED && state == TabletState.HOSTED)
               && manager.serversToShutdown.equals(currentTServers.keySet())) {
-            if (dependentWatcher != null && dependentWatcher.assignedOrHosted() > 0) {
+            // If the stats object in the dependentWatcher is empty, then no tablets

Review Comment:
   ah, no user tables. The stats would likely be empty. I'll add a test for that and see if I can resolve that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#discussion_r1255729134


##########
test/src/main/java/org/apache/accumulo/test/CloseScannerIT.java:
##########
@@ -71,6 +93,104 @@ public void testManyScans() throws Exception {
     }
   }
 
+  @Test
+  public void testIsolatedScannerPreventsTServerShutdown() throws Exception {

Review Comment:
   Relocated IT to ManagerAssignmentIT and renamed the test method in c91c244



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3571: Fix TabletGroupWatcher shutdown order

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on PR #3571:
URL: https://github.com/apache/accumulo/pull/3571#issuecomment-1625848570

   > Wondering if it make sense for TGW to have an `unhostAndShutdown()` and `isShutdown()` method, kinda like an Executor. Once a TGW is asked to unhost and shutdown it gets everything unhosted and then sets isShutdown to true and stops operating. When a TGW has a dependent it will not start trying to shutdown until its dependent is shutdown. So we could call unhostAndShutdown() on the root TGW (it would call unhostAndShutdown() on its dependent and wait for that before it started shutting down) and then wait for that to complete.
   
   I think that makes sense in the case where the Manager is shutting down, as the TGW's run in the Manager. But, in that case, all tablet information is going to get returned as the ManagerState would not be NORMAL. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org