You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/07/10 21:01:54 UTC

[GitHub] [accumulo] friedlou opened a new pull request #1656: Check if retries is set to 0 before retrying. #1655

friedlou opened a new pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-686515285


   > I'm not sure I buy in to the value of not retrying at all, but I think I'm willing to support the use case. @milleruntime , since you previously commented (and understand the legacy bulk import code better than me), what do you think?
   
   I agree.  I am not sure this change is worth tinkering with the legacy code. I would like to know more as to why there is a need to only attempt the bulk import once.  To me, something strange is going on with the cluster since it sounds like the operation is no idempotent if it can't be retried.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-675686869


   I'm not sure I buy in to the value of not retrying at all, but I think I'm willing to support the use case. @milleruntime , since you previously commented (and understand the legacy bulk import code better than me), what do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-665405809


   @friedlou What does this change accomplish? There is no description of the change that helps us understand what it is trying to accomplish. From what I can tell, the most obvious consequence is fewer log messages: one for all failures, rather than individual messages for each failure in the loop that is skipped later, perhaps? I'm not sure that's much of an improvement, especially since, as @milleruntime says, it only affects the legacy bulk import. Am I missing something? Is there another benefit to this change?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ivakegg commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
ivakegg commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-795906094


   > @friedlou What does this change accomplish? There is no description of the change that helps us understand what it is trying to accomplish. From what I can tell, the most obvious consequence is fewer log messages: one for all failures, rather than individual messages for each failure in the loop that is skipped later, perhaps? I'm not sure that's much of an improvement, especially since, as @milleruntime says, it only affects the legacy bulk import. Am I missing something? Is there another benefit to this change?
   
   The main reason for this is that we have timing issues associated with bulk loading where we need to ensure that the master waits long enough for the tserver to finish its retries. We have multiple situations that occur when tservers fail to assign things in a timely fashion.  The most nasty of those is where the garbage collector has removed the file because all of the other tablets have successfully assigned and major compacted away.  Meanwhile a tablet doing retries finally assigns the file and finds it gone.  When we have retries if becomes difficult to bound the time so we were attempting to set the retries to 0 only to find out that it still retried once.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-658220899


   @friedlou thanks for the contribution to the project.  That property is only used by the legacy bulk import technique, which has been deprecated.  Do you have a use case where you do not want to retry?  As misleading as the property is named, typically in a production cluster using the legacy bulk import, there will be the need for multiple retries.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#discussion_r472196668



##########
File path: server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
##########
@@ -175,11 +175,17 @@ public void run() {
 
       Map<Path,Integer> failureCount = new TreeMap<>();
 
-      for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
-        failureCount.put(entry.getKey(), 1);
+      int retries = context.getConfiguration().getCount(Property.TSERV_BULK_RETRY);
+      if (retries == 0) {
+        log.warn("Retries set to 0. All failed map file assignments will not be retried.");
+        completeFailures.putAll(assignmentFailures);
+      } else {
+        for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
+          failureCount.put(entry.getKey(), 1);
+      }
 
       long sleepTime = 2 * 1000;
-      while (!assignmentFailures.isEmpty()) {
+      while (!assignmentFailures.isEmpty() && retries > 0) {

Review comment:
       Rather than have this additional check for retries here, I think it makes more sense to put this entire block inside the else statement above. Yes, it makes the diff bigger, but it's easy to ignore indentation changes when reviewing a diff.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#discussion_r472450435



##########
File path: server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
##########
@@ -175,78 +175,85 @@ public void run() {
 
       Map<Path,Integer> failureCount = new TreeMap<>();
 
-      for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
-        failureCount.put(entry.getKey(), 1);
-
-      long sleepTime = 2 * 1000;
-      while (!assignmentFailures.isEmpty()) {
-        sleepTime = Math.min(sleepTime * 2, 60 * 1000);
-        locator.invalidateCache();
-        // assumption about assignment failures is that it caused by a split
-        // happening or a missing location
-        //
-        // for splits we need to find children key extents that cover the
-        // same key range and are contiguous (no holes, no overlap)
-
-        timer.start(Timers.SLEEP);
-        sleepUninterruptibly(sleepTime, TimeUnit.MILLISECONDS);
-        timer.stop(Timers.SLEEP);
-
-        log.debug("Trying to assign {} map files that previously failed on some key extents",
-            assignmentFailures.size());
-        assignments.clear();
-
-        // for failed key extents, try to find children key extents to
-        // assign to
+      int retries = context.getConfiguration().getCount(Property.TSERV_BULK_RETRY);
+      if (retries == 0) {
+        log.warn("Retries set to 0. All failed map file assignments will not be retried.");

Review comment:
       Also, this should probably be logged at `ERROR` instead of `WARN`, since this log message is replacing the individual messages for each file, which currently logs at `ERROR`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] friedlou commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
friedlou commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-661300306


   @milleruntime Our government client has multiple large cluster instances of accumulo. We found it better to let the bulk import fail on the first try and have ops replay the bulk import. Multiple tries have produced bogus missing file errors. We would like to eliminate those errors. We find this process less error prone. There are no plans to upgrade anytime soon. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-797635529


   I am not convinced setting retries to 0 will necessarily improve your situation as that could lead to a lot more bulk import failures. But I agree that the intuitive behavior of setting "tserver.bulk.retry.max" to 0 should be that it won't retry. I think this would be OK as a bug fix for 1.10. @friedlou is that where you meant this to go? Otherwise, you are just making changes to the deprecated code in 2.1.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] friedlou commented on a change in pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
friedlou commented on a change in pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#discussion_r473363526



##########
File path: server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
##########
@@ -175,78 +175,85 @@ public void run() {
 
       Map<Path,Integer> failureCount = new TreeMap<>();
 
-      for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
-        failureCount.put(entry.getKey(), 1);
-
-      long sleepTime = 2 * 1000;
-      while (!assignmentFailures.isEmpty()) {
-        sleepTime = Math.min(sleepTime * 2, 60 * 1000);
-        locator.invalidateCache();
-        // assumption about assignment failures is that it caused by a split
-        // happening or a missing location
-        //
-        // for splits we need to find children key extents that cover the
-        // same key range and are contiguous (no holes, no overlap)
-
-        timer.start(Timers.SLEEP);
-        sleepUninterruptibly(sleepTime, TimeUnit.MILLISECONDS);
-        timer.stop(Timers.SLEEP);
-
-        log.debug("Trying to assign {} map files that previously failed on some key extents",
-            assignmentFailures.size());
-        assignments.clear();
-
-        // for failed key extents, try to find children key extents to
-        // assign to
+      int retries = context.getConfiguration().getCount(Property.TSERV_BULK_RETRY);
+      if (retries == 0) {
+        log.warn("Retries set to 0. All failed map file assignments will not be retried.");

Review comment:
       Changed in latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on a change in pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#discussion_r472447456



##########
File path: server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
##########
@@ -175,78 +175,85 @@ public void run() {
 
       Map<Path,Integer> failureCount = new TreeMap<>();
 
-      for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
-        failureCount.put(entry.getKey(), 1);
-
-      long sleepTime = 2 * 1000;
-      while (!assignmentFailures.isEmpty()) {
-        sleepTime = Math.min(sleepTime * 2, 60 * 1000);
-        locator.invalidateCache();
-        // assumption about assignment failures is that it caused by a split
-        // happening or a missing location
-        //
-        // for splits we need to find children key extents that cover the
-        // same key range and are contiguous (no holes, no overlap)
-
-        timer.start(Timers.SLEEP);
-        sleepUninterruptibly(sleepTime, TimeUnit.MILLISECONDS);
-        timer.stop(Timers.SLEEP);
-
-        log.debug("Trying to assign {} map files that previously failed on some key extents",
-            assignmentFailures.size());
-        assignments.clear();
-
-        // for failed key extents, try to find children key extents to
-        // assign to
+      int retries = context.getConfiguration().getCount(Property.TSERV_BULK_RETRY);
+      if (retries == 0) {
+        log.warn("Retries set to 0. All failed map file assignments will not be retried.");

Review comment:
       Minor grammatical improvement. (Kein vs. Nicht)
   
   ```suggestion
           log.warn("Retries set to 0. No failed map file assignments will be retried.");
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#issuecomment-797648755


   Also, this may no longer be a problem with the new bulk import in 2.x (or a different problem altogether since the new one will keep retrying to load the files asynchronously).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] friedlou commented on a change in pull request #1656: Check if retries is set to 0 before retrying. #1655

Posted by GitBox <gi...@apache.org>.
friedlou commented on a change in pull request #1656:
URL: https://github.com/apache/accumulo/pull/1656#discussion_r472428316



##########
File path: server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java
##########
@@ -175,11 +175,17 @@ public void run() {
 
       Map<Path,Integer> failureCount = new TreeMap<>();
 
-      for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
-        failureCount.put(entry.getKey(), 1);
+      int retries = context.getConfiguration().getCount(Property.TSERV_BULK_RETRY);
+      if (retries == 0) {
+        log.warn("Retries set to 0. All failed map file assignments will not be retried.");
+        completeFailures.putAll(assignmentFailures);
+      } else {
+        for (Entry<Path,List<KeyExtent>> entry : assignmentFailures.entrySet())
+          failureCount.put(entry.getKey(), 1);
+      }
 
       long sleepTime = 2 * 1000;
-      while (!assignmentFailures.isEmpty()) {
+      while (!assignmentFailures.isEmpty() && retries > 0) {

Review comment:
       Modifications made as suggested in the latest commit. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org