Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/07/22 07:35:22 UTC

[GitHub] [hive] hmangla98 opened a new pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

hmangla98 opened a new pull request #2516:
URL: https://github.com/apache/hive/pull/2516


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha merged pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

Posted by GitBox <gi...@apache.org>.
aasha merged pull request #2516:
URL: https://github.com/apache/hive/pull/2516


   




[GitHub] [hive] ayushtkn commented on a change in pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r713671862



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/parse/repl/TestCopyUtils.java
##########
@@ -110,6 +112,100 @@ public void shouldThrowExceptionOnDistcpFailure() throws Exception {
     copyUtils.doCopy(destination, srcPaths);
   }
 
+  @Test
+  public void testFSCallsFailOnParentExceptions() throws Exception {
+    mockStatic(UserGroupInformation.class);
+    mockStatic(ReplChangeManager.class);
+    when(UserGroupInformation.getCurrentUser()).thenReturn(mock(UserGroupInformation.class));
+    HiveConf conf = mock(HiveConf.class);
+    conf.set(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.varname, "1s");
+    FileSystem fs = mock(FileSystem.class);
+    Path source = mock(Path.class);
+    Path destination = mock(Path.class);
+    ContentSummary cs = mock(ContentSummary.class);
+
+    Exception exception = new org.apache.hadoop.fs.PathPermissionException("Failed");
+    when(ReplChangeManager.checksumFor(source, fs)).thenThrow(exception).thenReturn("dummy");
+    when(fs.exists(same(source))).thenThrow(exception).thenReturn(true);
+    when(fs.delete(same(source), anyBoolean())).thenThrow(exception).thenReturn(true);
+    when(fs.mkdirs(same(source))).thenThrow(exception).thenReturn(true);
+    when(fs.rename(same(source), same(destination))).thenThrow(exception).thenReturn(true);
+    when(fs.getContentSummary(same(source))).thenThrow(exception).thenReturn(cs);
+
+    CopyUtils copyUtils = new CopyUtils(UserGroupInformation.getCurrentUser().getUserName(), conf, fs);
+    CopyUtils copyUtilsSpy = Mockito.spy(copyUtils);
+    try {
+      copyUtilsSpy.exists(fs, source);
+    } catch (Exception e) {
+      assertEquals(exception.getClass(), e.getCause().getClass());
+    }
+    Mockito.verify(fs, Mockito.times(1)).exists(source);
+    try {

Review comment:
       Check whether you can use ``LambdaTestUtils.intercept`` instead of try-catch.
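For context on why an intercept-style helper is preferable: the try-catch above also passes when no exception is thrown at all, because nothing fails in that path. Below is a minimal, dependency-free sketch of the idea. `InterceptSketch` is a hypothetical stand-in and not Hadoop's `LambdaTestUtils`. It runs the call, fails if nothing is thrown, and returns the caught exception so the test can assert on its cause. `AccessDeniedException` is only a JDK substitute for the Hadoop exception used in the test.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical stand-in for an "intercept"-style test helper (not Hadoop's LambdaTestUtils).
 * Runs the callable, fails if nothing is thrown, and returns the caught exception.
 */
public final class InterceptSketch {

  private InterceptSketch() {
  }

  public static <T, E extends Throwable> E intercept(Class<E> expected, Callable<T> eval) {
    final T result;
    try {
      result = eval.call();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      // Wrong exception type: report it as a test failure, keeping the original as the cause.
      throw new AssertionError("Expected " + expected.getName() + " but got " + t, t);
    }
    // A plain try-catch silently passes here; this helper fails instead.
    throw new AssertionError("Expected " + expected.getName() + " but call returned " + result);
  }

  public static void main(String[] args) {
    java.io.IOException wrapped = intercept(java.io.IOException.class, () -> {
      throw new java.io.IOException("retries exhausted",
          new java.nio.file.AccessDeniedException("Failed"));
    });
    // The caller can then assert on the cause, as the test above does.
    System.out.println(wrapped.getCause().getClass().getSimpleName()); // AccessDeniedException
  }
}
```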






[GitHub] [hive] hmangla98 commented on a change in pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

Posted by GitBox <gi...@apache.org>.
hmangla98 commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r705371253



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/util/Retryable.java
##########
@@ -166,6 +169,25 @@ public synchronized Builder withRetryOnExceptionList(final List<Class<? extends
       return this;
     }
 
+    public synchronized Builder withFailOnParentException(final Class<? extends Exception> exceptionClass) {
+      if (exceptionClass != null &&
+              runnable.failOnParentExceptions.stream().noneMatch(k -> exceptionClass.equals(k))) {
+        runnable.failOnParentExceptions.add(exceptionClass);
+      }
+      return this;
+    }
+
+    public synchronized Builder withFailOnParentExceptionList(final List<Class<?
+            extends Exception>> exceptionClassList) {
+      for (final Class<? extends Exception> exceptionClass : exceptionClassList) {
+        if (exceptionClass != null &&
+                runnable.failOnParentExceptions.stream().noneMatch(k -> exceptionClass.equals(k))) {
+          runnable.failOnParentExceptions.add(exceptionClass);
+        }

Review comment:
       We use the same data structure to maintain retryOnExceptionList and failOnExceptionList as well.






[GitHub] [hive] ayushtkn commented on a change in pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r705078639



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/util/Retryable.java
##########
@@ -166,6 +169,25 @@ public synchronized Builder withRetryOnExceptionList(final List<Class<? extends
       return this;
     }
 
+    public synchronized Builder withFailOnParentException(final Class<? extends Exception> exceptionClass) {
+      if (exceptionClass != null &&
+              runnable.failOnParentExceptions.stream().noneMatch(k -> exceptionClass.equals(k))) {
+        runnable.failOnParentExceptions.add(exceptionClass);
+      }
+      return this;
+    }
+
+    public synchronized Builder withFailOnParentExceptionList(final List<Class<?
+            extends Exception>> exceptionClassList) {
+      for (final Class<? extends Exception> exceptionClass : exceptionClassList) {
+        if (exceptionClass != null &&
+                runnable.failOnParentExceptions.stream().noneMatch(k -> exceptionClass.equals(k))) {
+          runnable.failOnParentExceptions.add(exceptionClass);
+        }

Review comment:
       Rather than checking for uniqueness ourselves, could we use a Set instead of a List?
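A minimal sketch of what the Set-based variant could look like, assuming the builder only needs uniqueness and never index-based access. The class and the `getFailOnParentExceptions` accessor are hypothetical, not Hive's `Retryable` API. `LinkedHashSet.add` is already a no-op for duplicates, so the `noneMatch(...)` scan disappears, and insertion order is kept for readable logging.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical builder sketch: a Set replaces the manual noneMatch(...) uniqueness check. */
public final class FailOnExceptionsSketch {

  // LinkedHashSet ignores duplicate add() calls and keeps insertion order.
  private final Set<Class<? extends Exception>> failOnParentExceptions = new LinkedHashSet<>();

  public synchronized FailOnExceptionsSketch withFailOnParentException(
      Class<? extends Exception> exceptionClass) {
    if (exceptionClass != null) {
      failOnParentExceptions.add(exceptionClass); // no-op if already present
    }
    return this;
  }

  public synchronized FailOnExceptionsSketch withFailOnParentExceptionList(
      List<Class<? extends Exception>> exceptionClassList) {
    for (Class<? extends Exception> exceptionClass : exceptionClassList) {
      withFailOnParentException(exceptionClass); // reentrant lock, same null/duplicate handling
    }
    return this;
  }

  public synchronized Set<Class<? extends Exception>> getFailOnParentExceptions() {
    return Collections.unmodifiableSet(new LinkedHashSet<>(failOnParentExceptions));
  }

  public static void main(String[] args) {
    FailOnExceptionsSketch builder = new FailOnExceptionsSketch()
        .withFailOnParentException(java.io.FileNotFoundException.class)
        .withFailOnParentExceptionList(Arrays.asList(
            java.io.FileNotFoundException.class, java.nio.file.AccessDeniedException.class, null));
    System.out.println(builder.getFailOnParentExceptions().size()); // 2
  }
}
```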






[GitHub] [hive] ayushtkn commented on a change in pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r677418736



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -297,21 +343,21 @@ public void renameFileCopiedFromCmPath(Path toPath, FileSystem dstFs, List<ReplC
       String destFileName = srcFile.getCmPath().getName();
       Path destRoot = CopyUtils.getCopyDestination(srcFile, toPath);
       Path destFile = new Path(destRoot, destFileName);
-      if (dstFs.exists(destFile)) {
+      if (exists(dstFs, destFile)) {
         String destFileWithSourceName = srcFile.getSourcePath().getName();
         Path newDestFile = new Path(destRoot, destFileWithSourceName);
 
         // if the new file exist then delete it before renaming, to avoid rename failure. If the copy is done
         // directly to table path (bypassing staging directory) then there might be some stale files from previous
         // incomplete/failed load. No need of recycle as this is a case of stale file.
         try {
-          dstFs.delete(newDestFile, true);
+          delete(dstFs, newDestFile, true);
           LOG.debug(" file " + newDestFile + " is deleted before renaming");
         } catch (FileNotFoundException e) {
           // no problem
         }

Review comment:
       After this change, will we still get a `FileNotFoundException` here? `FNF` is a subclass of `IOException`, so it will be retried; in the end an `IOException` with `REPL_FILE_SYSTEM_OPERATION_RETRY` will be thrown, which this catch block won't catch.
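To make the concern concrete, here is a hypothetical, JDK-only sketch. `retryThenWrap` models a retry wrapper that, once retries are exhausted, throws a new `IOException` with the last failure as its cause; that behaviour is an assumption based on the comment above, not Hive's actual `Retryable` code. The caller's `catch (FileNotFoundException e)` never matches, because the object that arrives is the wrapping `IOException`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical illustration: a wrapped FNF slips past catch (FileNotFoundException). */
public final class WrappedFnfSketch {

  // Assumed wrapper behaviour: retry every Exception, then rethrow a generic IOException.
  static <T> T retryThenWrap(Callable<T> call, int attempts) throws IOException {
    Exception last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return call.call();
      } catch (Exception e) {
        last = e;
      }
    }
    throw new IOException("retries exhausted", last);
  }

  // Mirrors the delete-before-rename block: an FNF is supposed to be harmless.
  static String deleteIgnoringFnf() throws IOException {
    try {
      retryThenWrap(() -> {
        throw new FileNotFoundException("newDestFile");
      }, 3);
      return "deleted";
    } catch (FileNotFoundException e) {
      return "no problem"; // never reached: the FNF arrives wrapped in an IOException
    }
  }

  public static void main(String[] args) {
    try {
      deleteIgnoringFnf();
    } catch (IOException e) {
      // prints: retries exhausted / cause: FileNotFoundException
      System.out.println(e.getMessage() + " / cause: " + e.getCause().getClass().getSimpleName());
    }
  }
}
```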

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -76,6 +76,57 @@ public CopyUtils(String distCpDoAsUser, HiveConf hiveConf, FileSystem destinatio
     this.destinationFs = destinationFs;
   }
 
+  private <T> T retryableFxn(Callable<T> callable) throws IOException {
+    Retryable retryable = Retryable.builder()
+            .withHiveConf(hiveConf)
+            .withRetryOnException(IOException.class).build();

Review comment:
       Retrying every `IOE` might not be apt; we may end up retrying for nothing. A `FileNotFoundException`, for example, cannot be resolved even in 10 iterations, and the same holds for an `IOE` raised because the FileSystem is closed: retrying yields the same result.
   Some example exceptions we could include:
   ``ConnectException``, ``EOFException``, ``ConnectTimeoutException``, ``StandbyException``, ``SafeModeException``, ``NoRouteToHostException``, ``SocketException`` and ``RetriableException``
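A minimal sketch of such an allow-list, using only the JDK types from the list above; the Hadoop-specific ones (`ConnectTimeoutException`, `StandbyException`, `SafeModeException`, `RetriableException`) would be added the same way but are left out so the example compiles on its own. The class and `shouldRetry` are hypothetical names. `Class.isInstance` also matches subclasses, so `ConnectException` and `NoRouteToHostException` are in fact already covered by `SocketException`.

```java
import java.io.EOFException;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: retry only exceptions that look transient, not every IOException. */
public final class TransientIoSketch {

  // JDK-only subset of the suggested list; Hadoop types would be appended here.
  private static final List<Class<? extends Exception>> RETRY_ON = Arrays.asList(
      ConnectException.class,
      EOFException.class,
      NoRouteToHostException.class,
      SocketException.class);

  static boolean shouldRetry(Throwable t) {
    // isInstance matches the listed class and all of its subclasses.
    return RETRY_ON.stream().anyMatch(c -> c.isInstance(t));
  }

  public static void main(String[] args) {
    System.out.println(shouldRetry(new ConnectException("refused")));            // true
    System.out.println(shouldRetry(new FileNotFoundException("/no/such/file"))); // false
    System.out.println(shouldRetry(new IOException("Filesystem closed")));       // false
  }
}
```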

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -333,7 +379,7 @@ private boolean isSourceFileMismatch(FileSystem sourceFs, ReplChangeManager.File
         } catch (IOException e) {

Review comment:
       ```
   ReplChangeManager.checksumFor(srcFile.getSourcePath(), sourceFs);
   ```
   This internally calls ``FileChecksum checksum = fs.getFileChecksum(path);``, so ``ReplChangeManager.checksumFor(srcFile.getSourcePath(), sourceFs);`` can be made retryable as well.
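A hypothetical sketch of routing a checksum lookup through a generic retry wrapper, the same way the other FS calls are wrapped. `checksumFor` here is a local stand-in that fails once and then returns `"dummy"`, mirroring the test's `thenThrow(...).thenReturn("dummy")` stub; it is not `ReplChangeManager.checksumFor`. The fixed delay and attempt count are illustrative assumptions, not Hive's retry configuration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: wrap a checksum lookup in the same retry helper as other FS calls. */
public final class RetryableChecksumSketch {

  static <T> T retryable(Callable<T> callable, int maxAttempts, long delayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return callable.call();
      } catch (IOException e) { // non-IO exceptions propagate immediately
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis); // fixed delay; a real policy may back off
        }
      }
    }
    throw last;
  }

  // Stand-in for a checksum call that fails once, then succeeds.
  static final AtomicInteger CALLS = new AtomicInteger();

  static String checksumFor(String path) throws IOException {
    if (CALLS.incrementAndGet() == 1) {
      throw new IOException("Failed");
    }
    return "dummy";
  }

  public static void main(String[] args) throws Exception {
    String checksum = retryable(() -> checksumFor("/warehouse/t1/000000_0"), 3, 10L);
    System.out.println(checksum + " after " + CALLS.get() + " calls"); // dummy after 2 calls
  }
}
```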

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -190,11 +240,12 @@ private void doCopyRetry(FileSystem sourceFs, List<ReplChangeManager.FileInfo> s
         // If copy fails, fall through the retry logic
         LOG.info("file operation failed", e);
 
-        if (repeat >= (MAX_IO_RETRY - 1)) {
-          //no need to wait in the last iteration
+        if (repeat >= (MAX_IO_RETRY - 1)
+                || e.getMessage().equals(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.format(e.getCause().getMessage()))) {
+          //Don't retry if this is last iteration or retry is already exhausted by FS operations.
           break;
         }
-
+        closeAllForUGI((proxyUser == null) ? Utils.getUGI() : proxyUser);

Review comment:
       We now close the filesystem here, but below we recreate it only if the exception is not `FNF`. What happens in the case of a ``FileNotFoundException``? Will the filesystem get closed but never recreated?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -145,14 +196,13 @@ ExecutorService getExecutorService() {
   @VisibleForTesting
   void doCopy(Map.Entry<Path, List<ReplChangeManager.FileInfo>> destMapEntry, UserGroupInformation proxyUser,
                       boolean useRegularCopy, boolean overwrite) throws IOException, LoginException,
-    HiveFatalException {
+          HiveFatalException {

Review comment:
       nit:
   Avoid indentation change






[GitHub] [hive] ayushtkn commented on a change in pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r697128887



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/parse/repl/TestCopyUtils.java
##########
@@ -110,6 +112,40 @@ public void shouldThrowExceptionOnDistcpFailure() throws Exception {
     copyUtils.doCopy(destination, srcPaths);
   }
 
+  @Test
+  public void testRetryableFSCalls() throws Exception {
+    mockStatic(UserGroupInformation.class);
+    mockStatic(ReplChangeManager.class);
+    when(UserGroupInformation.getCurrentUser()).thenReturn(mock(UserGroupInformation.class));
+    HiveConf conf = mock(HiveConf.class);
+    conf.set(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.varname, "1s");
+    FileSystem fs = mock(FileSystem.class);
+    Path source = mock(Path.class);
+    Path destination = mock(Path.class);
+    ContentSummary cs = mock(ContentSummary.class);
+
+    when(ReplChangeManager.checksumFor(source, fs)).thenThrow(new IOException("Failed")).thenReturn("dummy");
+    when(fs.exists(same(source))).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.delete(same(source), anyBoolean())).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.mkdirs(same(source))).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.rename(same(source), same(destination))).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.getContentSummary(same(source))).thenThrow(new IOException("Failed")).thenReturn(cs);
+
+    CopyUtils copyUtils = new CopyUtils(UserGroupInformation.getCurrentUser().getUserName(), conf, fs);
+    CopyUtils copyUtilsSpy = Mockito.spy(copyUtils);
+    assertEquals (copyUtilsSpy.exists(fs, source), true);

Review comment:
       Change to `assertTrue`, and do the same for the others as well.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -66,6 +67,16 @@
   private FileSystem destinationFs;
   private final int maxParallelCopyTask;
 
+  private List<Class<? extends Exception>> failOnExceptions = Arrays.asList(org.apache.hadoop.fs.PathIOException.class,
+          org.apache.hadoop.fs.UnsupportedFileSystemException.class,
+          org.apache.hadoop.fs.InvalidPathException.class,
+          org.apache.hadoop.fs.InvalidRequestException.class,
+          org.apache.hadoop.fs.FileAlreadyExistsException.class,
+          org.apache.hadoop.fs.ChecksumException.class,
+          org.apache.hadoop.fs.ParentNotDirectoryException.class,
+          org.apache.hadoop.hdfs.protocol.NSQuotaExceededException.class,

Review comment:
       We could also include the other quota exceptions, i.e. the subclasses (direct and indirect) of `ClusterStorageCapacityExceededException`, or `ClusterStorageCapacityExceededException` itself if the retryable function can take a parent class and block its subclasses as well.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -190,11 +247,14 @@ private void doCopyRetry(FileSystem sourceFs, List<ReplChangeManager.FileInfo> s
         // If copy fails, fall through the retry logic
         LOG.info("file operation failed", e);
 
-        if (repeat >= (MAX_IO_RETRY - 1)) {
-          //no need to wait in the last iteration
+        if (repeat >= (MAX_IO_RETRY - 1) || failOnExceptions.stream().anyMatch(k -> e.getClass().equals(k))
+                || ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg().equals(e.getMessage())) {
+          //Don't retry in the following cases:

Review comment:
       pull the comment above the if statement

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/parse/repl/TestCopyUtils.java
##########
@@ -110,6 +112,40 @@ public void shouldThrowExceptionOnDistcpFailure() throws Exception {
     copyUtils.doCopy(destination, srcPaths);
   }
 
+  @Test
+  public void testRetryableFSCalls() throws Exception {
+    mockStatic(UserGroupInformation.class);
+    mockStatic(ReplChangeManager.class);
+    when(UserGroupInformation.getCurrentUser()).thenReturn(mock(UserGroupInformation.class));
+    HiveConf conf = mock(HiveConf.class);
+    conf.set(HiveConf.ConfVars.REPL_RETRY_INTIAL_DELAY.varname, "1s");
+    FileSystem fs = mock(FileSystem.class);
+    Path source = mock(Path.class);
+    Path destination = mock(Path.class);
+    ContentSummary cs = mock(ContentSummary.class);
+
+    when(ReplChangeManager.checksumFor(source, fs)).thenThrow(new IOException("Failed")).thenReturn("dummy");
+    when(fs.exists(same(source))).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.delete(same(source), anyBoolean())).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.mkdirs(same(source))).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.rename(same(source), same(destination))).thenThrow(new IOException("Failed")).thenReturn(true);
+    when(fs.getContentSummary(same(source))).thenThrow(new IOException("Failed")).thenReturn(cs);
+
+    CopyUtils copyUtils = new CopyUtils(UserGroupInformation.getCurrentUser().getUserName(), conf, fs);
+    CopyUtils copyUtilsSpy = Mockito.spy(copyUtils);
+    assertEquals (copyUtilsSpy.exists(fs, source), true);
+    Mockito.verify(fs, Mockito.times(2)).exists(source);
+    assertEquals (copyUtils.delete(fs, source, true), true);
+    Mockito.verify(fs, Mockito.times(2)).delete(source, true);
+    assertEquals (copyUtils.mkdirs(fs, source), true);
+    Mockito.verify(fs, Mockito.times(2)).mkdirs(source);
+    assertEquals (copyUtils.rename(fs, source, destination), true);
+    Mockito.verify(fs, Mockito.times(2)).rename(source, destination);
+    assertEquals (copyUtilsSpy.getContentSummary(fs, source), cs);
+    Mockito.verify(fs, Mockito.times(2)).getContentSummary(source);
+    assertEquals (copyUtilsSpy.checkSumFor(source, fs), "dummy");

Review comment:
       Flip the arguments, e.g. ``assertEquals("dummy", copyUtilsSpy.checkSumFor(source, fs));``
   In ``assertEquals`` the expected value goes first and the actual value second. :-)
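Why the order matters: the failure message labels the first argument as "expected", so swapped arguments produce a misleading report. The helper below is a hypothetical mini assertion with the same (expected, actual) contract; its message format is only loosely modeled on JUnit's and is not JUnit's exact output.

```java
import java.util.Objects;

/** Hypothetical mini assertion with an (expected, actual) contract like JUnit's assertEquals. */
public final class AssertOrderSketch {

  // Returns null when the values match, otherwise a failure message.
  static String failureMessage(Object expected, Object actual) {
    return Objects.equals(expected, actual)
        ? null
        : "expected:<" + expected + "> but was:<" + actual + ">";
  }

  public static void main(String[] args) {
    String actual = "other"; // pretend checkSumFor returned an unexpected value
    System.out.println(failureMessage("dummy", actual)); // expected:<dummy> but was:<other>
    System.out.println(failureMessage(actual, "dummy")); // expected:<other> but was:<dummy> (misleading)
  }
}
```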






[GitHub] [hive] ayushtkn commented on a change in pull request #2516: HIVE-25330: Make FS calls in CopyUtils retryable

Posted by GitBox <gi...@apache.org>.
ayushtkn commented on a change in pull request #2516:
URL: https://github.com/apache/hive/pull/2516#discussion_r697127775



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##########
@@ -190,11 +247,14 @@ private void doCopyRetry(FileSystem sourceFs, List<ReplChangeManager.FileInfo> s
         // If copy fails, fall through the retry logic
         LOG.info("file operation failed", e);
 
-        if (repeat >= (MAX_IO_RETRY - 1)) {
-          //no need to wait in the last iteration
+        if (repeat >= (MAX_IO_RETRY - 1) || failOnExceptions.stream().anyMatch(k -> e.getClass().equals(k))
+                || ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg().equals(e.getMessage())) {
+          //Don't retry in the following cases:

Review comment:
       Pull the comment above the if statement. Also, rather than matching entries in ``failOnExceptions`` by exact class, consider an ``isAssignableFrom``-style check so that subclasses of the listed exceptions are covered too.
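A small JDK-only sketch of the difference between the two checks. `FileSystemException` and its subclass `AccessDeniedException` stand in for a parent/child pair such as `ClusterStorageCapacityExceededException` and its quota subclasses; the class and method names are hypothetical. With exact-class matching the child is retried; with `k.isAssignableFrom(e.getClass())` a listed parent also covers all of its subclasses.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.FileSystemException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: exact-class vs. hierarchy-aware matching for a fail-fast list. */
public final class FailFastMatchSketch {

  private static final List<Class<? extends Exception>> FAIL_ON = Arrays.asList(
      FileSystemException.class, FileNotFoundException.class);

  // Exact matching: only the listed classes themselves stop the retry loop.
  static boolean failFastExact(Exception e) {
    return FAIL_ON.stream().anyMatch(k -> e.getClass().equals(k));
  }

  // Hierarchy-aware matching: a listed parent also covers its subclasses.
  static boolean failFastHierarchy(Exception e) {
    return FAIL_ON.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()));
  }

  public static void main(String[] args) {
    Exception child = new AccessDeniedException("/warehouse/t1"); // subclass of FileSystemException
    System.out.println(failFastExact(child));                            // false: would be retried
    System.out.println(failFastHierarchy(child));                        // true: fails fast
    System.out.println(failFastHierarchy(new IOException("transient"))); // false: parent type, still retried
  }
}
```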



