You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:52:18 UTC

[GitHub] [incubator-pinot] jackjlli opened a new pull request #5764: Add retry logic to download segment tar file in pinot server

jackjlli opened a new pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764


   ## Description
   This PR adds retry logic to download segment tar file in pinot server.
   Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission. In that case, we should re-download it again.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] xiangfu0 commented on pull request #5764: Add retry logic to download segment tar file in pinot server

Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#issuecomment-869254328


   is this still a valid pr?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #5764: Add retry logic to download segment tar file in pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r461924349



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +185,44 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.
+    // Thus, we should re-download it again.
+    RetryPolicies.fixedDelayRetryPolicy(5, 5_000L).attempt(() -> {
+      File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
+          "tmp-" + segmentName + "-" + UUID.randomUUID());
+      FileUtils.forceMkdir(tempDir);
+      File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
+      File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
+      File tempSegmentDir = new File(tempDir, segmentName);
+      try {
+        SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);

Review comment:
       Comment added. 
   I also added the catch block to check whether `AttemptsExceededException` is thrown from `SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile)`. If so, exit the outer retry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5764: Add retry logic to download segment tar file in pinot server

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r461970668



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +186,48 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.

Review comment:
       Agree, however, we did observe in production that the file downloaded was corrupt (likely truncated), and restarting the server downloaded the segment again and was able to untar it.
   
   Perhaps, we can start off by first emitting a metric and alerting on it to see how often we run into the issue. If this was a one-off due to manual intervention, then perhaps retry is not warranted. But if there's a systematic problem, we should try to understand what's causing the corruption, and then decide whether retry is warranted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5764: Add retry logic to download segment tar file in pinot server

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r463123037



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +186,48 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.

Review comment:
       Thanks @jackjlli 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5764: Add retry logic to download segment tar file in pinot server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r461914758



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +185,44 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.
+    // Thus, we should re-download it again.
+    RetryPolicies.fixedDelayRetryPolicy(5, 5_000L).attempt(() -> {
+      File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
+          "tmp-" + segmentName + "-" + UUID.randomUUID());
+      FileUtils.forceMkdir(tempDir);
+      File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
+      File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
+      File tempSegmentDir = new File(tempDir, segmentName);
+      try {
+        SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);

Review comment:
       fetchSegmentToLocal already adds a retry  (can you document that in the API? I believe all the implementations add retries)

##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +186,48 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.

Review comment:
       Why would the file get corrupted during transmission?
   (TCP works for communication from here to mars :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5764: Add retry logic to download segment tar file in pinot server

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r462422577



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +186,48 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.

Review comment:
       Yes, that works. And if there are cases where the segment downloaded by server is truncated, we need to fix that instead of adding retries. Obvious things to check if it is something more than manual operations:
   *  Are we sending the reload message before moving the segment to its final location? It appears no from a cursory look.
   * Is there a possibility that multiple controllers are fielding the same segment ? (You checked the logs to see that this was not the case, but) We are not handling this case. The source file is likely to get corrupted since each controller will use `pinotFS.copyToDestination()` (not exact API) to set the segment in the destination. We should enhance PinotFS to support copying to a temp destination and then moving in place, or use two apis to move/copy the file. Note that this can happen even if the sender of the segment decides to give up on a (slow) connection and retry (without closing the old connection, or closing it too late), and get a different controller to upload the segment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #5764: Add retry logic to download segment tar file in pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r462575993



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +186,48 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.

Review comment:
       I just filed another PR to add the server metric: https://github.com/apache/incubator-pinot/pull/5768.
   
   We can hold off this PR a bit and see how often the exception happens on untarring segments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org