You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:56:30 UTC

[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5764: Add retry logic to download segment tar file in pinot server

mayankshriv commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r461970668



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +186,48 @@ private boolean isNewSegmentMetadata(String tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself could be corrupted during the transmission.

Review comment:
       Agree, however, we did observe in production that the file downloaded was corrupt (likely truncated), and restarting the server downloaded the segment again and was able to untar it.
   
   Perhaps, we can start off by first emitting a metric and alerting on it to see how often we run into the issue. If this was a one-off due to manual intervention, then perhaps retry is not warranted. But if there's a systematic problem, we should try to understand what's causing the corruption, and then decide whether retry is warranted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org