You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "jugomezv (via GitHub)" <gi...@apache.org> on 2023/05/17 21:02:24 UTC

[GitHub] [pinot] jugomezv opened a new pull request, #10777: Introduce a metric to count individual segment fetch failures.

jugomezv opened a new pull request, #10777:
URL: https://github.com/apache/pinot/pull/10777

   Current metric for failures ticks only after 3 consecutive failures and that almost never happens, I am introducing a new metric so we can track individual failures in segment fetcher.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1201015181


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/BaseSegmentFetcher.java:
##########
@@ -102,7 +104,8 @@ public void fetchSegmentToLocal(List<URI> uris, File dest)
     });
   }
 
-  public File fetchUntarSegmentToLocalStreamed(URI uri, File dest, long rateLimit)
+  public File fetchUntarSegmentToLocalStreamed(URI uri, File dest, long rateLimit,
+      AtomicInteger attempts)

Review Comment:
   We only need the one we are changing, but yes let me look if we can propagate this to count in all uses



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198323476


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +628,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
-    } catch (AttemptsExceededException e) {
-      LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+    } catch (Exception e) {

Review Comment:
   Discussed further and we decided to let BaseRetryPolicy.attempt() return the number of failure attempts, so that we can get it outside the actual code even upon success and add to the counter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204778892


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -97,54 +100,67 @@ public void fetchSegmentToLocal(URI downloadURI, File dest)
   }
 
   @Override
-  public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long maxStreamRateInByte)
+  public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long maxStreamRateInByte,
+      AtomicInteger attempts)
       throws Exception {
-    // Create a RoundRobinURIProvider to round robin IP addresses when retry uploading. Otherwise may always try to
+    // Create a RoundRobinURIProvider to round robin IP addresses when retry uploading. Otherwise, may always try to
     // download from a same broken host as: 1) DNS may not RR the IP addresses 2) OS cache the DNS resolution result.
     RoundRobinURIProvider uriProvider = new RoundRobinURIProvider(downloadURI);
     int retryCount = Math.max(_retryCount, uriProvider.numAddresses());
     AtomicReference<File> ret = new AtomicReference<>(); // return the untared segment directory
     _logger.info("Retry downloading for {} times. retryCount from pinot server config: {}, number of IP addresses for "
         + "download URI: {}", retryCount, _retryCount, uriProvider.numAddresses());
-    RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> {
-      URI uri = uriProvider.next();
-      try {
-        String hostName = downloadURI.getHost();
-        int port = downloadURI.getPort();
-        // If the original download address is specified as host name, need add a "HOST" HTTP header to the HTTP
-        // request. Otherwise, if the download address is a LB address, when the LB be configured as "disallow direct
-        // access by IP address", downloading will fail.
-        List<Header> httpHeaders = new LinkedList<>();
-        if (!InetAddresses.isInetAddress(hostName)) {
-          httpHeaders.add(new BasicHeader(HttpHeaders.HOST, hostName + ":" + port));
-        }
-        ret.set(_httpClient.downloadUntarFileStreamed(uri, dest, _authProvider, httpHeaders, maxStreamRateInByte));
+    int tries = 0;
+    try {
+      tries = RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(
+          () -> {
+        URI uri = uriProvider.next();
+        try {
+          String hostName = downloadURI.getHost();
+          int port = downloadURI.getPort();
+          // If the original download address is specified as host name, need add a "HOST" HTTP header to the HTTP
+          // request. Otherwise, if the download address is a LB address, when the LB be configured as "disallow direct
+          // access by IP address", downloading will fail.
+          List<Header> httpHeaders = new LinkedList<>();
+          if (!InetAddresses.isInetAddress(hostName)) {
+            httpHeaders.add(new BasicHeader(HttpHeaders.HOST, hostName + ":" + port));
+          }
+          ret.set(_httpClient.downloadUntarFileStreamed(uri, dest, _authProvider, httpHeaders, maxStreamRateInByte));
 
-        return true;
-      } catch (HttpErrorStatusException e) {
-        int statusCode = e.getStatusCode();
-        if (statusCode == HttpStatus.SC_NOT_FOUND || statusCode >= 500) {
-          // Temporary exception
-          // 404 is treated as a temporary exception, as the downloadURI may be backed by multiple hosts,
-          // if singe host is down, can retry with another host.
-          _logger.warn("Got temporary error status code: {} while downloading segment from: {} to: {}", statusCode, uri,
+          return true;
+        } catch (HttpErrorStatusException e) {
+          int statusCode = e.getStatusCode();
+          if (statusCode == HttpStatus.SC_NOT_FOUND || statusCode >= 500) {
+            // Temporary exception
+            // 404 is treated as a temporary exception, as the downloadURI may be backed by multiple hosts,
+            // if singe host is down, can retry with another host.
+            _logger.warn("Got temporary error status code: {} while downloading segment from: {} to: {}", statusCode,
+                uri,

Review Comment:
   (nit) remove the new line here?



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/retry/BaseRetryPolicy.java:
##########
@@ -42,26 +42,26 @@ protected BaseRetryPolicy(int maxNumAttempts) {
   protected abstract long getDelayMs(int currentAttempt);
 
   @Override
-  public void attempt(Callable<Boolean> operation)
+  public int attempt(Callable<Boolean> operation)
       throws AttemptsExceededException, RetriableOperationException {
     int attempt = 0;
     try {
       while (attempt < _maxNumAttempts - 1) {
         if (Boolean.TRUE.equals(operation.call())) {
           // Succeeded
-          return;
+          return attempt;
         } else {
           // Failed
           Thread.sleep(getDelayMs(attempt++));
         }
       }
       if (_maxNumAttempts > 0 && Boolean.TRUE.equals(operation.call())) {
         // Succeeded
-        return;
+        return attempt;
       }
     } catch (Exception e) {
-      throw new RetriableOperationException(e);
+      throw new RetriableOperationException(e, attempt + 1);
     }
-    throw new AttemptsExceededException("Operation failed after " + _maxNumAttempts + " attempts");
+    throw new AttemptsExceededException("Operation failed after " + _maxNumAttempts + " attempts", attempt);

Review Comment:
   ```suggestion
       throw new AttemptsExceededException("Operation failed after " + _maxNumAttempts + " attempts", _maxNumAttempts);
   ```
   Should we return _maxNumAttempts here? 



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +630,22 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+        AtomicInteger attempts = new AtomicInteger(0);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte, attempts);
+        _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+            attempts.get());
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {} attempts: {}", segmentName,
+            _tableNameWithType, uri, attempts.get());
+        return ret;
     } catch (AttemptsExceededException e) {
+      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+          e.getAttempts());
       LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
           segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+      throw e;
+    } catch (RetriableOperationException e) {
+      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+          e.getAttempts());

Review Comment:
   Should we use attempts.get() here as well for consistency, since we have already populated attempts upon exception in fetchUntarSegmentToLocalStreamed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198102431


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+      } catch (Exception e) {
+        _serverMetrics.addMeteredTableValue(_tableNameWithType,
+            ServerMeter.SEGMENT_TOTAL_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);

Review Comment:
   That is what I though, so to make sure we are in sync: we will redefine the old metric to tick on individual failures and may be just log when we exceed the max retries?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204610778


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -106,7 +108,8 @@ public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long ma
     AtomicReference<File> ret = new AtomicReference<>(); // return the untared segment directory
     _logger.info("Retry downloading for {} times. retryCount from pinot server config: {}, number of IP addresses for "
         + "download URI: {}", retryCount, _retryCount, uriProvider.numAddresses());
-    RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> {
+    int tries = RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs,

Review Comment:
   Never mind I think I get your point now, lets sync in person 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198288230


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +628,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
-    } catch (AttemptsExceededException e) {
-      LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+    } catch (Exception e) {

Review Comment:
   Hey @jugomezv, I might miss something here, but if our purpose here is to add to a metric every time a download-untar fails, then increment the counter in this catch block will not capture all the failures? If we want to have best observability and capture all the failures that happens even in the attempts, should we do that in the following function?https://github.com/apache/pinot/blob/2050376adcb67041972d9707b338373a5e6bc468/pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java#L109



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198288230


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +628,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
-    } catch (AttemptsExceededException e) {
-      LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+    } catch (Exception e) {

Review Comment:
   Hey @jugomezv, I might miss something here, but if our purpose here is to add to a metric every time a download-untar fails, then adding to the counter here in this catch block will not capture all the failures? If we want to have best observability and capture all the failures that happens even in the attempts, should we do that in https://github.com/apache/pinot/blob/2050376adcb67041972d9707b338373a5e6bc468/pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java#L109



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jtao15 commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jtao15 (via GitHub)" <gi...@apache.org>.
jtao15 commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1197073484


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+      } catch (Exception e) {
+        _serverMetrics.addMeteredTableValue(_tableNameWithType,
+            ServerMeter.SEGMENT_TOTAL_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);

Review Comment:
   IMO, it's better to reuse the old metric and mark this pr as an incompatible change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198288230


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +628,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
-    } catch (AttemptsExceededException e) {
-      LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+    } catch (Exception e) {

Review Comment:
   Hey @jugomezv, I might miss something here, but if our purpose is to add to a metric every time a download-untar fails, then increment the counter in this catch block will not capture all the failures? If we want to have best observability and capture all the failures that happens even in the attempts, should we do that in the following function?https://github.com/apache/pinot/blob/2050376adcb67041972d9707b338373a5e6bc468/pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java#L109



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on PR #10777:
URL: https://github.com/apache/pinot/pull/10777#issuecomment-1552206271

   btw, which metric are you trying to re-define? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204815318


##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/retry/RetryPolicy.java:
##########
@@ -33,7 +33,8 @@ public interface RetryPolicy {
    * @param operation The operation to attempt, which returns true on success and false on failure.
    * @throws AttemptsExceededException
    * @throws RetriableOperationException
+   * @return the number of attempts used for the operation. 0 means the first try was successful.
    */
-  void attempt(Callable<Boolean> operation)
+  int attempt(Callable<Boolean> operation)

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198107469


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+      } catch (Exception e) {

Review Comment:
   Yes I think if we redefine the semantic of the old metric we dont have this issue, let me resend a new PR with that...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10777:
URL: https://github.com/apache/pinot/pull/10777#issuecomment-1553517387

   Test ADLSGen2PinotFSTest runs fine locally, will send new empty commit to retrigger tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jackjlli merged pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jackjlli (via GitHub)" <gi...@apache.org>.
jackjlli merged PR #10777:
URL: https://github.com/apache/pinot/pull/10777


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204815137


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -145,6 +148,7 @@ public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long ma
         return false;
       }
     });
+    attempts.set(tries);

Review Comment:
   Done on earlier commit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204820751


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -97,54 +100,67 @@ public void fetchSegmentToLocal(URI downloadURI, File dest)
   }
 
   @Override
-  public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long maxStreamRateInByte)
+  public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long maxStreamRateInByte,
+      AtomicInteger attempts)
       throws Exception {
-    // Create a RoundRobinURIProvider to round robin IP addresses when retry uploading. Otherwise may always try to
+    // Create a RoundRobinURIProvider to round robin IP addresses when retry uploading. Otherwise, may always try to
     // download from a same broken host as: 1) DNS may not RR the IP addresses 2) OS cache the DNS resolution result.
     RoundRobinURIProvider uriProvider = new RoundRobinURIProvider(downloadURI);
     int retryCount = Math.max(_retryCount, uriProvider.numAddresses());
     AtomicReference<File> ret = new AtomicReference<>(); // return the untared segment directory
     _logger.info("Retry downloading for {} times. retryCount from pinot server config: {}, number of IP addresses for "
         + "download URI: {}", retryCount, _retryCount, uriProvider.numAddresses());
-    RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> {
-      URI uri = uriProvider.next();
-      try {
-        String hostName = downloadURI.getHost();
-        int port = downloadURI.getPort();
-        // If the original download address is specified as host name, need add a "HOST" HTTP header to the HTTP
-        // request. Otherwise, if the download address is a LB address, when the LB be configured as "disallow direct
-        // access by IP address", downloading will fail.
-        List<Header> httpHeaders = new LinkedList<>();
-        if (!InetAddresses.isInetAddress(hostName)) {
-          httpHeaders.add(new BasicHeader(HttpHeaders.HOST, hostName + ":" + port));
-        }
-        ret.set(_httpClient.downloadUntarFileStreamed(uri, dest, _authProvider, httpHeaders, maxStreamRateInByte));
+    int tries = 0;
+    try {
+      tries = RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(
+          () -> {
+        URI uri = uriProvider.next();
+        try {
+          String hostName = downloadURI.getHost();
+          int port = downloadURI.getPort();
+          // If the original download address is specified as host name, need add a "HOST" HTTP header to the HTTP
+          // request. Otherwise, if the download address is a LB address, when the LB be configured as "disallow direct
+          // access by IP address", downloading will fail.
+          List<Header> httpHeaders = new LinkedList<>();
+          if (!InetAddresses.isInetAddress(hostName)) {
+            httpHeaders.add(new BasicHeader(HttpHeaders.HOST, hostName + ":" + port));
+          }
+          ret.set(_httpClient.downloadUntarFileStreamed(uri, dest, _authProvider, httpHeaders, maxStreamRateInByte));
 
-        return true;
-      } catch (HttpErrorStatusException e) {
-        int statusCode = e.getStatusCode();
-        if (statusCode == HttpStatus.SC_NOT_FOUND || statusCode >= 500) {
-          // Temporary exception
-          // 404 is treated as a temporary exception, as the downloadURI may be backed by multiple hosts,
-          // if singe host is down, can retry with another host.
-          _logger.warn("Got temporary error status code: {} while downloading segment from: {} to: {}", statusCode, uri,
+          return true;
+        } catch (HttpErrorStatusException e) {
+          int statusCode = e.getStatusCode();
+          if (statusCode == HttpStatus.SC_NOT_FOUND || statusCode >= 500) {
+            // Temporary exception
+            // 404 is treated as a temporary exception, as the downloadURI may be backed by multiple hosts,
+            // if singe host is down, can retry with another host.
+            _logger.warn("Got temporary error status code: {} while downloading segment from: {} to: {}", statusCode,
+                uri,

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198187493


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+      } catch (Exception e) {
+        _serverMetrics.addMeteredTableValue(_tableNameWithType,
+            ServerMeter.SEGMENT_TOTAL_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);

Review Comment:
   Done in commit 84c64c1f925b9fabe62f51c9ca2a466414fff34d



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204422894


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/BaseSegmentFetcher.java:
##########
@@ -49,6 +50,7 @@ public abstract class BaseSegmentFetcher implements SegmentFetcher {
   protected int _retryDelayScaleFactor;
   protected AuthProvider _authProvider;
 
+

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jackjlli commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jackjlli (via GitHub)" <gi...@apache.org>.
jackjlli commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1200829713


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/BaseSegmentFetcher.java:
##########
@@ -102,7 +104,8 @@ public void fetchSegmentToLocal(List<URI> uris, File dest)
     });
   }
 
-  public File fetchUntarSegmentToLocalStreamed(URI uri, File dest, long rateLimit)
+  public File fetchUntarSegmentToLocalStreamed(URI uri, File dest, long rateLimit,
+      AtomicInteger attempts)

Review Comment:
   Now that we record the attempts for this signature. Do we plan to add this attempt to the other one(s) as well? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1202698942


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/BaseSegmentFetcher.java:
##########
@@ -49,6 +50,7 @@ public abstract class BaseSegmentFetcher implements SegmentFetcher {
   protected int _retryDelayScaleFactor;
   protected AuthProvider _authProvider;
 
+

Review Comment:
   (nit) please remove the whitespace



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -145,6 +148,7 @@ public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long ma
         return false;
       }
     });
+    attempts.set(tries);

Review Comment:
   ```suggestion
       attempts.set(tries);
       } catch (AttemptsExceededException e) {
         attempts.set(retryCount);
         throw e;
       }
   ```
   I think we need some code like this to return the correct retry count upon failure of all attempts



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -106,7 +108,8 @@ public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long ma
     AtomicReference<File> ret = new AtomicReference<>(); // return the untared segment directory
     _logger.info("Retry downloading for {} times. retryCount from pinot server config: {}, number of IP addresses for "
         + "download URI: {}", retryCount, _retryCount, uriProvider.numAddresses());
-    RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> {
+    int tries = RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs,

Review Comment:
   ```suggestion
   try{
       int tries = RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs,
   ```



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/retry/RetryPolicy.java:
##########
@@ -33,7 +33,8 @@ public interface RetryPolicy {
    * @param operation The operation to attempt, which returns true on success and false on failure.
    * @throws AttemptsExceededException
    * @throws RetriableOperationException
+   * @return the number of attempts used for the operation. 0 means the first try was successful.

Review Comment:
   (nit) return the number of attempts used for the operation, upon successful attempt? In case of failure this wouldn't work right?



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +629,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+        AtomicInteger attempts = new AtomicInteger(0);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte, attempts);
+        _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+            attempts.get());
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {} attempts: {}", segmentName,
+            _tableNameWithType, uri, attempts.get());
+        return ret;
     } catch (AttemptsExceededException e) {
       LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);

Review Comment:
   We still need to add the count to the metric here even if an error is thrown, otherwise it only covers the (partially) successful case?
   



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/retry/RetryPolicy.java:
##########
@@ -33,7 +33,8 @@ public interface RetryPolicy {
    * @param operation The operation to attempt, which returns true on success and false on failure.
    * @throws AttemptsExceededException
    * @throws RetriableOperationException
+   * @return the number of attempts used for the operation. 0 means the first try was successful.
    */
-  void attempt(Callable<Boolean> operation)
+  int attempt(Callable<Boolean> operation)

Review Comment:
   can we add some tests in RetryPolicyTest for this new return value?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198298970


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +628,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
-    } catch (AttemptsExceededException e) {
-      LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+    } catch (Exception e) {

Review Comment:
   Yes I think you are correct, this will likely catch only permanent errors likely missed all the others we handle specifically there, let me move



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204828810


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +630,22 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+        AtomicInteger attempts = new AtomicInteger(0);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte, attempts);
+        _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+            attempts.get());
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {} attempts: {}", segmentName,
+            _tableNameWithType, uri, attempts.get());
+        return ret;
     } catch (AttemptsExceededException e) {
+      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+          e.getAttempts());
       LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
           segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+      throw e;
+    } catch (RetriableOperationException e) {
+      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+          e.getAttempts());

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204822195


##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/retry/BaseRetryPolicy.java:
##########
@@ -42,26 +42,26 @@ protected BaseRetryPolicy(int maxNumAttempts) {
   protected abstract long getDelayMs(int currentAttempt);
 
   @Override
-  public void attempt(Callable<Boolean> operation)
+  public int attempt(Callable<Boolean> operation)
       throws AttemptsExceededException, RetriableOperationException {
     int attempt = 0;
     try {
       while (attempt < _maxNumAttempts - 1) {
         if (Boolean.TRUE.equals(operation.call())) {
           // Succeeded
-          return;
+          return attempt;
         } else {
           // Failed
           Thread.sleep(getDelayMs(attempt++));
         }
       }
       if (_maxNumAttempts > 0 && Boolean.TRUE.equals(operation.call())) {
         // Succeeded
-        return;
+        return attempt;
       }
     } catch (Exception e) {
-      throw new RetriableOperationException(e);
+      throw new RetriableOperationException(e, attempt + 1);
     }
-    throw new AttemptsExceededException("Operation failed after " + _maxNumAttempts + " attempts");
+    throw new AttemptsExceededException("Operation failed after " + _maxNumAttempts + " attempts", attempt);

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1197155941


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java:
##########
@@ -70,6 +70,7 @@ public enum ServerMeter implements AbstractMetrics.Meter {
   REFRESH_FAILURES("segments", false),
   UNTAR_FAILURES("segments", false),
   SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES("segments", false),
+  SEGMENT_TOTAL_STREAMED_DOWNLOAD_UNTAR_FAILURES("segments", false),

Review Comment:
   Can you please add metrics description as a part of this definition? You can see some examples in [`ServerTimer`](https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerTimer.java#L29) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jackjlli commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jackjlli (via GitHub)" <gi...@apache.org>.
jackjlli commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1197235975


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);

Review Comment:
   nit: `Downloaded`



##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+      } catch (Exception e) {

Review Comment:
   qq: why not put a catch block for `Exception e` right after Line 645? We might not need nested try-catch block which will affect the performance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on PR #10777:
URL: https://github.com/apache/pinot/pull/10777#issuecomment-1553400847

   
   
   
   
   > btw, which metric are you trying to re-define?
   SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES: it currently counts the times we exceed 3 failures but I want a count of individual failures, as we almost never get 3 failures in sequence but just one


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198187955


##########
pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerMeter.java:
##########
@@ -70,6 +70,7 @@ public enum ServerMeter implements AbstractMetrics.Meter {
   REFRESH_FAILURES("segments", false),
   UNTAR_FAILURES("segments", false),
   SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES("segments", false),
+  SEGMENT_TOTAL_STREAMED_DOWNLOAD_UNTAR_FAILURES("segments", false),

Review Comment:
   Done in commit 84c64c1f925b9fabe62f51c9ca2a466414fff34d



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] navina commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "navina (via GitHub)" <gi...@apache.org>.
navina commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1197154934


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+      } catch (Exception e) {
+        _serverMetrics.addMeteredTableValue(_tableNameWithType,
+            ServerMeter.SEGMENT_TOTAL_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);

Review Comment:
   +1 I would rather not introduce another metric. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1200806156


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +628,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
-    } catch (AttemptsExceededException e) {
-      LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+    } catch (Exception e) {

Review Comment:
   OK @jasperjiaguo commit 7a38730d55925495bba863d4bb6991dbfe6be271 should address the concern above, please take a look when you can. @navina, @jackjlli please take a look at the new commit that addresses Jia's observation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204449486


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -106,7 +108,8 @@ public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long ma
     AtomicReference<File> ret = new AtomicReference<>(); // return the untared segment directory
     _logger.info("Retry downloading for {} times. retryCount from pinot server config: {}, number of IP addresses for "
         + "download URI: {}", retryCount, _retryCount, uriProvider.numAddresses());
-    RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> {
+    int tries = RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs,

Review Comment:
   We already catch when we call this function right? why would we like to do it once again here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204433164


##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/retry/RetryPolicy.java:
##########
@@ -33,7 +33,8 @@ public interface RetryPolicy {
    * @param operation The operation to attempt, which returns true on success and false on failure.
    * @throws AttemptsExceededException
    * @throws RetriableOperationException
+   * @return the number of attempts used for the operation. 0 means the first try was successful.
    */
-  void attempt(Callable<Boolean> operation)
+  int attempt(Callable<Boolean> operation)

Review Comment:
   The challenge is when you throw, you don't return a value. I could assume is the max Retry count and add that, else this change continues to sprawl by adding a new out arg to that method that may need to change everything that uses it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204814974


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/HttpSegmentFetcher.java:
##########
@@ -106,7 +108,8 @@ public File fetchUntarSegmentToLocalStreamed(URI downloadURI, File dest, long ma
     AtomicReference<File> ret = new AtomicReference<>(); // return the untared segment directory
     _logger.info("Retry downloading for {} times. retryCount from pinot server config: {}, number of IP addresses for "
         + "download URI: {}", retryCount, _retryCount, uriProvider.numAddresses());
-    RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs, _retryDelayScaleFactor).attempt(() -> {
+    int tries = RetryPolicies.exponentialBackoffRetryPolicy(retryCount, _retryWaitMs,

Review Comment:
   Completed on previous commit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1201268380


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/fetcher/BaseSegmentFetcher.java:
##########
@@ -102,7 +104,8 @@ public void fetchSegmentToLocal(List<URI> uris, File dest)
     });
   }
 
-  public File fetchUntarSegmentToLocalStreamed(URI uri, File dest, long rateLimit)
+  public File fetchUntarSegmentToLocalStreamed(URI uri, File dest, long rateLimit,
+      AtomicInteger attempts)

Review Comment:
   I looked into bubbling up the attempts in the other calls like fetchSegmentToLocal() but it will not make sense for the metric we are trying to maintain, we would have to define new metrics and at this point we don't need those. These calls are also used from controllers and realtime so they will sprawl metrics to other places so I will skip that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204454938


##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/retry/RetryPolicy.java:
##########
@@ -33,7 +33,8 @@ public interface RetryPolicy {
    * @param operation The operation to attempt, which returns true on success and false on failure.
    * @throws AttemptsExceededException
    * @throws RetriableOperationException
+   * @return the number of attempts used for the operation. 0 means the first try was successful.

Review Comment:
   If you are referring to the exception case, I do not think the programmer should expect any return value there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1204787553


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +630,22 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+        AtomicInteger attempts = new AtomicInteger(0);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte, attempts);
+        _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+            attempts.get());
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {} attempts: {}", segmentName,
+            _tableNameWithType, uri, attempts.get());
+        return ret;
     } catch (AttemptsExceededException e) {
+      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+          e.getAttempts());
       LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
           segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+      throw e;
+    } catch (RetriableOperationException e) {
+      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES,
+          e.getAttempts());

Review Comment:
   Should we use attempts.get() here as well for consistency, since we have already populated attempts even upon exception in fetchUntarSegmentToLocalStreamed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198188665


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+      } catch (Exception e) {

Review Comment:
   commit 84c64c1f925b9fabe62f51c9ca2a466414fff34d should address this please take a look



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jugomezv commented on a diff in pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "jugomezv (via GitHub)" <gi...@apache.org>.
jugomezv commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198188997


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,9 +628,15 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
+      try {
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);

Review Comment:
   done commit 84c64c1f925b9fabe62f51c9ca2a466414fff34d



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #10777: Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures.

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #10777:
URL: https://github.com/apache/pinot/pull/10777#discussion_r1198323476


##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java:
##########
@@ -628,13 +628,16 @@ private File downloadAndStreamUntarWithRateLimit(String segmentName, SegmentZKMe
         maxStreamRateInByte);
     String uri = zkMetadata.getDownloadUrl();
     try {
-      File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
-      LOGGER.info("Download and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
-      return ret;
-    } catch (AttemptsExceededException e) {
-      LOGGER.error("Attempts exceeded when stream download-untarring segment: {} for table: {} from: {} to: {}",
-          segmentName, _tableNameWithType, uri, tempRootDir);
-      _serverMetrics.addMeteredTableValue(_tableNameWithType, ServerMeter.SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES, 1L);
+        File ret = SegmentFetcherFactory.fetchAndStreamUntarToLocal(uri, tempRootDir, maxStreamRateInByte);
+        LOGGER.info("Downloaded and untarred segment: {} for table: {} from: {}", segmentName, _tableNameWithType, uri);
+        return ret;
+    } catch (Exception e) {

Review Comment:
   Discussed further and we decided to let BaseRetryPolicy.attempt() return the number of failure attempts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10777: Introduce a metric to count individual segment fetch failures.

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10777:
URL: https://github.com/apache/pinot/pull/10777#issuecomment-1552127228

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10777?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#10777](https://app.codecov.io/gh/apache/pinot/pull/10777?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (61033aa) into [master](https://app.codecov.io/gh/apache/pinot/commit/59599872fbdc39af364996d9212d8a6066e56187?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (5959987) will **decrease** coverage by `38.77%`.
   > The diff coverage is `14.28%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #10777       +/-   ##
   =============================================
   - Coverage     70.30%   31.54%   -38.77%     
   + Complexity     6469      453     -6016     
   =============================================
     Files          2157     2157               
     Lines        116018   116062       +44     
     Branches      17552    17559        +7     
   =============================================
   - Hits          81566    36608    -44958     
   - Misses        28761    76143    +47382     
   + Partials       5691     3311     -2380     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `24.00% <14.28%> (-0.08%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `13.64% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10777?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [.../pinot/core/data/manager/BaseTableDataManager.java](https://app.codecov.io/gh/apache/pinot/pull/10777?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvQmFzZVRhYmxlRGF0YU1hbmFnZXIuamF2YQ==) | `56.31% <0.00%> (-19.29%)` | :arrow_down: |
   | [...a/org/apache/pinot/common/metrics/ServerMeter.java](https://app.codecov.io/gh/apache/pinot/pull/10777?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9TZXJ2ZXJNZXRlci5qYXZh) | `98.79% <100.00%> (+0.01%)` | :arrow_up: |
   
   ... and [1372 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/10777/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org