You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "abhishekrb19 (via GitHub)" <gi...@apache.org> on 2023/05/19 00:44:29 UTC

[GitHub] [druid] abhishekrb19 commented on a diff in pull request #14292: Do not emit negative lag because of stale offsets

abhishekrb19 commented on code in PR #14292:
URL: https://github.com/apache/druid/pull/14292#discussion_r1198412627


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
           return;
         }
 
+        // Try emitting lag even with stale metrics provided that none of the partitions has negative lag
+        final boolean areOffsetsStale =
+            sequenceLastUpdated != null
+            && sequenceLastUpdated.getMillis()
+               < System.currentTimeMillis() - tuningConfig.getOffsetFetchPeriod().getMillis();
+        if (areOffsetsStale && partitionLags.values().stream().anyMatch(x -> x < 0)) {
+          log.warn("Lag is negative and will not be emitted because topic offsets have become stale. "

Review Comment:
   For troubleshooting, I think it'll also be good to log the topic:partition info where the offsets may potentially be  stale



##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
           return;
         }
 
+        // Try emitting lag even with stale metrics provided that none of the partitions has negative lag

Review Comment:
   ```suggestion
           // Try emitting lag even with stale metrics provided that none of the partitions have negative lag
   ```



##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4220,6 +4220,18 @@ protected void emitLag()
           return;
         }
 
+        // Try emitting lag even with stale metrics provided that none of the partitions has negative lag
+        final boolean areOffsetsStale =
+            sequenceLastUpdated != null
+            && sequenceLastUpdated.getMillis()
+               < System.currentTimeMillis() - tuningConfig.getOffsetFetchPeriod().getMillis();
+        if (areOffsetsStale && partitionLags.values().stream().anyMatch(x -> x < 0)) {
+          log.warn("Lag is negative and will not be emitted because topic offsets have become stale. "
+                   + "This will not impact data processing. "
+                   + "Offsets may become stale because of connectivity issues.");
+          return;

Review Comment:
   Should we skip emitting lag metrics only for the stale partitions? I think in general, it'll be helpful to emit metrics for partitions that have non-zero lag. For example, if a topic's partitions are spread across multiple brokers and only some have connectivity issues. Or for a topic where some partitions receive little to no data, those may selectively be considered "stale". 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org