You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "talatuyarer (via GitHub)" <gi...@apache.org> on 2023/02/19 05:41:59 UTC

[GitHub] [beam] talatuyarer opened a new pull request, #25554: Backlog metrics do not showing up in FlinkRunner

talatuyarer opened a new pull request, #25554:
URL: https://github.com/apache/beam/pull/25554

   We have a Flink Job which does not emit backlog metrics. Actually metrics are emitting in KafkaIO. However I could not see them on Flink Metric system. Looks like Beam ->. Flink wiring is broken. I set metricContext in Checkpointing phase which is the place metrics emit on UnBoundedReader.
   
   @mxm @tweise @angoenka Could you please reviwe my mr ? I tested in our env. It is working as expected.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1591118357

   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1442635430

   No reviewers could be found from any of the labels on the PR or in the fallback reviewers list. Check the config file to make sure reviewers are configured


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "je-ik (via GitHub)" <gi...@apache.org>.
je-ik commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1447849927

   No, I think there is good reason the backlog is computed on checkpoint only. :)
   Yes, there are two threads, one thread runs the source and the other does the checkpoint. There is a lock (getCheckpointLock()) that ensures that this is consistent. If the source updates its internal metrics from the checkpoint thread, I would feel these metrics should be reported to flink on the first call to advance() - which is apparently not the case, I'm just a little struggling to see why. This might indicate some other bug somewhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "je-ik (via GitHub)" <gi...@apache.org>.
je-ik commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1446374915

   > We have a Flink Job which does not emit backlog metrics. Actually metrics are emitting in KafkaIO. However I could not see them on Flink Metric system. Looks like Beam -> Flink wiring is broken. I set metricContext in Checkpointing phase which is the place metrics emit on UnBoundedReader.
   > 
   > @mxm @tweise @angoenka Could you please review my MR ? I tested in our env. It is working as expected.
   
   I wonder, why do we need to write metrics into the context on checkpoint? The current code does so in the call to `advance()`, is that not sufficient?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "je-ik (via GitHub)" <gi...@apache.org>.
je-ik commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1447795221

   If I understand that correctly (and I'm definitely not sure about that :)), then you refer to the time when metrics are compute in the source (KafkaSource in this case). But the wrapper should update the metric container on each call to `advance`, see [1]. I wonder why it is needed to do the same when doing checkpoint (though it sort might make more sense than to do that on each call to `advance()`).- I'm not saying it is wrong, I'd just like to understand the reason.
   
   [1] https://github.com/apache/beam/pull/25554/files#diff-3f0d15eaf4f2c4979c6ac3fb42cd02819b1e5159ad7b28f7ac052ecd0bb21768L60


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "je-ik (via GitHub)" <gi...@apache.org>.
je-ik commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1462829667

   What you describe sounds like a deadlock. That was what I was trying to figure out, the metrics are generated in the checkpoint thread, but should be read in the thread that calls `advance()`. Therefore the backlog metrics _should_ be accessible, the fact that they are not are quite suspicious. But other than that I currently don't have enough understanding of how the metrics are binded between beam metrics and flink metrics, very quickly looking into the code, I do not see how the backlog metrics could get out of the KafkaUnboundedReader, because I'm struggling to find how the metrics are "registered" in any container for export.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1437037169

   Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "je-ik (via GitHub)" <gi...@apache.org>.
je-ik commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1446332418

   > Could we add test to verify that the change brings the expected behavior?
   > 
   > The test might probably just verify that the metrics are available after running a simple pipeline with some source that updates the metrics.
   
   There is an analogous test in `SourceInputFormatTest.testAccumulatorRegistrationOnOperatorClose` that can serve as an inspiration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] talatuyarer commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "talatuyarer (via GitHub)" <gi...@apache.org>.
talatuyarer commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1447818939

   > If I understand that correctly (and I'm definitely not sure about that :)), then you refer to the time when metrics are compute in the source (KafkaSource in this case). But the wrapper should update the metric container on each call to `advance`, see [1]. I wonder why it is needed to do the same when doing checkpoint (though it sort might make more sense than to do that on each call to `advance()`).- I'm not saying it is wrong, I'd just like to understand the reason.
   
   Backlog is reported from getCheckpointMark(), which is done by some other thread. Not sure why it is done there. But this is the main issue. Advance function runs on main thread and it has Metric context so I am able to see element count metrics. However Checkpoint thread does not have context thats why It can not emit metrics. 
   
   If I move reportBacklog() function in advance function i am able to see backlog metrics too. We could do that in advance(), but that would unnecessary overhead for every single record. I am not suggesting to do everything in advance(), it was merely a test for me to verify the problem.  :) 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1578698271

   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] apilloud commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "apilloud (via GitHub)" <gi...@apache.org>.
apilloud commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1442817188

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "je-ik (via GitHub)" <gi...@apache.org>.
je-ik commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1455785601

   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] talatuyarer commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "talatuyarer (via GitHub)" <gi...@apache.org>.
talatuyarer commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1460892717

   @je-ik Looks like this change is not working. Based on initial experiment. Whenever I enable beammetrics Pipeline stop processing. Do you have any suggestion ? I believe there is a concurrency issue. But I still count not define the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] closed pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #25554: Backlog metrics do not showing up in FlinkRunner
URL: https://github.com/apache/beam/pull/25554


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] talatuyarer commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "talatuyarer (via GitHub)" <gi...@apache.org>.
talatuyarer commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1447448279

   Hey @je-ik 
   
   > I wonder, why do we need to write metrics into the context on checkpoint? The current code does so in the call to `advance()`, is that not sufficient?
   
   Current code does not emit backlog metrics on advance. I see it is on checkpointing phase https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L255
   
   Do I miss anything ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on a diff in pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "je-ik (via GitHub)" <gi...@apache.org>.
je-ik commented on code in PR #25554:
URL: https://github.com/apache/beam/pull/25554#discussion_r1118751090


##########
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java:
##########
@@ -391,6 +391,9 @@ public void snapshotState(FunctionSnapshotContext functionSnapshotContext) throw
         return;
       }
 
+      ReaderInvocationUtil<OutputT, UnboundedSource.UnboundedReader<OutputT>> readerInvoker =
+          new ReaderInvocationUtil<>(stepName, serializedOptions.get(), metricContainer);

Review Comment:
   This can be moved to a field and reused between calls to `run` and `snapshotState`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] apilloud commented on pull request #25554: Backlog metrics do not showing up in FlinkRunner

Posted by "apilloud (via GitHub)" <gi...@apache.org>.
apilloud commented on PR #25554:
URL: https://github.com/apache/beam/pull/25554#issuecomment-1442634237

   assign set of reviewers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org