You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/11 11:55:02 UTC

[GitHub] [flink] JingGe opened a new pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

JingGe opened a new pull request #19060:
URL: https://github.com/apache/flink/pull/19060


   ## What is the purpose of the change
   
   Fix the flaky test in SinkMetricsITCase.
   
   Since the target metric for the test `numRecordsSend` is initialized in `SinkWriterMetricGroup` via parent `OperatorMetricGroup` and the SinkMetricsITCase is working on the abstract layer of `OperatorMetricGroup`. There is no direct access to the `SinkWriterMetricGroup` and the `numRecordsSend` counter.
   
   
   ## Brief change log
   
   *(for example:)*
     - check the metric by getting the numRecordsSend metric from `InMemoryReporter` and cast it to `Counter`
   
   ## Verifying this change
   
   This change is to fix a flaky test.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19060:
URL: https://github.com/apache/flink/pull/19060#issuecomment-1065047082


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "870a443f569667a35b1f71c8228269d5563aed3c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32911",
       "triggerID" : "870a443f569667a35b1f71c8228269d5563aed3c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 870a443f569667a35b1f71c8228269d5563aed3c Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32911) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] fapaul commented on a change in pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
fapaul commented on a change in pull request #19060:
URL: https://github.com/apache/flink/pull/19060#discussion_r824881013



##########
File path: flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkMetricsITCase.java
##########
@@ -130,8 +132,10 @@ private void assertSinkMetrics(
         int subtaskWithMetrics = 0;
         for (OperatorMetricGroup group : groups) {
             Map<String, Metric> metrics = reporter.getMetricsByGroup(group);
-            // there are only 2 splits assigned; so two groups will not update metrics
-            if (group.getIOMetricGroup().getNumRecordsOutCounter().getCount() != 0) {
+            // There are only 2 splits assigned; so two groups will not update metrics.
+            // There is no other way to access the counter via OperatorMetricGroup, we have to use
+            // metrics from the reporter.
+            if (((Counter) metrics.get(MetricNames.NUM_RECORDS_SEND)).getCount() == 0) {

Review comment:
       The test changed from using the records **out** counter to records **send** counter. I guess it should be consistent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #19060:
URL: https://github.com/apache/flink/pull/19060#issuecomment-1065047082


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "567bb2fb595f846c94951df0cdc58e4d08175890",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "567bb2fb595f846c94951df0cdc58e4d08175890",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 567bb2fb595f846c94951df0cdc58e4d08175890 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19060:
URL: https://github.com/apache/flink/pull/19060#issuecomment-1065047082


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "870a443f569667a35b1f71c8228269d5563aed3c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32911",
       "triggerID" : "870a443f569667a35b1f71c8228269d5563aed3c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 870a443f569667a35b1f71c8228269d5563aed3c Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32911) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe edited a comment on pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
JingGe edited a comment on pull request #19060:
URL: https://github.com/apache/flink/pull/19060#issuecomment-1066524534


   > I do not fully understand the fix. AFAIK the `InternalSinkWriterMetricGroup` delegates everything to the parent metric group so when updating something in the `SinkWriterMetricGroup` it should be immediately available also in the parent group.
   
   It is true. But just because the metric is initialized via the parent, does not means it can be accessed by the parent metric group. if you know any other way to do it except adding extra `@VisibleForTesting` method into `AbstractMetricGroup` which will break its current design, please let me know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #19060:
URL: https://github.com/apache/flink/pull/19060#issuecomment-1066524534


   > I do not fully understand the fix. AFAIK the `InternalSinkWriterMetricGroup` delegates everything to the parent metric group so when updating something in the `SinkWriterMetricGroup` it should be immediately available also in the parent group.
   
   It is true. But just because the metric is initialized via the parent, does not means it can be accessed by the parent metric group. if you know any other way to do it except adding extra `@VisibleForTesting` method into `AbstractMetricGroup` which will break it current design, please let me know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19060:
URL: https://github.com/apache/flink/pull/19060#issuecomment-1065047082


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "870a443f569667a35b1f71c8228269d5563aed3c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "870a443f569667a35b1f71c8228269d5563aed3c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 870a443f569667a35b1f71c8228269d5563aed3c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #19060:
URL: https://github.com/apache/flink/pull/19060#discussion_r825800063



##########
File path: flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkMetricsITCase.java
##########
@@ -130,8 +132,10 @@ private void assertSinkMetrics(
         int subtaskWithMetrics = 0;
         for (OperatorMetricGroup group : groups) {
             Map<String, Metric> metrics = reporter.getMetricsByGroup(group);
-            // there are only 2 splits assigned; so two groups will not update metrics
-            if (group.getIOMetricGroup().getNumRecordsOutCounter().getCount() != 0) {
+            // There are only 2 splits assigned; so two groups will not update metrics.
+            // There is no other way to access the counter via OperatorMetricGroup, we have to use
+            // metrics from the reporter.
+            if (((Counter) metrics.get(MetricNames.NUM_RECORDS_SEND)).getCount() == 0) {

Review comment:
       You actually made it consistent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #19060:
URL: https://github.com/apache/flink/pull/19060#discussion_r825714830



##########
File path: flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkMetricsITCase.java
##########
@@ -130,8 +132,10 @@ private void assertSinkMetrics(
         int subtaskWithMetrics = 0;
         for (OperatorMetricGroup group : groups) {
             Map<String, Metric> metrics = reporter.getMetricsByGroup(group);
-            // there are only 2 splits assigned; so two groups will not update metrics
-            if (group.getIOMetricGroup().getNumRecordsOutCounter().getCount() != 0) {
+            // There are only 2 splits assigned; so two groups will not update metrics.
+            // There is no other way to access the counter via OperatorMetricGroup, we have to use
+            // metrics from the reporter.
+            if (((Counter) metrics.get(MetricNames.NUM_RECORDS_SEND)).getCount() == 0) {

Review comment:
       Could you explain what does "consistent" mean in this case? The `numRecordsOutCounter` is defined in the `OperatorIOMetricGroup` and the `numRecordsSendCounter` is in the `SinkWriterMetricGroup`. Since they are on different abstraction levels and there is no way to access the `numRecordsSendCounter` from `OperatorMetricGroup`, how could they be consistent? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise merged pull request #19060: [FLINK-26532][metrics][test] using the numRecordsSend counter to get the correct metric

Posted by GitBox <gi...@apache.org>.
AHeise merged pull request #19060:
URL: https://github.com/apache/flink/pull/19060


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org