You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "hgeraldino (via GitHub)" <gi...@apache.org> on 2023/02/03 04:48:15 UTC

[GitHub] [kafka] hgeraldino opened a new pull request, #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

hgeraldino opened a new pull request, #13193:
URL: https://github.com/apache/kafka/pull/13193

   This PR aligns the description for `source-record-write-rate` and `source-record-write-total` metrics, and excludes the filtered records from the metrics being published.
   
   Today, the `source-record-poll-[rate|total]` and `source-record-write-[rate|total]` are publishing similar values (except for cases where exceptions are thrown and retries are needed, which might throw off the rate a little bit). This means that it is not possible to find an accurate number of records written to Kafka by source connectors that use filter predicates.
   
   **Note**: the test for this PR was written on top of https://github.com/apache/kafka/pull/13191, so a rebase will be required after https://github.com/apache/kafka/pull/13191 is merged.
   
   Finally, I think it makes sense to publish a new metric with the number of filtered records, so I'll file a new KIP for it.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] C0urante commented on a diff in pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "C0urante (via GitHub)" <gi...@apache.org>.
C0urante commented on code in PR #13193:
URL: https://github.com/apache/kafka/pull/13193#discussion_r1119262008


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectMetricsRegistry.java:
##########
@@ -187,7 +187,8 @@ public ConnectMetricsRegistry(Set<String> tags) {
         sourceRecordWriteTotal = createTemplate("source-record-write-total", SOURCE_TASK_GROUP_NAME,
                                                 "The number of records output from the transformations and written to Kafka for this" +
                                                 " task belonging to the named source connector in this worker, since the task was " +
-                                                "last restarted.",
+                                                "last restarted. This is after transformations are applied and excludes any records " +
+                                                "filtered out by the transformations.",

Review Comment:
   👍



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hgeraldino commented on a diff in pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "hgeraldino (via GitHub)" <gi...@apache.org>.
hgeraldino commented on code in PR #13193:
URL: https://github.com/apache/kafka/pull/13193#discussion_r1119279751


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectMetricsRegistry.java:
##########
@@ -187,7 +187,8 @@ public ConnectMetricsRegistry(Set<String> tags) {
         sourceRecordWriteTotal = createTemplate("source-record-write-total", SOURCE_TASK_GROUP_NAME,
                                                 "The number of records output from the transformations and written to Kafka for this" +
                                                 " task belonging to the named source connector in this worker, since the task was " +
-                                                "last restarted.",
+                                                "last restarted. This is after transformations are applied and excludes any records " +
+                                                "filtered out by the transformations.",

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hgeraldino commented on pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "hgeraldino (via GitHub)" <gi...@apache.org>.
hgeraldino commented on PR #13193:
URL: https://github.com/apache/kafka/pull/13193#issuecomment-1446570858

   Thanks @clolov, 
   
   Now that https://github.com/apache/kafka/pull/13191 has been merged, this PR has been rebased and is ready for review. 
   
   CC @C0urante 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hgeraldino commented on a diff in pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "hgeraldino (via GitHub)" <gi...@apache.org>.
hgeraldino commented on code in PR #13193:
URL: https://github.com/apache/kafka/pull/13193#discussion_r1119228442


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectMetricsRegistry.java:
##########
@@ -187,7 +187,8 @@ public ConnectMetricsRegistry(Set<String> tags) {
         sourceRecordWriteTotal = createTemplate("source-record-write-total", SOURCE_TASK_GROUP_NAME,
                                                 "The number of records output from the transformations and written to Kafka for this" +
                                                 " task belonging to the named source connector in this worker, since the task was " +
-                                                "last restarted.",
+                                                "last restarted. This is after transformations are applied and excludes any records " +
+                                                "filtered out by the transformations.",

Review Comment:
   Ok, how about something like: 
   
   `
   The average per-second number of records written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied, and excludes any records filtered out by the transformations.
   `



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] C0urante merged pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "C0urante (via GitHub)" <gi...@apache.org>.
C0urante merged PR #13193:
URL: https://github.com/apache/kafka/pull/13193


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hgeraldino commented on a diff in pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "hgeraldino (via GitHub)" <gi...@apache.org>.
hgeraldino commented on code in PR #13193:
URL: https://github.com/apache/kafka/pull/13193#discussion_r1119214853


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectMetricsRegistry.java:
##########
@@ -187,7 +187,8 @@ public ConnectMetricsRegistry(Set<String> tags) {
         sourceRecordWriteTotal = createTemplate("source-record-write-total", SOURCE_TASK_GROUP_NAME,
                                                 "The number of records output from the transformations and written to Kafka for this" +
                                                 " task belonging to the named source connector in this worker, since the task was " +
-                                                "last restarted.",
+                                                "last restarted. This is after transformations are applied and excludes any records " +
+                                                "filtered out by the transformations.",

Review Comment:
   Fair question. The `source-record-write-rate` template declared just above  (L181) includes this text, so I thought about adding it here as well to maintain uniformity between the two.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] C0urante commented on a diff in pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "C0urante (via GitHub)" <gi...@apache.org>.
C0urante commented on code in PR #13193:
URL: https://github.com/apache/kafka/pull/13193#discussion_r1119219553


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectMetricsRegistry.java:
##########
@@ -187,7 +187,8 @@ public ConnectMetricsRegistry(Set<String> tags) {
         sourceRecordWriteTotal = createTemplate("source-record-write-total", SOURCE_TASK_GROUP_NAME,
                                                 "The number of records output from the transformations and written to Kafka for this" +
                                                 " task belonging to the named source connector in this worker, since the task was " +
-                                                "last restarted.",
+                                                "last restarted. This is after transformations are applied and excludes any records " +
+                                                "filtered out by the transformations.",

Review Comment:
   Ah yeah, uniformity is probably worth striving for here. IMO adding this sentence makes some of language earlier in the description redundant; WDYT about changing the descriptions of both to start with "The [average per-second] number of records written to Kafka for this" and dropping the "output from the transformations and" phrase altogether?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] clolov commented on pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "clolov (via GitHub)" <gi...@apache.org>.
clolov commented on PR #13193:
URL: https://github.com/apache/kafka/pull/13193#issuecomment-1415746102

   Thank you for spotting this behaviour and contributing a fix! The proposed solution makes sense to me, I will just add @C0urante to put the appropriate tags for "connect" to this pull request and review it once it has been rebased.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] C0urante commented on a diff in pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "C0urante (via GitHub)" <gi...@apache.org>.
C0urante commented on code in PR #13193:
URL: https://github.com/apache/kafka/pull/13193#discussion_r1097934299


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectMetricsRegistry.java:
##########
@@ -187,7 +187,8 @@ public ConnectMetricsRegistry(Set<String> tags) {
         sourceRecordWriteTotal = createTemplate("source-record-write-total", SOURCE_TASK_GROUP_NAME,
                                                 "The number of records output from the transformations and written to Kafka for this" +
                                                 " task belonging to the named source connector in this worker, since the task was " +
-                                                "last restarted.",
+                                                "last restarted. This is after transformations are applied and excludes any records " +
+                                                "filtered out by the transformations.",

Review Comment:
   Is this clarification necessary? The description already includes the phrase "records output from the transformations and written to Kafka".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] C0urante commented on a diff in pull request #13193: KAFKA-14659 source-record-write-[rate|total] metrics include filtered records

Posted by "C0urante (via GitHub)" <gi...@apache.org>.
C0urante commented on code in PR #13193:
URL: https://github.com/apache/kafka/pull/13193#discussion_r1119219553


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ConnectMetricsRegistry.java:
##########
@@ -187,7 +187,8 @@ public ConnectMetricsRegistry(Set<String> tags) {
         sourceRecordWriteTotal = createTemplate("source-record-write-total", SOURCE_TASK_GROUP_NAME,
                                                 "The number of records output from the transformations and written to Kafka for this" +
                                                 " task belonging to the named source connector in this worker, since the task was " +
-                                                "last restarted.",
+                                                "last restarted. This is after transformations are applied and excludes any records " +
+                                                "filtered out by the transformations.",

Review Comment:
   Ah yeah, uniformity is probably worth striving for here. IMO adding this sentence makes some of language earlier in the description redundant; WDYT about changing the descriptions of both to start with "The [average per-second] number of records written to Kafka for this"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org