You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/09/06 19:34:45 UTC

[GitHub] [kafka] wcarlson5 opened a new pull request, #12594: Minor: When building streams metadata we want to build even if the ho…

wcarlson5 opened a new pull request, #12594:
URL: https://github.com/apache/kafka/pull/12594

   When building streams metadata we want to build even if the host is empty as it is a common way to find the other host addresses
   
   
   *Summary of testing strategy (including rationale)
   for the feature or bug fix. Unit and/or integration
   tests are expected for any behaviour change and
   system tests should be considered for larger changes.*
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] ableegoldman merged pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
ableegoldman merged PR #12594:
URL: https://github.com/apache/kafka/pull/12594


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] wcarlson5 commented on a diff in pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
wcarlson5 commented on code in PR #12594:
URL: https://github.com/apache/kafka/pull/12594#discussion_r964956521


##########
streams/src/test/java/org/apache/kafka/streams/integration/NamedTopologyIntegrationTest.java:
##########
@@ -924,8 +926,10 @@ private void verifyPartitionsAndStoresForTopology(final String topologyName, fin
                                                             .collect(Collectors.toList())),
             is(true));
 
-        // then verify that only this topology's one store appears
-        assertThat(metadata.stateStoreNames(), equalTo(singleton("store-" + topologyName)));
+        // then verify that only this topology's one store appears if the host has partitions assigned
+        if (!metadata.topicPartitions().isEmpty()) {
+            assertThat(metadata.stateStoreNames(), equalTo(singleton("store-" + topologyName)));
+        }

Review Comment:
   Well, not all named topology topics are going to be in all the source metadata, we can check the stores are limited to the relevant topics. Though I am not sure how useful that is as the method for getting the metadata is filtering the store by if they contain the topology name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] wcarlson5 commented on a diff in pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
wcarlson5 commented on code in PR #12594:
URL: https://github.com/apache/kafka/pull/12594#discussion_r964956521


##########
streams/src/test/java/org/apache/kafka/streams/integration/NamedTopologyIntegrationTest.java:
##########
@@ -924,8 +926,10 @@ private void verifyPartitionsAndStoresForTopology(final String topologyName, fin
                                                             .collect(Collectors.toList())),
             is(true));
 
-        // then verify that only this topology's one store appears
-        assertThat(metadata.stateStoreNames(), equalTo(singleton("store-" + topologyName)));
+        // then verify that only this topology's one store appears if the host has partitions assigned
+        if (!metadata.topicPartitions().isEmpty()) {
+            assertThat(metadata.stateStoreNames(), equalTo(singleton("store-" + topologyName)));
+        }

Review Comment:
   Well, not all named topology topics are going to be in all the source metadata, we can check the stores are limited to the relevant topics. This might just be useful for is the topic partitions are empty.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] ableegoldman commented on pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on PR #12594:
URL: https://github.com/apache/kafka/pull/12594#issuecomment-1271097851

   > Wow every single test passed. Don't see that very often on a pr
   
   Yeah that is legitimately amazing. Love to see it! Maybe things are turning around...heh... 😭 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] vvcephei commented on pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
vvcephei commented on PR #12594:
URL: https://github.com/apache/kafka/pull/12594#issuecomment-1250971460

   ```
   [2022-09-07T15:38:17.417Z] [ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-12594/streams/src/test/java/org/apache/kafka/streams/integration/NamedTopologyIntegrationTest.java:64:8: Unused import - java.util.Collections. [UnusedImports]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] vvcephei commented on a diff in pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
vvcephei commented on code in PR #12594:
URL: https://github.com/apache/kafka/pull/12594#discussion_r964920274


##########
streams/src/test/java/org/apache/kafka/streams/integration/NamedTopologyIntegrationTest.java:
##########
@@ -924,8 +926,10 @@ private void verifyPartitionsAndStoresForTopology(final String topologyName, fin
                                                             .collect(Collectors.toList())),
             is(true));
 
-        // then verify that only this topology's one store appears
-        assertThat(metadata.stateStoreNames(), equalTo(singleton("store-" + topologyName)));
+        // then verify that only this topology's one store appears if the host has partitions assigned
+        if (!metadata.topicPartitions().isEmpty()) {
+            assertThat(metadata.stateStoreNames(), equalTo(singleton("store-" + topologyName)));
+        }

Review Comment:
   It seems like this test is generally missing the converse:
   
   In addition to checking if all the assigned topics are in the named topology sources, should we also check that all named topology sources are assigned?
   
   Likewise, in this new block, should we assert that the store is not present if the partition is not assigned?



##########
streams/src/test/java/org/apache/kafka/streams/integration/NamedTopologyIntegrationTest.java:
##########
@@ -422,6 +422,8 @@ public void shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorr
                 TOPOLOGY_1,
                 streams.allStreamsClientsMetadataForTopology(TOPOLOGY_1),
                 streams2.allStreamsClientsMetadataForTopology(TOPOLOGY_1));
+            assertThat(streams.allStreamsClientsMetadataForTopology(TOPOLOGY_2).size(), equalTo(2));
+            assertThat(streams2.allStreamsClientsMetadataForTopology(TOPOLOGY_2).size(), equalTo(2));

Review Comment:
   Should we also verify that topology1 metadata includes the full set of hosts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] wcarlson5 commented on a diff in pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
wcarlson5 commented on code in PR #12594:
URL: https://github.com/apache/kafka/pull/12594#discussion_r964950624


##########
streams/src/test/java/org/apache/kafka/streams/integration/NamedTopologyIntegrationTest.java:
##########
@@ -422,6 +422,8 @@ public void shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorr
                 TOPOLOGY_1,
                 streams.allStreamsClientsMetadataForTopology(TOPOLOGY_1),
                 streams2.allStreamsClientsMetadataForTopology(TOPOLOGY_1));
+            assertThat(streams.allStreamsClientsMetadataForTopology(TOPOLOGY_2).size(), equalTo(2));
+            assertThat(streams2.allStreamsClientsMetadataForTopology(TOPOLOGY_2).size(), equalTo(2));

Review Comment:
   Sure why not



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] wcarlson5 commented on pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
wcarlson5 commented on PR #12594:
URL: https://github.com/apache/kafka/pull/12594#issuecomment-1255288713

   @vvcephei Wow every single test passed. Don't see that very often on a pr


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] ableegoldman commented on a diff in pull request #12594: MINOR: Include all hosts in metadata for topology

Posted by GitBox <gi...@apache.org>.
ableegoldman commented on code in PR #12594:
URL: https://github.com/apache/kafka/pull/12594#discussion_r989672094


##########
streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsMetadataState.java:
##########
@@ -389,22 +389,18 @@ private List<StreamsMetadata> rebuildMetadataForNamedTopologies(final Map<HostIn
                         standbyStoresOnHost.addAll(getStoresOnHost(storeToSourceTopics, standbyPartitionsOnHost));
                     }
 
-                    if (!(activeStoresOnHost.isEmpty() && activePartitionsOnHost.isEmpty() && standbyStoresOnHost.isEmpty() && standbyPartitionsOnHost.isEmpty())) {

Review Comment:
   I'm trying to remember what the original intention behind this check was, I imagine it was to avoid rebuilding the metadata unnecessarily and that we did not have this check before named topologies because this only becomes nontrivial as the metadata rebuild cost scales with the number of topologies...? 
   
   That said, wouldn't we still want/need to update the metadata if it's become empty? It seems like we should actually only ever skip it when the delta is zero/empty, not when the current set is empty...but I guess that's the root of this fix to begin with. Anyways we should err on the side of correctness rather than performance issues, especially perf issues we haven't seen/confirmed to exist



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org