You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/09/22 14:54:34 UTC

[GitHub] [kafka] mprusakov opened a new pull request #11353: Reducing amount of garbage that gets generated during a poll operation

mprusakov opened a new pull request #11353:
URL: https://github.com/apache/kafka/pull/11353


   Presently KafkaConsumer creates an enourmous amount of garbage during a poll. Polls are generaly very frequent and so allocations during a poll decrease efficiency of the overall solution. This commit removes a large (and unnecessary allocation).
   
   This is a non functional change and so has been tested by existing tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dajac commented on pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
dajac commented on pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#issuecomment-1030306522


   @mprusakov Are you still interested by doing this? I can help reviewing the PR once the existing comments are addressed. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] cmccabe commented on a change in pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#discussion_r740554575



##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java
##########
@@ -423,9 +423,9 @@ synchronized int numAssignedPartitions() {
     }
 
     // Visible for testing
-    public synchronized List<TopicPartition> fetchablePartitions(Predicate<TopicPartition> isAvailable) {
+    // This method returns its results in a list provided as it is on hot path and causes considerable pressure on gc otherwise

Review comment:
       Can you add JavaDoc explaining that the result is stored in the final parameter?
   
   Also, if we are not allocating a new List here, I don't see why we should return anything from this function.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] cmccabe commented on a change in pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#discussion_r740554575



##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java
##########
@@ -423,9 +423,9 @@ synchronized int numAssignedPartitions() {
     }
 
     // Visible for testing
-    public synchronized List<TopicPartition> fetchablePartitions(Predicate<TopicPartition> isAvailable) {
+    // This method returns its results in a list provided as it is on hot path and causes considerable pressure on gc otherwise

Review comment:
       Can you add JavaDoc explaining that the result is stored in the final parameter?
   
   Also, if we are not allocating a new List here, I don't see why we should return anything from this function.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mprusakov commented on pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
mprusakov commented on pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#issuecomment-1030340572


   Unfortunately i have realized that this fix is a drop in the ocean and further changes are much more intrusive. This change alone did not meet my client's requirements so I've implemented my own version using your lower level api thus no need in this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mprusakov commented on a change in pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
mprusakov commented on a change in pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#discussion_r715587090



##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
##########
@@ -1153,7 +1155,9 @@ private void handleListOffsetResponse(ListOffsetsResponse listOffsetsResponse,
         for (CompletedFetch completedFetch : completedFetches) {
             exclude.add(completedFetch.partition);
         }
-        return subscriptions.fetchablePartitions(tp -> !exclude.contains(tp));
+        // Need to clear the buffer first otherwise it will contain results from the previous fetch
+        fetchablePartitionsResultBuffer.clear();

Review comment:
       Added the comment requested. Please let me know if further changes are required.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] dongjinleekr commented on a change in pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
dongjinleekr commented on a change in pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#discussion_r715574255



##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
##########
@@ -1153,7 +1155,9 @@ private void handleListOffsetResponse(ListOffsetsResponse listOffsetsResponse,
         for (CompletedFetch completedFetch : completedFetches) {
             exclude.add(completedFetch.partition);
         }
-        return subscriptions.fetchablePartitions(tp -> !exclude.contains(tp));
+        // Need to clear the buffer first otherwise it will contain results from the previous fetch
+        fetchablePartitionsResultBuffer.clear();

Review comment:
       I love this approach; however, could you kindly add some justification comments on why `fetchablePartitionsResultBuffer` should be a shared state here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] cmccabe commented on a change in pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
cmccabe commented on a change in pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#discussion_r740554575



##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java
##########
@@ -423,9 +423,9 @@ synchronized int numAssignedPartitions() {
     }
 
     // Visible for testing
-    public synchronized List<TopicPartition> fetchablePartitions(Predicate<TopicPartition> isAvailable) {
+    // This method returns its results in a list provided as it is on hot path and causes considerable pressure on gc otherwise

Review comment:
       Can you add JavaDoc explaining that the result is stored in the final parameter?
   
   Also, if we are not allocating a new List here, I don't see why we should return anything from this function.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mprusakov commented on pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
mprusakov commented on pull request #11353:
URL: https://github.com/apache/kafka/pull/11353#issuecomment-926534122


   @showuon Raised KAFKA-13322 for this and similar issues
   
   for the complexity argument, the present implementation starts off with an empty array list and adds a potentially large amount of entries. To accommodate them the list needs to grow and so copy the existing vaues (https://github.com/openjdk/jdk/blob/739769c8fc4b496f08a92225a12d07414537b6c0/src/java.base/share/classes/java/util/ArrayList.java#L237). Since the new implementation doesn't need to do any of that it is actualy a lot more efficient.
   
   In my test that particular allocation was 19.4%
   
   Please let me know if any further changes are required. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mprusakov closed pull request #11353: KAFKA-13322: Reducing amount of garbage that gets generated during a poll operation

Posted by GitBox <gi...@apache.org>.
mprusakov closed pull request #11353:
URL: https://github.com/apache/kafka/pull/11353


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org