Posted to jira@kafka.apache.org by "Scott Hendricks (Jira)" <ji...@apache.org> on 2021/06/28 21:22:00 UTC

[jira] [Created] (KAFKA-13004) Trogdor performance decreases sharply with large amounts of tasks.

Scott Hendricks created KAFKA-13004:
---------------------------------------

             Summary: Trogdor performance decreases sharply with large amounts of tasks.
                 Key: KAFKA-13004
                 URL: https://issues.apache.org/jira/browse/KAFKA-13004
             Project: Kafka
          Issue Type: Bug
          Components: tools
         Environment: We run our Trogdor clusters within Kubernetes.
            Reporter: Scott Hendricks
            Assignee: Scott Hendricks


As part of my performance tests, I am running 3000 workloads within Trogdor. The clients seem to handle this fine, but when I reset and run the same test again, Trogdor is noticeably sluggish.

Here are the steps to reproduce this:
 # Run 3000 workloads in Trogdor, a mix of Produce and Consume workloads.
 # Wait for the workloads to complete.
 # Run the DELETE API calls to destroy all 3000 workloads to reset for the next run.
 # Confirm via the API that there are no workloads defined in the system.
 # Run an additional 3000 workloads in Trogdor similar to step 1.
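
The steps above could be scripted roughly as follows, which would also give a number for how much slower the second batch starts. This is only a sketch under assumptions: the Coordinator address and port (8889), the endpoint paths, the {"id", "spec"} request body, the "tasks" map in the reply and the "DONE" state value reflect my reading of the Coordinator REST resource and should be checked against the Trogdor version in use. The helper names are hypothetical, and the task spec is left to the caller.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the Coordinator listens here; adjust for your cluster.
COORDINATOR_URL = "http://localhost:8889"


def http_send(method, path, body=None):
    """Send one JSON request to the Coordinator and return the decoded reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        COORDINATOR_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def create_batch(send, task_ids, spec):
    """Steps 1 and 5: create every task, return elapsed wall-clock seconds."""
    start = time.monotonic()
    for task_id in task_ids:
        # Assumed endpoint and body shape for task creation.
        send("POST", "/coordinator/task/create", {"id": task_id, "spec": spec})
    return time.monotonic() - start


def remaining_tasks(send):
    """Step 4: return the tasks the Coordinator still knows about."""
    reply = send("GET", "/coordinator/tasks")
    return (reply or {}).get("tasks", {})


def all_done(send):
    """Step 2: true once every known task reports the (assumed) DONE state."""
    return all(t.get("state") == "DONE" for t in remaining_tasks(send).values())


def destroy_batch(send, task_ids):
    """Step 3: destroy every task by id."""
    for task_id in task_ids:
        query = urllib.parse.urlencode({"taskId": task_id})
        # Assumed endpoint: DELETE with the task id as a query parameter.
        send("DELETE", "/coordinator/tasks?" + query)
```

Passing http_send as the send argument talks to a real Coordinator. Comparing the value returned by create_batch for the first and second runs shows the size of the slowdown without relying on impressions.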

The Coordinator takes a long time to start the second batch of 3000. There seems to be a performance issue in the framework that will take a while to debug. At this point I don't know whether it affects only the Coordinator or the Agents as well. I do not currently have the time to look into this, so I am creating this issue to track it.

The workaround I am employing is destroying and recreating the Trogdor cluster in between test runs.


