
Posted to jira@kafka.apache.org by "Scott Hendricks (Jira)" <ji...@apache.org> on 2021/06/28 21:22:00 UTC

[jira] [Created] (KAFKA-13004) Trogdor performance decreases sharply with large amounts of tasks.

Scott Hendricks created KAFKA-13004:
---------------------------------------

Summary: Trogdor performance decreases sharply with large amounts of tasks.
Key: KAFKA-13004
URL: https://issues.apache.org/jira/browse/KAFKA-13004
Project: Kafka
Issue Type: Bug
Components: tools
Environment: We run our Trogdor clusters within Kubernetes.
Reporter: Scott Hendricks
Assignee: Scott Hendricks

As part of my performance tests, I am running 3000 workloads within Trogdor. The clients seem to handle this fine, but when I reset and run the same test again, Trogdor becomes sluggish.

Here are the steps to reproduce this:
# Run 3000 workloads in Trogdor, a combination of Produce/Consume workloads.
# Wait for the workloads to complete.
# Run the DELETE API calls to destroy all 3000 workloads to reset for the next run.
# Confirm via the API that there are no workloads defined in the system.
# Run an additional 3000 workloads in Trogdor similar to step 1.
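The reproduction steps above can be scripted so that the first and second batches are timed the same way. This is a minimal sketch under stated assumptions: `CoordinatorClient`, `create_task`, `destroy_task`, `list_task_ids`, `reproduce` and the `run1-`/`run2-` task ID prefixes are all hypothetical names, not Trogdor's actual client API. A real implementation would wrap the Coordinator's REST API. The sketch times only the create calls. Measuring when the Coordinator actually starts each task would require polling task status, which is left out here.

```python
import time
from typing import Callable, Dict, List, Protocol, Tuple


class CoordinatorClient(Protocol):
    """Hypothetical minimal view of a Trogdor Coordinator.

    Method names are illustrative assumptions; a real client would wrap
    the Coordinator's REST API.
    """

    def create_task(self, task_id: str, spec: Dict) -> None: ...

    def destroy_task(self, task_id: str) -> None: ...

    def list_task_ids(self) -> List[str]: ...


def time_batch(client: CoordinatorClient, n: int,
               make_spec: Callable[[int], Dict], prefix: str) -> float:
    """Submit n tasks and return the wall-clock seconds the submissions took."""
    start = time.perf_counter()
    for i in range(n):
        client.create_task(f"{prefix}-{i}", make_spec(i))
    return time.perf_counter() - start


def reproduce(client: CoordinatorClient, n: int,
              make_spec: Callable[[int], Dict],
              wait_for_completion: Callable[[], None]) -> Tuple[float, float]:
    """Follow the reported steps; return (first_batch_s, second_batch_s)."""
    # Step 1: submit the first batch of produce/consume workloads.
    first = time_batch(client, n, make_spec, "run1")
    # Step 2: wait for the workloads to complete (caller-supplied hook).
    wait_for_completion()
    # Step 3: destroy every task to reset for the next run.
    for task_id in list(client.list_task_ids()):
        client.destroy_task(task_id)
    # Step 4: confirm that no tasks remain defined.
    remaining = client.list_task_ids()
    if remaining:
        raise RuntimeError(f"{len(remaining)} tasks still defined after reset")
    # Step 5: submit a second, similar batch, timed the same way.
    second = time_batch(client, n, make_spec, "run2")
    return first, second
```

With `n=3000` against a real Coordinator, a `second` duration much larger than `first` would reproduce the slowdown at submission time. A similar `first` and `second` would suggest that the delay lies in task scheduling rather than in task creation.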

The Coordinator takes a long time to start the second batch of 3000. There appears to be a performance issue in the framework that will take a while to debug. At this point I don't know whether it affects only the Coordinator or the Agents as well. I do not currently have time to look into this, so I am creating this issue to track it.

The workaround I am using is to destroy and recreate the Trogdor cluster between test runs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)