Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/10/13 23:43:16 UTC

[GitHub] [druid] churromorales commented on pull request #13156: Support for middle manager less druid, tasks launch as k8s jobs

churromorales commented on PR #13156:
URL: https://github.com/apache/druid/pull/13156#issuecomment-1278289541

   I noticed a slight issue when testing on heavily loaded k8s clusters, i.e. clusters that take a while to spin pods up and down.
   
   Currently the way things work (a rough sketch follows this list):
   1. The overlord launches a task and waits for the pod to come up. It not only waits for a pod IP to be available, but also opens a socket connection to the peon process's web server. Only once that socket connection succeeds is the task considered started.
   2. It then monitors the job until completion, which is a blocking call. After the job completes, it pushes task reports and logs, and then deletes the peon k8s job.
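   To make that lifecycle concrete, here is a minimal sketch of the flow described above. The `K8sClient` interface, the method names, and the peon port are assumptions for illustration only; they are not the actual `KubernetesTaskRunner` code.
   
   ```java
   // Illustrative sketch of the current lifecycle; every name here is hypothetical.
   import java.net.Socket;
   
   interface PodHandle
   {
     String ip();
   }
   
   interface K8sClient
   {
     PodHandle launchJobAndWaitForPod(String taskId); // returns once a pod IP exists
     int waitForJobCompletion(String taskId);         // blocking; returns the exit code
     void deleteJob(String taskId);
   }
   
   class CurrentLifecycleSketch
   {
     private final K8sClient k8s;
   
     CurrentLifecycleSketch(K8sClient k8s)
     {
       this.k8s = k8s;
     }
   
     String run(String taskId) throws Exception
     {
       // 1. Launch the peon job and wait for a pod IP.
       PodHandle pod = k8s.launchJobAndWaitForPod(taskId);
   
       // The task only counts as "started" once a socket to the peon web server opens
       // (8100 is an assumed port for illustration).
       try (Socket ignored = new Socket(pod.ip(), 8100)) {
         // connection succeeded; the peon web server is up
       }
   
       try {
         // 2. Block until the k8s job completes, then push reports and logs.
         int exitCode = k8s.waitForJobCompletion(taskId);
         // pushing task reports and logs would happen here
         return exitCode == 0 ? "SUCCESS" : "FAILED";
       }
       finally {
         // The status is only reported after this cleanup, so a slow job delete
         // keeps the task's interval lock held long after the work is done.
         k8s.deleteJob(taskId);
       }
     }
   }
   ```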
   
   What was problematic here is that the process was complete and the task itself was finished, but the k8s operations were slow on my cluster. The task itself took less than a minute to complete, but the entire k8s lifecycle took much longer, around 10 minutes (on a very heavily loaded cluster).
   
   Because the task status only updates in the finally block, the interval lock is not released until the status updates. I have another patch that I will test on our clusters; a brief description of the patch is this (a sketch of the new hooks follows the list):
   
   1. The overlord launches a task and waits for the pod to come up. There is no more opening a socket to the web server. Instead, I added two TaskActions: one to update the `TaskStatus` and one to update the `TaskLocation`.
   2. Now, in `AbstractTask`, the setup method updates the task's own location before the `run` method is called. If the overlord goes down for whatever reason, that is fine; we don't really lose anything, the call fails and the task itself dies.
   3. In the cleanup method of `AbstractTask` we also update the `TaskLocation` to `unknown` and update the status to `success` or `failure`. So when the process exits, we can give up the lock and things won't be blocked. If the pod goes away during the cleanup, that is okay; the overlord will still monitor the job and report the correct results, albeit a bit slower.
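   Here is a minimal sketch of those setup/cleanup hooks, assuming a simplified, hypothetical action marker and client (`PeonAction`, `ActionClient`) that stand in for Druid's real `TaskAction`/`TaskActionClient` plumbing; the class names and signatures in the actual patch may differ.
   
   ```java
   // Illustrative sketch of the proposed setup/cleanup hooks; all names are hypothetical.
   interface PeonAction {}
   
   class UpdateLocationAction implements PeonAction
   {
     final String taskId;
     final String hostAndPort;  // null means the location is "unknown"
   
     UpdateLocationAction(String taskId, String hostAndPort)
     {
       this.taskId = taskId;
       this.hostAndPort = hostAndPort;
     }
   }
   
   class UpdateStatusAction implements PeonAction
   {
     final String taskId;
     final boolean success;
   
     UpdateStatusAction(String taskId, boolean success)
     {
       this.taskId = taskId;
       this.success = success;
     }
   }
   
   interface ActionClient
   {
     void submit(PeonAction action); // calls back to the overlord
   }
   
   abstract class PeonTaskSketch
   {
     private final ActionClient actionClient;
     private final String taskId;
   
     PeonTaskSketch(ActionClient actionClient, String taskId)
     {
       this.actionClient = actionClient;
       this.taskId = taskId;
     }
   
     // Called before run(): the task reports its own location to the overlord.
     // If the overlord is down, the call fails and the task dies, which is acceptable.
     void setup(String hostAndPort)
     {
       actionClient.submit(new UpdateLocationAction(taskId, hostAndPort));
     }
   
     abstract boolean runTask() throws Exception;
   
     // Called when the task finishes: report an unknown location and the final status
     // so the overlord can release the interval lock immediately, instead of waiting
     // for the (possibly slow) k8s job monitoring and delete to complete.
     void cleanup(boolean success)
     {
       actionClient.submit(new UpdateLocationAction(taskId, null));
       actionClient.submit(new UpdateStatusAction(taskId, success));
     }
   }
   ```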
   
   
   I thought about this approach originally but didn't want to make too many changes to `core`. However, after running this patch on our clusters, I do think this is the best approach. I will call it out in a separate commit so you guys can review this work. It doesn't seem too invasive to me, but we should focus on correctness here and not hold the locks longer than necessary, as we were doing in the forking task runner.
   
   I will test this out over the next day or so and then add the commit to this PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

