You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/18 07:46:24 UTC

[GitHub] [druid] splunk-tschetter commented on issue #8139: KIS throwing error messages

splunk-tschetter commented on issue #8139:
URL: https://github.com/apache/druid/issues/8139#issuecomment-801706838

As @rgidwani-splunk mentions, we ran into this with druid-0.18. In our case, it seems to be some sort of race condition that exists when we assign many partitions to a single task. At least in our case, it also seems to be the result of quite a large cocktail of issues. For the bug that pdeva reported, his log line only has a single partition, so I'm not sure if what pdeva experienced and what we experienced are really the same thing, but I'll go into what I've been able to piece together.

I'll start by saying that this issue and error resulted in the ingestion not making any progress (all tasks would fail because of not being able to commit offsets). We worked around that issue by setting the number of tasks equal to the number of kafka partitions that we have, this has allowed our tasks to make forward progress without hindrance.

After looking at the logs from the overlord and tasks that failed, it appears that what happens is

1) At the end of the time of the task (1 hour), the overlord pauses, gets the current offsets and sets end offsets for the tasks in the replica sets.
2) Each task accepts the intended end offsets and starts processing to try to achieve them
3) After 30 minutes, the tasks still haven't caught up and committed a full set of segments. They get killed.
4) In the meantime, the "new hour" tasks never get to commit anything because the previous tasks haven't actually completed committing their own set of tasks yet (i.e. when the new tasks go to commit, the offsets sitting in the DB for the most up to date segment are still ones from the previous tasks that were trying to work their way up to the final offsets)
5) Because the previous tasks were killed without ever catching up, the system gets into a bad state where the supervisor is firing up new tasks based on the offsets that the previous tasks should've worked their way up to, but the previous tasks are not around to complete their job.

Now, the cocktail that makes this actually work out is a mix of tuning parameters. Specifically,
a) the tasks where we ran into this are committing local persists very rapidly such that they are essentially always throttled on waiting for the next persist to hit disk. This means that they actually make rather slow progress through the data.
b) On top of that, with 8 partitions per task, there's was more of a chance for differences in rate of processing of data between the tasks to cause a significant backlog that needs to be caught up

By switching to just one partition per task, we are guaranteed that at least one of the tasks is actually fully caught up with the final offset, which means that our metadata is in order, even though things are still relatively poorly configured for the given data that we are ingesting.

It seems like the only recourse for trying to get back to a good state in this case might be for the supervisor to somehow detect that it's not making progress and adjust the offsets that it uses. A maybe meaningful heuristic could be: Looking for no actively running tasks ++ a mismatch between the offsets committed for the ingestion and the offsets that it gives for new tasks.

Additionally, I have not looked to see if this is still an issue in the latest version.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org