You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/24 06:48:14 UTC
[GitHub] [incubator-druid] zen201415 opened a new issue #8580:
KafkaIndexingTask pending forever after restart in remote mode
zen201415 opened a new issue #8580: KafkaIndexingTask pending forever after restart in remote mode
URL: https://github.com/apache/incubator-druid/issues/8580
### Affected Version
0.14.2
### Description
- Cluster size
2Master Node,2 Query Node,4Data Node
In fact,if you want to reproduce the problem , you are advised to use only 1 overlord and 1 middlemanager.
- Configurations in use
druid.indexer.runner.type = remote
default configurations
### Reason
Code 1:
org.apache.druid.indexing.overlord.TaskQueue#manage
while (active) {
**for (final Task task : ImmutableList.copyOf(tasks)) {
if (!taskFutures.containsKey(task.getId())) {**
// doSomething...
taskFutures.put(task.getId(), attachCallbacks(task, runnerTaskFuture));
}
}
managementMayBeNecessary.awaitNanos(60s);
}
If a task both in tasks and taskFutures , it will be considered to be monitored with callbacks; when it status changed,it will notify mysql to sync status from zk.
**But what if it status changed, and callbacks did not work ?**
Code 2:
org.apache.druid.indexing.overlord.RemoteTaskRunner#addWorker
zkWorker.addListener(
new PathChildrenCacheListener()
{
public void childEvent(CuratorFramework client, PathChildrenCacheEvent event)
{
...
case CHILD_UPDATED:
...
if ((tmp = runningTasks.get(taskId)) != null) {
taskRunnerWorkItem = tmp;
} else {
final RemoteTaskRunnerWorkItem newTaskRunnerWorkItem = new RemoteTaskRunnerWorkItem(
taskId,
announcement.getTaskType(),
zkWorker.getWorker(),
TaskLocation.unknown(),
announcement.getTaskDataSource()
);
final RemoteTaskRunnerWorkItem existingItem = runningTasks.putIfAbsent(
taskId,
newTaskRunnerWorkItem
);
if (existingItem == null) {
log.warn(
"Worker[%s] announced a status for a task I didn't know about, adding to runningTasks: %s",
zkWorker.getWorker().getHost(),
taskId
);
taskRunnerWorkItem = newTaskRunnerWorkItem;
} else {
taskRunnerWorkItem = existingItem;
}
}
...
if (announcement.getTaskStatus().isComplete()) {
**taskComplete(taskRunnerWorkItem, zkWorker, announcement.getTaskStatus());**
runPendingTasks();
}
}
}
)
if the task in zk not exisit in runningTasks, it will new a taskRunnerWorkItem without attch callbacks on it .And if the task status change to failed,it will execute the new callback . So the task
miss to sync zk status to mysql,but it still stay in tasks and taskFutures.
it will be pending forever.
### Steps to reproduce the problem
1. start overlord ,start coordinator ,start middlemanager ,start historical ,start broker .
2. when there is a task running in middlemanager ,stop middlemanager ,and stop overlord after middlemanager.
3. a few minutes later,start overlord.
4. tail -f overlord's log , when you see "Beginning management".it means start code 1, 30s or 90s after it appeared, start middlemanager.
5. after middlemanager started , there would be a task in pending status.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org