Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/24 06:48:14 UTC

[GitHub] [incubator-druid] zen201415 opened a new issue #8580: KafkaIndexingTask pending forever after restart in remote mode

zen201415 opened a new issue #8580:  KafkaIndexingTask pending forever after restart in remote mode
URL: https://github.com/apache/incubator-druid/issues/8580
 
 
   
   ### Affected Version
   
   0.14.2
   
   ### Description   
   
   - Cluster size   
   
   2 Master nodes, 2 Query nodes, 4 Data nodes.
   To reproduce the problem, it is easiest to use only 1 Overlord and 1 MiddleManager.
   
   - Configurations in use
   
   druid.indexer.runner.type = remote 
   
   All other settings are left at their defaults.
   
   ###   Reason    
   
   Code 1:
   org.apache.druid.indexing.overlord.TaskQueue#manage
   while (active) {
     **for (final Task task : ImmutableList.copyOf(tasks)) {
       if (!taskFutures.containsKey(task.getId())) {**
         // doSomething...
         taskFutures.put(task.getId(), attachCallbacks(task, runnerTaskFuture));
       }
     }

     managementMayBeNecessary.awaitNanos(60s);
   }
   
   If a task is in both tasks and taskFutures, it is considered to be monitored by callbacks; when its status changes, the callback syncs the status from ZooKeeper to MySQL.
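
As a rough illustration of the callback mechanism described above, the sketch below uses `java.util.concurrent.CompletableFuture` in place of the futures Druid actually uses. Only the names `taskFutures` and `attachCallbacks` come from Code 1; their bodies here, the `metadataStore` map (standing in for MySQL), the `runAndComplete` helper, and the string statuses are all assumptions made for illustration, not Druid's implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class CallbackSketch {
    // Hypothetical stand-in for the MySQL metadata store: taskId -> status.
    static final Map<String, String> metadataStore = new ConcurrentHashMap<>();

    // Stand-in for TaskQueue's taskFutures map from Code 1.
    static final Map<String, CompletableFuture<String>> taskFutures = new ConcurrentHashMap<>();

    // Assumed behaviour of attachCallbacks: when the runner's future completes,
    // persist the final status to the metadata store.
    static CompletableFuture<String> attachCallbacks(String taskId, CompletableFuture<String> runnerTaskFuture) {
        runnerTaskFuture.thenAccept(status -> metadataStore.put(taskId, status));
        return runnerTaskFuture;
    }

    // Hypothetical helper: register a task, then complete the same future
    // that carries the callback, and return what the store now holds.
    static String runAndComplete(String taskId, String finalStatus) {
        metadataStore.put(taskId, "RUNNING");
        CompletableFuture<String> runnerTaskFuture = new CompletableFuture<>();
        taskFutures.put(taskId, attachCallbacks(taskId, runnerTaskFuture));
        runnerTaskFuture.complete(finalStatus);
        return metadataStore.get(taskId);
    }

    public static void main(String[] args) {
        System.out.println(runAndComplete("task-1", "SUCCESS"));
    }
}
```

The status reaches the store only because the future that is completed is the same one the callback was attached to.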
   
   **But what if its status changes and the callbacks do not fire?**
   
   Code 2:  
      
   org.apache.druid.indexing.overlord.RemoteTaskRunner#addWorker      
   
   zkWorker.addListener(
       new PathChildrenCacheListener()
       {
         public void childEvent(CuratorFramework client, PathChildrenCacheEvent event)
         {
           ...
           case CHILD_UPDATED:
             ...
             if ((tmp = runningTasks.get(taskId)) != null) {
               taskRunnerWorkItem = tmp;
             } else {
               final RemoteTaskRunnerWorkItem newTaskRunnerWorkItem = new RemoteTaskRunnerWorkItem(
                   taskId,
                   announcement.getTaskType(),
                   zkWorker.getWorker(),
                   TaskLocation.unknown(),
                   announcement.getTaskDataSource()
               );
               final RemoteTaskRunnerWorkItem existingItem = runningTasks.putIfAbsent(
                   taskId,
                   newTaskRunnerWorkItem
               );
               if (existingItem == null) {
                 log.warn(
                     "Worker[%s] announced a status for a task I didn't know about, adding to runningTasks: %s",
                     zkWorker.getWorker().getHost(),
                     taskId
                 );
                 taskRunnerWorkItem = newTaskRunnerWorkItem;
               } else {
                 taskRunnerWorkItem = existingItem;
               }
             }
             ...
             if (announcement.getTaskStatus().isComplete()) {
               **taskComplete(taskRunnerWorkItem, zkWorker, announcement.getTaskStatus());**
               runPendingTasks();
             }
         }
       }
   )
   
   If a task announced in ZooKeeper does not exist in runningTasks, a new taskRunnerWorkItem is created with no callbacks attached to it. If the task status then changes to failed, taskComplete is executed on this new work item rather than on the one being monitored. As a result, the ZooKeeper status is never synced to MySQL, but the task still stays in tasks and taskFutures.

   The task will be pending forever.
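
The failure mode described above can be sketched in isolation. This is a minimal model, not Druid code: `WorkItem`, `manage`, `onWorkerAnnouncement`, `simulate`, the `pendingTasks` and `metadataStore` maps, and the string statuses are hypothetical names, and it assumes the management loop queued the task (attaching its callback to that queued item) before the worker announced a status for it. `CompletableFuture` is swapped in for the futures Druid uses.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class PendingForeverSketch {
    // Hypothetical stand-in for a runner work item: it only holds a result future.
    static final class WorkItem {
        final CompletableFuture<String> result = new CompletableFuture<>();
    }

    final Map<String, WorkItem> pendingTasks = new ConcurrentHashMap<>();
    final Map<String, WorkItem> runningTasks = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the MySQL metadata store: taskId -> status.
    final Map<String, String> metadataStore = new ConcurrentHashMap<>();

    // Management-loop side (assumption): the runner does not know the task yet,
    // so a work item is queued and the sync callback is attached to ITS future.
    void manage(String taskId) {
        WorkItem queued = new WorkItem();
        pendingTasks.put(taskId, queued);
        metadataStore.put(taskId, "RUNNING");
        queued.result.thenAccept(status -> metadataStore.put(taskId, status));
    }

    // Listener side: the worker announces a status for a task that is not in
    // runningTasks, so a NEW work item is created and completed instead.
    void onWorkerAnnouncement(String taskId, String status) {
        WorkItem item = runningTasks.computeIfAbsent(taskId, id -> new WorkItem());
        if ("FAILED".equals(status) || "SUCCESS".equals(status)) {
            item.result.complete(status); // no callback is attached to this future
        }
    }

    // Runs both sides in the problematic order and returns the stored status.
    static String simulate() {
        PendingForeverSketch overlord = new PendingForeverSketch();
        overlord.manage("task-1");
        overlord.onWorkerAnnouncement("task-1", "FAILED");
        return overlord.metadataStore.get("task-1");
    }

    public static void main(String[] args) {
        // The store still holds the stale status because the callback never ran.
        System.out.println(simulate());
    }
}
```

In this model the future carrying the callback is never completed, so nothing ever updates the store, which matches the "pending forever" symptom reported here.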
   
   
   ###   Steps to reproduce the problem
   
   1. Start the Overlord, Coordinator, MiddleManager, Historical, and Broker.
   2. While a task is running on the MiddleManager, stop the MiddleManager, then stop the Overlord.
   3. A few minutes later, start the Overlord.
   4. Run tail -f on the Overlord's log. When "Beginning management" appears, the loop in Code 1 has started. Start the MiddleManager 30s or 90s after that message appears.
   5. After the MiddleManager has started, a task will be stuck in pending status.
    
   
   
   
