Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/04 16:38:26 UTC

[GitHub] [druid] shachar-ash opened a new issue #11196: Ingestion tasks stuck on PENDING for a while

shachar-ash opened a new issue #11196:
URL: https://github.com/apache/druid/issues/11196


   ### Affected Version
   
   We are using version 0.21.0 
   
   ### Description
   
   When running ingestion tasks, we see that some of our tasks are stuck in the "PENDING" state before starting, even though there are no running tasks for the same interval, or even for the same datasource. There are also plenty of available task slots on the middle-managers where we observed this behavior.
   
   - Cluster size
      We have ~30 middle-managers of different instance types and JVM configurations; we run separate middle-managers for native ingestion, EMR ingestion, and Kafka ingestion.
      We run ~20k ingestion tasks a day.
   
   - Configurations in use
      We use a custom JavaScript affinity to determine where to route each task.
   
   - The error message or stack traces encountered. Providing more context, such as nearby log messages or even entire logs, can be helpful.
     We didn't encounter any error: the task is stuck in the PENDING state for a while, no task log is available, and we couldn't find any indicative message in the middle-manager logs.
   
   - Any debugging that you have already done
   * Searched middle-manager logs
   * Searched task logs
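
   For readers unfamiliar with affinity configuration: a custom JavaScript worker select strategy like the one mentioned above is set through the Overlord dynamic config (and requires JavaScript to be enabled on the Overlord). The actual function used in this cluster isn't shown in the report; the fragment below is only an illustrative sketch, with a placeholder hostname, of what such a config can look like. The function receives the worker config, the connected workers, and the task, and returns the chosen worker's host (or no worker):

   ```json
   {
     "selectStrategy": {
       "type": "javascript",
       "function": "function (config, zkWorkers, task) { if (task.getType() === 'index_kafka') { return 'kafka-mm1.example.com:8091'; } return null; }"
     }
   }
   ```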
   
   Any help would be really appreciated!
   
   Thanks,
   Shachar


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #11196: Ingestion tasks stuck on PENDING for a while

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #11196:
URL: https://github.com/apache/druid/issues/11196#issuecomment-832086040


   Hi @shachar-ash, thank you for the report. The PENDING task status means that the Overlord thinks there is no task slot available for the given task for some reason. Can you look at the Overlord logs and see if there are any interesting log lines? Also, do you see any pattern in those pending tasks? Are they all of different task types (`index_kafka`, `index_parallel`, `index_hadoop`)?
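
   As an aside for anyone debugging a similar situation: the Overlord's HTTP API exposes both the pending task list and per-worker capacity, which makes it easy to confirm whether slots are actually free. A minimal sketch (the Overlord URL is a placeholder; the field names follow the `/druid/indexer/v1/workers` response format):

   ```python
   import json
   from urllib.request import urlopen

   # Placeholder Overlord address; substitute your own.
   OVERLORD = "http://overlord.example.com:8090"

   def free_slots(workers):
       """Compute free task slots per middle-manager from the
       /druid/indexer/v1/workers response: capacity minus used capacity."""
       return {
           w["worker"]["host"]: w["worker"]["capacity"] - w["currCapacityUsed"]
           for w in workers
       }

   def fetch(path):
       """GET a JSON document from the Overlord API."""
       with urlopen(OVERLORD + path) as resp:
           return json.load(resp)

   # Usage sketch (requires a reachable Overlord):
   #   slots = free_slots(fetch("/druid/indexer/v1/workers"))
   #   pending = fetch("/druid/indexer/v1/pendingTasks")
   #   print("free slots per worker:", slots)
   ```

   If `free_slots` reports capacity on the relevant middle-managers while tasks still sit in PENDING, the problem is in task assignment rather than capacity.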




[GitHub] [druid] shachar-ash commented on issue #11196: Ingestion tasks stuck on PENDING for a while

Posted by GitBox <gi...@apache.org>.
shachar-ash commented on issue #11196:
URL: https://github.com/apache/druid/issues/11196#issuecomment-832103267


   Hi @jihoonson, first of all, thanks for the quick response, I really appreciate it!
   
   I'm seeing a lot of these lines in the Overlord logs:
   
   > 2021-05-04T16:59:03,632 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Assigned a task[my_task_123_20210504_2021-04-04_2021-05-04T16:59:01_aeab9d4c-40d6-4479-a717-cbdd199e9e43] that is already pending!
   
   We currently don't use `index_parallel`; we upgraded our cluster earlier this week, and it wasn't available to us before.
   It doesn't happen for `index_kafka` tasks, only for `index` and `index_hadoop` tasks.
   
   We suffer from this the most on a middle-manager with only one worker slot, so no other tasks are running when it happens there.
   
   I couldn't find a strong pattern so far, but I'll do my best to note every occurrence and update here when I have more information.
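
   To help spot a pattern, the "already pending" lines can be extracted from the Overlord logs programmatically. This is a hypothetical helper, not part of Druid; the regex simply matches the `RemoteTaskRunner` log format quoted above, so affected task ids can be collected and counted:

   ```python
   import re

   # Matches RemoteTaskRunner's "Assigned a task[...] that is already pending!"
   # log lines and captures the task id.
   PENDING_RE = re.compile(r"Assigned a task\[(?P<task>[^\]]+)\] that is already pending!")

   def already_pending_task_ids(log_lines):
       """Return the task ids from all 'already pending' log lines."""
       return [
           m.group("task")
           for line in log_lines
           if (m := PENDING_RE.search(line))
       ]
   ```

   Feeding a day's worth of Overlord logs through this and grouping the ids by datasource or task type may reveal a pattern that isn't obvious from spot-checking.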

