Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/06/30 03:21:29 UTC

[GitHub] [druid] didip opened a new issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

didip opened a new issue #11397:
URL: https://github.com/apache/druid/issues/11397


   ### Affected Version
   
   0.21.1
   
   ### Description
   
   - 15 middle managers with 20 task slots each (`druid.worker.capacity=20`).
   - Druid is running inside Kubernetes. Each middle manager has 32GB RAM and 20 cores requested.
   - When launching a native index_parallel task with a maximum of 100 subtasks, only 20 are running and the rest are PENDING, even though I have plenty of capacity.
   
   Why can't Druid run all 100 subtasks? In theory I have 15 * 20 = 300 task slots.
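
For anyone checking the same arithmetic, here is a minimal sketch of the capacity calculation. The figures are the ones reported in this issue; the function name `slot_summary` is made up for illustration, and the extra slot for the `index_parallel` supervisor task is an assumption that should be checked against the capacity planning notes for your Druid version.

```python
def slot_summary(middle_managers: int, capacity_per_mm: int, max_subtasks: int) -> dict:
    """Compare cluster task slots with what one index_parallel job can use."""
    total_slots = middle_managers * capacity_per_mm
    # Assumption: the supervisor task occupies one slot on top of its subtasks.
    slots_needed = max_subtasks + 1
    return {
        "total_slots": total_slots,
        "slots_needed": slots_needed,
        "fits": slots_needed <= total_slots,
        "spare_slots": total_slots - slots_needed,
    }


# Numbers from this report: 15 middle managers, capacity 20, 100 subtasks.
summary = slot_summary(middle_managers=15, capacity_per_mm=20, max_subtasks=100)
print(summary)
```

If the numbers fit on paper, as they do here, the limit is probably not raw capacity, and the overlord's task assignment is the next place to look.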
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871107066


   It would be more useful to trace the lifecycle of just one or a few tasks: when each was submitted, when it started pending, when it started running, where it was running, etc.
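
One way to follow a single task is to filter the overlord log by task ID and order the matches by time. The sketch below is only illustrative: it assumes the log prefix format seen in the lines posted in this thread, the helper name `task_timeline` is hypothetical, and the sample lines are copied from the logs reported here.

```python
import re
from datetime import datetime

# Matches the log prefix seen in this thread:
#   2021-06-30T04:38:57,607 ERROR [thread] logger.Class - message
LOG_LINE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}) (\w+) \[([^\]]+)\] (\S+) - (.*)$"
)


def task_timeline(lines, task_id):
    """Return (timestamp, level, logger, message) tuples mentioning task_id, oldest first."""
    events = []
    for line in lines:
        if task_id not in line:
            continue
        match = LOG_LINE.match(line)
        if match is None:
            continue  # continuation line or unexpected format
        stamp, level, _thread, logger, message = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S,%f")
        events.append((when, level, logger.rsplit(".", 1)[-1], message))
    return sorted(events, key=lambda event: event[0])


TASK = "single_phase_sub_task_mydata_cgbgomoi_2021-06-30T03:51:48.075Z"
OTHER = "single_phase_sub_task_mydata_dcodaahg_2021-06-30T03:51:48.334Z"
RTR = "org.apache.druid.indexing.overlord.RemoteTaskRunner"
sample = [
    f"2021-06-30T04:38:24,834 INFO [TaskQueue-Manager] {RTR} - "
    f"Assigned a task[{OTHER}] that is already pending!",
    f"2021-06-30T04:38:57,607 ERROR [rtr-pending-tasks-runner-0] {RTR} - "
    f"Task assignment timed out on worker [172.19.214.19:8091], never ran task "
    f"[{TASK}]! Timeout: (513604 >= PT5M)!: {{class={RTR}}}",
    f"2021-06-30T04:38:57,607 ERROR [rtr-pending-tasks-runner-0] {RTR} - "
    f"Asked to cleanup nonexistent task: {{class={RTR}, taskId={TASK}}}",
]

timeline = task_timeline(sample, TASK)
for when, level, logger, message in timeline:
    print(when.isoformat(timespec="milliseconds"), level, logger, message[:60])
```

In practice the `lines` argument would be an open overlord log file rather than a hand-built list.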
   
   > 2021-06-30T04:38:57,607 ERROR [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task assignment timed out on worker [172.19.214.19:8091], never ran task [single_phase_sub_task_mydata_cgbgomoi_2021-06-30T03:51:48.075Z]! Timeout: (513604 >= PT5M)!: {class=org.apache.druid.indexing.overlord.RemoteTaskRunner}
   
   This means the middleManager didn't acknowledge the new task assignment request within 5 minutes. The RemoteTaskRunner uses ZooKeeper for task assignment, so you may want to check around it.
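
To make the `Timeout: (513604 >= PT5M)` part concrete, here is a small sketch that reads the two values out of such a line and compares them. It assumes the first number is the elapsed time in milliseconds and the second is an ISO-8601 duration (the limit is presumably governed by `druid.indexer.runner.taskAssignmentTimeout`, but please verify that against the configuration reference). The helper names are hypothetical, and the duration parser only handles the simple `PT..H..M..S` form.

```python
import re
from datetime import timedelta

# Simple ISO-8601 time-only durations such as PT5M, PT1H30M, PT45S.
ISO_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")
TIMEOUT = re.compile(r"Timeout: \((\d+) >= (PT[0-9HMS.]+)\)")


def parse_pt_duration(text: str) -> timedelta:
    """Parse a PT..H..M..S duration into a timedelta."""
    match = ISO_DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


def explain_timeout(log_line: str):
    """Return (elapsed, limit, overrun) for a timeout line, or None if absent."""
    match = TIMEOUT.search(log_line)
    if match is None:
        return None
    # Assumption: the first number is elapsed milliseconds.
    elapsed = timedelta(milliseconds=int(match.group(1)))
    limit = parse_pt_duration(match.group(2))
    return elapsed, limit, elapsed - limit


line = (
    "Task assignment timed out on worker [172.19.214.19:8091], never ran task "
    "[single_phase_sub_task_mydata_cgbgomoi_2021-06-30T03:51:48.075Z]! "
    "Timeout: (513604 >= PT5M)!"
)
elapsed, limit, overrun = explain_timeout(line)
print(f"elapsed={elapsed}, limit={limit}, overrun={overrun}")
```

Under that reading, the task waited about eight and a half minutes for an acknowledgement against a five minute limit.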




[GitHub] [druid] didip closed issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip closed issue #11397:
URL: https://github.com/apache/druid/issues/11397


   




[GitHub] [druid] didip commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871558264


   OK, so I tried the following:
   
   1. I shut down all overlords and middle managers.
   
   2. I cleared out the task_locks table.
   
   3. I cleared out the ingestion-related records in ZK.
   
   4. I restarted all overlords and middle managers.
   
   And now my ingestion is running as expected (100 tasks simultaneously).
   
   What's the reasoning here? Can anyone explain it?
   




[GitHub] [druid] hqx871 commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
hqx871 commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-946564726


   > OK, so I did something,
   > 
   > 1. I shutdown all overlords and middle managers.
   > 2. I cleared out the task_locks table.
   > 3. I cleared out ingestion related records on ZK.
   > 4. I restarted all overlords and middle managers.
   > 
   > And now my ingestion is running as expected (100 tasks simultaneously).
   > 
   > What's the reasoning here? Can anyone explain it?
   
   I have run into this problem too. Have you found any useful information about ZK?




[GitHub] [druid] didip commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871092508


   I see a lot of these:
   
   ```
   2021-06-30T04:38:24,834 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Assigned a task[single_phase_sub_task_mydata_dcodaahg_2021-06-30T03:51:48.334Z] that is already pending!
   ```
   and a lot of these:
   ```
   2021-06-30T04:37:36,965 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_parallel_mydata_gcjhdhak_2021-06-30T04:37:36.950Z
   2021-06-30T04:37:36,965 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_parallel_mydata_gcjhdhak_2021-06-30T04:37:36.950Z
   2021-06-30T04:37:36,965 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Asking taskRunner to clean up 4 tasks.
   2021-06-30T04:37:36,965 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [single_phase_sub_task_mydata_cfdbeihn_2021-06-26T04:14:06.444Z] because: [task is not in knownTaskIds[[single_phase_sub_task_mydata_keijolhd_2021-06-30T04:19:48.958Z, single_phase_sub_task_mydata_ofdfacjh_2021-06-30T04:05:38.429Z, single_phase_sub_task_mydata_igjfljdb_2021-06-30T04:05:57.470Z, index_parallel_mydata_kklpkeci_2021-06-30T04:37:27.541Z, single_phase_sub_task_mydata_gbfbmdga_2021-06-30T04:20:14.880Z, single_phase_sub_task_mydata_gdcpnmok_2021-06-30T04:20:14.921Z, single_phase_sub_task_mydata_acdlabpn_2021-06-30T04:33:49.050Z, single_phase_sub_task_mydata_lniojeic_2021-06-30T03:51:53.935Z, single_phase_sub_task_mydata_bpjnfifd_2021-06-30T04:20:15.904Z, single_phase_sub_task_mydata_moehkdgm_2021-06-30T04:05:58.143Z, single_phase_sub_task_mydata_abkgdhfp_2021-06-30T04:20:13.947Z, single_phase_sub_task_mydata_naffkmcj_2021-06-30T04:06:01.414Z, single_phase_sub_task_mydata_haninkha_2021-06-30T04:33:42.148Z, single_phase_sub_task_mydata_ifmfjggm_2021-06-30T03:51:52.049Z, single_phase_sub_task_mydata_gncnpkih_2021-06-30T04:33:55.047Z, single_phase_sub_task_mydata_gboogald_2021-06-30T04:05:36.201Z, single_phase_sub_task_mydata_dhfdeaid_2021-06-30T04:06:05.154Z, single_phase_sub_task_mydata_hjpphmad_2021-06-30T04:19:50.672Z, single_phase_sub_task_mydata_omeocibn_2021-06-30T04:06:06.087Z, index_parallel_mydata_dkfjnnjn_2021-06-30T04:37:07.248Z, single_phase_sub_task_mydata_bonbjnal_2021-06-30T04:05:46.466Z, single_phase_sub_task_mydata_kjdlmiel_2021-06-30T04:19:50.860Z, single_phase_sub_task_mydata_kdpffoii_2021-06-30T04:06:06.371Z, single_phase_sub_task_mydata_cipjohga_2021-06-30T04:20:14.940Z, single_phase_sub_task_mydata_hbaapnhl_2021-06-30T04:01:58.700Z, single_phase_sub_task_mydata_ldamdnkd_2021-06-30T04:19:51.025Z, single_phase_sub_task_mydata_npdkjgbk_2021-06-30T04:19:55.099Z, single_phase_sub_task_mydata_nkgnoppo_2021-06-30T04:05:46.354Z, single_phase_sub_task_mydata_lppkleoo_2021-06-30T04:05:58.727Z, single_phase_sub_task_mydata_ioieblfa_2021-06-30T04:05:30.444Z, single_phase_sub_task_mydata_ammamngf_2021-06-30T04:33:51.003Z, single_phase_sub_task_mydata_nhdffopk_2021-06-30T04:20:07.927Z, single_phase_sub_task_mydata_bggndljg_2021-06-30T03:51:57.302Z, single_phase_sub_task_mydata_bbfjjdkc_2021-06-30T04:19:42.814Z, single_phase_sub_task_mydata_gleimmha_2021-06-30T04:20:10.879Z, single_phase_sub_task_mydata_eagnhgnc_2021-06-30T04:25:23.962Z, single_phase_sub_task_mydata_jfdagdli_2021-06-30T03:51:57.931Z, single_phase_sub_task_mydata_icbdciaf_2021-06-30T04:05:56.520Z, single_phase_sub_task_mydata_jaeidggd_2021-06-30T04:05:38.193Z, single_phase_sub_task_mydata_hifieong_2021-06-30T04:05:42.510Z, single_phase_sub_task_mydata_lmkgidog_2021-06-30T04:06:05.466Z, single_phase_sub_task_mydata_lbnkncpk_2021-06-30T04:16:11.084Z, single_phase_sub_task_mydata_aekfflnn_2021-06-30T04:20:17.680Z, single_phase_sub_task_mydata_cfplhojj_2021-06-30T04:05:47.160Z, single_phase_sub_task_mydata_ookaddkk_2021-06-30T04:33:42.951Z, single_phase_sub_task_mydata_nbmpmjcc_2021-06-30T04:20:17.876Z, single_phase_sub_task_mydata_khnhdlho_2021-06-30T04:05:44.221Z, single_phase_sub_task_mydata_dpngdcjl_2021-06-30T04:19:52.780Z, single_phase_sub_task_mydata_iefddgjk_2021-06-30T03:51:52.947Z, single_phase_sub_task_mydata_fgdphjil_2021-06-30T03:56:58.268Z, single_phase_sub_task_mydata_cdljkeee_2021-06-30T04:30:24.134Z, single_phase_sub_task_mydata_kbmlkcbh_2021-06-30T04:33:28.911Z, single_phase_sub_task_mydata_emjdoeeg_2021-06-30T04:20:14.901Z, single_phase_sub_task_mydata_bjjeofkn_2021-06-30T04:33:58.078Z, single_phase_sub_task_mydata_bnmboccb_2021-06-30T04:33:58.014Z, single_phase_sub_task_mydata_bobpajkb_2021-06-30T04:19:45.683Z, single_phase_sub_task_mydata_bmgbcppj_2021-06-30T04:19:55.001Z, single_phase_sub_task_mydata_pnghlhpe_2021-06-30T04:20:12.892Z, single_phase_sub_task_mydata_odfelbbh_2021-06-30T04:33:42.030Z, single_phase_sub_task_mydata_gkmbemnj_2021-06-30T04:19:44.958Z, single_phase_sub_task_mydata_gaaoicfp_2021-06-30T04:20:03.884Z, single_phase_sub_task_mydata_dcodaahg_2021-06-30T03:51:48.334Z, single_phase_sub_task_mydata_lmpcigfi_2021-06-30T04:05:50.176Z, single_phase_sub_task_mydata_agdleoed_2021-06-30T04:33:53.077Z, single_phase_sub_task_mydata_lainegjl_2021-06-30T04:33:29.181Z, single_phase_sub_task_mydata_jfdmhncn_2021-06-30T04:19:43.680Z, single_phase_sub_task_mydata_gglajoaj_2021-06-30T04:06:01.346Z, single_phase_sub_task_mydata_dfllkggb_2021-06-30T04:20:15.924Z, single_phase_sub_task_mydata_klbonbfl_2021-06-30T04:05:54.289Z, single_phase_sub_task_mydata_cgbgomoi_2021-06-30T03:51:48.075Z, single_phase_sub_task_mydata_oiplicel_2021-06-30T04:33:47.979Z, single_phase_sub_task_mydata_oghhkhnc_2021-06-30T04:19:45.815Z, index_parallel_mydata_idpakmne_2021-06-30T03:16:11.810Z, single_phase_sub_task_mydata_jjddmakb_2021-06-30T04:05:26.667Z, single_phase_sub_task_mydata_eobikjeo_2021-06-30T04:20:20.795Z, single_phase_sub_task_mydata_nfkoildc_2021-06-30T04:05:31.550Z, single_phase_sub_task_mydata_cekcajic_2021-06-30T04:19:42.793Z, single_phase_sub_task_mydata_cjgnefem_2021-06-30T04:20:19.788Z, single_phase_sub_task_mydata_bkgdlnbg_2021-06-30T04:33:56.018Z, single_phase_sub_task_mydata_obeiammk_2021-06-30T04:20:22.881Z, single_phase_sub_task_mydata_hincjmbk_2021-06-30T04:05:25.451Z, single_phase_sub_task_mydata_ogacpbok_2021-06-30T04:11:09.776Z, single_phase_sub_task_mydata_hcjpefhj_2021-06-30T04:06:05.339Z, single_phase_sub_task_mydata_ofbgipie_2021-06-30T04:33:55.017Z, single_phase_sub_task_mydata_dfejdlba_2021-06-30T04:20:13.881Z, single_phase_sub_task_mydata_jbggacnd_2021-06-30T04:06:09.263Z, single_phase_sub_task_mydata_kkeenabe_2021-06-30T04:05:50.450Z, single_phase_sub_task_mydata_jngphnhl_2021-06-30T04:05:31.529Z, single_phase_sub_task_mydata_pnnhaggc_2021-06-30T04:06:08.356Z, single_phase_sub_task_mydata_ioibbfnh_2021-06-30T04:33:38.080Z, single_phase_sub_task_mydata_cjafcalb_2021-06-30T04:06:06.758Z, single_phase_sub_task_mydata_hdcmkapl_2021-06-30T04:20:21.886Z, index_parallel_mydata_gcjhdhak_2021-06-30T04:37:36.950Z, single_phase_sub_task_mydata_mpjcncli_2021-06-30T04:05:58.275Z, single_phase_sub_task_mydata_pinefnhc_2021-06-30T04:33:48.986Z, single_phase_sub_task_mydata_nfphglaa_2021-06-30T04:05:29.436Z, single_phase_sub_task_mydata_mhdebkhc_2021-06-30T04:05:38.173Z, single_phase_sub_task_mydata_nflpjfdo_2021-06-30T04:20:23.894Z, single_phase_sub_task_mydata_mgehkgmg_2021-06-30T04:20:15.881Z, single_phase_sub_task_mydata_gnfbjapf_2021-06-30T04:05:36.443Z, single_phase_sub_task_mydata_pbjnnnnn_2021-06-30T04:20:08.875Z, single_phase_sub_task_mydata_adjmggom_2021-06-30T04:19:56.860Z, single_phase_sub_task_mydata_keeolmna_2021-06-30T04:33:53.056Z, single_phase_sub_task_mydata_mgannkma_2021-06-30T04:05:26.328Z]]]
   ```




[GitHub] [druid] didip commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871109240


   OK, thank you for the pointer. I will grep around for some more info.




[GitHub] [druid] jihoonson commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871908641


   Good to hear that your Druid cluster is running OK now. I'm not sure exactly what happened, but it seems that either the metadata store or ZK got into a bad state somehow. You may be able to find some clues in your overlord logs, but I wouldn't say it will be easy.




[GitHub] [druid] didip commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871104718


   I am literally seeing almost nothing assigned to my workers:
   
   <img width="923" alt="Screen Shot 2021-06-29 at 10 19 27 PM" src="https://user-images.githubusercontent.com/72918/123905995-41c0b680-d928-11eb-94c7-22d27de37fce.png">
   
   <img width="1301" alt="Screen Shot 2021-06-29 at 10 19 58 PM" src="https://user-images.githubusercontent.com/72918/123906027-4c7b4b80-d928-11eb-8c71-fd1dc8c5642e.png">
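
The same picture can be checked without the console by summarizing the overlord's worker listing. This is a sketch under assumptions: it expects the parsed JSON from the overlord's `/druid/indexer/v1/workers` endpoint, and the field names `worker.host`, `worker.capacity`, and `currCapacityUsed` are what I believe that response contains, so please confirm them against your version. The sample data and host names below are hypothetical, not taken from my cluster.

```python
def summarize_workers(workers):
    """Summarize slot usage from a parsed worker listing (a list of dicts)."""
    total = sum(entry["worker"]["capacity"] for entry in workers)
    used = sum(entry["currCapacityUsed"] for entry in workers)
    idle_hosts = [
        entry["worker"]["host"]
        for entry in workers
        if entry["currCapacityUsed"] == 0
    ]
    return {
        "total_slots": total,
        "used_slots": used,
        "free_slots": total - used,
        "idle_hosts": idle_hosts,
    }


# Hypothetical sample shaped like the assumed API response.
sample_workers = [
    {"worker": {"host": "mm-0.example:8091", "capacity": 20}, "currCapacityUsed": 20},
    {"worker": {"host": "mm-1.example:8091", "capacity": 20}, "currCapacityUsed": 0},
    {"worker": {"host": "mm-2.example:8091", "capacity": 20}, "currCapacityUsed": 0},
]

report = summarize_workers(sample_workers)
print(report)
```

A long `idle_hosts` list while tasks sit in PENDING would match what the screenshots show.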
   




[GitHub] [druid] didip commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871102781


   overlord config:
   ```
   # HTTP server threads
   druid.server.http.readTimeout=PT10M
   druid.server.http.numConnections=500
   druid.server.http.numMaxThreads=500
   druid.server.http.numThreads=200
   
   druid.indexer.queue.startDelay=PT30S
   
   druid.indexer.runner.type=remote
   druid.indexer.storage.type=metadata
   
   druid.indexer.storage.recentlyFinishedThreshold=PT10M
   ```
   
   middlemanager config:
   ```
   # Number of tasks per middleManager
   druid.worker.capacity=20
   
   # Task launch parameters
   druid.indexer.runner.javaOpts=-server -Xms512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+UseG1GC -XX:-UseBiasedLocking -XX:+CrashOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Daws.region=us-west-2
   
   # HTTP server threads
   druid.server.http.numThreads=60
   
   # Processing threads and buffers
   druid.processing.buffer.sizeBytes=100000000
   druid.processing.numMergeBuffers=2
   druid.processing.numThreads=2
   druid.processing.tmpDir=var/ebs/middlemanager/processing
   
   
   # Peon configurations
   druid.indexer.task.baseDir=var/ebs/middlemanager
   druid.indexer.task.baseTaskDir=var/ebs/middlemanager/task
   druid.indexer.task.defaultRowFlushBoundary=100000
   druid.indexer.task.directoryLockTimeout=PT10M
   druid.indexer.task.gracefulShutdownTimeout=PT5M
   druid.indexer.task.restoreTasksOnRestart=true
   
   druid.indexer.fork.property.druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
   
   ```




[GitHub] [druid] didip commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871414490


   I don't see any errors in my ZK at all.
   
   Is there a way to "start over" with my middle managers? That is, removing all entries in ZK about middle managers and scaling all my middle managers down to 0?
   




[GitHub] [druid] liuxiaohui1221 commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
liuxiaohui1221 commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-1001455934


   Could this have the same cause as the httpRemote problem, like in #11514?




[GitHub] [druid] hqx871 removed a comment on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
hqx871 removed a comment on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-946564726






[GitHub] [druid] didip commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
didip commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871093670


   When it comes to errors, I see a lot of these:
   
   ```
   2021-06-30T04:38:57,607 ERROR [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task assignment timed out on worker [172.19.214.19:8091], never ran task [single_phase_sub_task_mydata_cgbgomoi_2021-06-30T03:51:48.075Z]! Timeout: (513604 >= PT5M)!: {class=org.apache.druid.indexing.overlord.RemoteTaskRunner}
   ```
   and these:
   ```
   2021-06-30T04:38:57,607 ERROR [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Asked to cleanup nonexistent task: {class=org.apache.druid.indexing.overlord.RemoteTaskRunner, taskId=single_phase_sub_task_mydata_cgbgomoi_2021-06-30T03:51:48.075Z}
   ```




[GitHub] [druid] jihoonson commented on issue #11397: Too many PENDING tasks even though middle managers have plenty of capacity.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #11397:
URL: https://github.com/apache/druid/issues/11397#issuecomment-871088857


   I'm not certain how this can happen. Can you check the overlord logs and see why those tasks are pending?

