Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/24 06:18:30 UTC
[GitHub] [incubator-druid] clintropolis commented on issue #8107: Add CliIndexer process type and initial task runner implementation

clintropolis commented on issue #8107: Add CliIndexer process type and initial task runner implementation
URL: https://github.com/apache/incubator-druid/pull/8107#issuecomment-514496119
 
 
   I did some more testing with this on my laptop, with a setup of 1 each of broker, router, coordinator, and overlord, and 2 each of indexer and historical.
   
   <img width="1669" alt="Screen Shot 2019-07-23 at 6 20 58 PM" src="https://user-images.githubusercontent.com/1577461/61769231-6906c480-ad9e-11e9-8f73-a9cf7c34083f.png">
   
   I also did some small-scale Kafka indexing testing to make sure realtime queries and handoff were functioning.
   
   <img width="1674" alt="Screen Shot 2019-07-23 at 6 43 32 PM" src="https://user-images.githubusercontent.com/1577461/61769298-a2d7cb00-ad9e-11e9-8b85-7c2230674167.png">
   
   Overall things are working nicely. I did run into an issue when trying to stop an indexer node (`SIGTERM`). I believe the issue lies with the order of lifecycle shutdown: the tasks are gracefully stopped _after_ jetty is stopped. This causes the lifecycle stop on the indexer to hang during graceful task stop, because the task is waiting for a message from the overlord that it can never receive without a running jetty.
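
To make the suspected ordering problem concrete, here is a minimal toy model, not Druid code. It assumes a lifecycle that stops components in the reverse of their registration order. Every name in it (`ShutdownOrderSketch`, `Handler`, `HttpServer`, `TaskRunner`, `shutdown`) is hypothetical and only illustrates why the task stop needs to happen while the HTTP server is still up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ShutdownOrderSketch {
    /** Hypothetical stand-in for a managed component; not a Druid interface. */
    interface Handler {
        void stop(List<String> log);
    }

    /** Toy HTTP server: once stopped, no request can reach the tasks. */
    static class HttpServer implements Handler {
        boolean running = true;

        @Override
        public void stop(List<String> log) {
            running = false;
            log.add("http stopped");
        }
    }

    /** Toy task runner whose graceful stop depends on one more inbound request. */
    static class TaskRunner implements Handler {
        private final HttpServer server;

        TaskRunner(HttpServer server) {
            this.server = server;
        }

        @Override
        public void stop(List<String> log) {
            if (server.running) {
                log.add("tasks stopped gracefully");
            } else {
                // The real symptom described above is a hang until a timeout;
                // the toy model just records the outcome.
                log.add("tasks timed out, stopped ungracefully");
            }
        }
    }

    /** Registers both handlers, then stops them in reverse registration order. */
    public static List<String> shutdown(boolean serverRegisteredFirst) {
        HttpServer server = new HttpServer();
        TaskRunner tasks = new TaskRunner(server);

        Deque<Handler> stack = new ArrayDeque<>();
        if (serverRegisteredFirst) {
            stack.push(server);
            stack.push(tasks);
        } else {
            stack.push(tasks);
            stack.push(server);
        }

        List<String> log = new ArrayList<>();
        while (!stack.isEmpty()) {
            stack.pop().stop(log); // last registered is stopped first
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(shutdown(false)); // server stops before the tasks
        System.out.println(shutdown(true));  // tasks stop before the server
    }
}
```

In this model, only the ordering where the task runner stops before the server allows a graceful task stop. Whether the actual fix is a matter of registration order or something else in the lifecycle is not established here.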
   
   The supervisor on the overlord is then forever stuck in a loop, performing an action it can never complete because the indexer has stopped listening.
   
   ```
   2019-07-24T01:58:21,268 INFO [KafkaSupervisor-wikipedia] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [wikipedia] supervisor is running.
   2019-07-24T01:58:21,268 INFO [KafkaSupervisor-wikipedia] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='wikipedia', generationTime=2019-07-24T01:58:21.268Z, payload=KafkaSupervisorReportPayload{dataSource='wikipedia', topic='wikipedia', partitions=1, replicas=2, durationSeconds=600, active=[{id='index_kafka_wikipedia_ed020815fc3c3f4_bebmfiod', startTime=2019-07-24T01:55:50.422Z, remainingSeconds=449}, {id='index_kafka_wikipedia_ed020815fc3c3f4_cfcibngo', startTime=2019-07-24T01:55:50.525Z, remainingSeconds=449}], publishing=[], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[]}}
   2019-07-24T01:58:36,268 INFO [IndexTaskClient-wikipedia-0] org.apache.druid.indexing.common.IndexTaskClient - submitRequest failed for [http://localhost:8092/druid/worker/v1/chat/index_kafka_wikipedia_ed020815fc3c3f4_bebmfiod/offsets/current], with message [Connection refused (Connection refused)]
   ```
   
   The indexer eventually gives up after a 5 minute timeout and stops ungracefully, but the supervisor/overlord appears to remain stuck until either the indexer comes back on the same host/port or the overlord is restarted. This also jams up the task status shown in the web UI: the stuck indexer's task remains in the 'running' state until one of those same conditions is met (the indexer returns or the overlord is restarted).
   
   This issue aside, I'm still +1 on this if you'd rather fix this in a follow-up PR, since this is currently an undocumented feature anyway.
