Posted to dev@druid.apache.org by GitBox <gi...@apache.org> on 2018/07/09 14:43:40 UTC

[GitHub] erankor commented on issue #5979: Kafka Indexing Service lagging every hour

URL: https://github.com/apache/incubator-druid/issues/5979#issuecomment-403503558
 
 
   The issue got much worse today :( The lag reached 30-40 min during handoff. I tried stopping realtime ingestion completely, but it didn't help. The steps I took:
   1. shut down the KIS tasks (had to kill the tasks manually for the shutdown to complete)
   2. stopped the overlord, coordinator, middle managers
   3. restarted all the services
   4. re-enabled KIS tasks
   
   Adding some logs that will hopefully help troubleshoot this:
   [overlord-exception2.txt](https://github.com/apache/incubator-druid/files/2176642/overlord-exception2.txt)
   [overlord-exception3.txt](https://github.com/apache/incubator-druid/files/2176643/overlord-exception3.txt)
   [stalled-task.txt](https://github.com/apache/incubator-druid/files/2176644/stalled-task.txt)
   [overlord-exception1.txt](https://github.com/apache/incubator-druid/files/2176645/overlord-exception1.txt)
   
   1. stalled-task.txt - if I'm reading this correctly, the worker was waiting for ~15 min on some request that was issued to the overlord
   2. overlord-exception1-3.txt - some exceptions that I saw in the overlord logs:
     a. "io.druid.java.util.common.RetryUtils - Failed on try x" + MySQLIntegrityConstraintViolationException
     b. "The RuntimeException could not be mapped to a response, re-throwing to the HTTP container" + MySQLIntegrityConstraintViolationException
     c. "The RuntimeException could not be mapped to a response, re-throwing to the HTTP container" + "Unable to grant lock to inactive Task"
   
   These exceptions seem to be happening quite a lot; here are the numbers for a random 2-hour window:
   $ journalctl -u druid_over* -S -7200 | grep WARN | grep -c 'io.druid.java.util.common.RetryUtils'
   92
   $ journalctl -u druid_over* -S -7200 | grep -c 'Unable to grant lock to inactive Task'
   135
   $ journalctl -u druid_over* -S -7200 | grep -c 'UnableToExecuteStatementException'
   179
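
For anyone repeating these counts, the following is a minimal sketch that tallies all three patterns in a single pass over the log stream instead of running three separate greps. The function name `count_patterns` and the output labels are hypothetical and not part of Druid or journalctl. Unlike the first command above, it does not pre-filter on WARN, so its RetryUtils count may differ.

```shell
# Hypothetical helper: count lines matching each of the three
# overlord log patterns, reading the log stream from stdin.
count_patterns() {
  awk '
    /io\.druid\.java\.util\.common\.RetryUtils/ { retry++ }
    /Unable to grant lock to inactive Task/     { lock++ }
    /UnableToExecuteStatementException/         { stmt++ }
    END {
      # Uninitialized awk counters print as 0 with %d.
      printf "RetryUtils %d\n", retry
      printf "InactiveTaskLock %d\n", lock
      printf "UnableToExecuteStatement %d\n", stmt
    }
  '
}

# Assumed usage, mirroring the journalctl invocation above:
#   journalctl -u 'druid_over*' -S -7200 | count_patterns
```

A line that matches more than one pattern is counted once per pattern, which matches what the separate `grep -c` runs would report.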
   
   Any direction you can give me here would be appreciated
   
   Thank you
   
   Eran
