Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/04/01 15:41:07 UTC

[GitHub] [accumulo] milleruntime opened a new issue #1994: Tablets processed by multiple tservers

milleruntime opened a new issue #1994:
URL: https://github.com/apache/accumulo/issues/1994


   **Describe the bug**
   On a very active 1.9.3 cluster, ZooKeeper reached a degraded state where it was only partially functioning, causing tservers to log transient connection warnings. The tservers did not lose their ZK locks during this time, but they were flooding the logs with these warnings. Meanwhile, the Master marked the tserver as dead and began reassigning its tablets to other tservers. ZK was able to recover, and most of the tservers resumed functioning normally. The half-alive tservers eventually lost their locks and reported missing RFiles, because they were still trying to process files that had already been compacted by the tservers the tablets had been reassigned to.
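The failure sequence above can be modeled in a few lines. This is a toy Python sketch, not Accumulo code: the file names, the `simulate_double_assignment` helper, and the dict/set stand-ins for tablet assignment and storage are all hypothetical. It only shows why a tserver that was declared dead, but kept running, ends up referencing RFiles that no longer exist.

```python
def simulate_double_assignment():
    """Hypothetical toy model of one tablet being handled by two tservers."""
    files = {"A.rf", "B.rf"}                    # hypothetical RFiles backing the tablet
    assignments = {"tablet-1": "tserver1"}      # the Master's view of who hosts the tablet
    stale_view = {"tserver1": sorted(files)}    # file list tserver1 cached when it loaded the tablet

    # 1. ZK is degraded: the Master decides tserver1 is dead even though
    #    tserver1 never lost its ZK lock, and reassigns the tablet.
    assignments["tablet-1"] = "tserver2"

    # 2. tserver2 loads the tablet and compacts A.rf + B.rf into C.rf.
    #    Once unreferenced, the inputs are eventually deleted (modeled here as immediate).
    files -= {"A.rf", "B.rf"}
    files.add("C.rf")

    # 3. The half-alive tserver1 keeps working from its cached file list,
    #    so every file it expects is now missing.
    missing = [f for f in stale_view["tserver1"] if f not in files]
    return assignments["tablet-1"], missing


owner, missing = simulate_double_assignment()
print(owner, missing)
```

The point of the model is that nothing in step 3 consults the current assignment, which is the gap that lets two tservers act on the same tablet.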
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Affected version(s) of this project: 1.9.3, 1.10
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1994: Tablets processed by multiple tservers

milleruntime commented on issue #1994:
URL: https://github.com/apache/accumulo/issues/1994#issuecomment-821319162


   One way to test this is to drop the ZK max client connections (maxClientCnxns=10) to something low, like 5-10. The problem is that it is very hard to debug or do anything, because Accumulo hovers in this degraded state: it gets just enough connections to stay alive and that's about it. You can't even debug ZK using the four-letter-word commands or the CLI, because there aren't any connections left. There may be a sweet spot where the Master and one tserver get enough connections to function while a second tserver is only half alive, but I have yet to find it.
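For reproducing this, `maxClientCnxns` is a ZooKeeper server setting in `zoo.cfg`. It limits the number of concurrent connections a single client, identified by IP address, may make to one ZooKeeper server (the default is 60 in recent ZooKeeper releases). Below is a minimal Python sketch that rewrites the setting in a `zoo.cfg` string; the helper name `set_max_client_cnxns` is hypothetical, and ZooKeeper has to be restarted to pick up the new value.

```python
import re


def set_max_client_cnxns(cfg_text: str, limit: int) -> str:
    """Return zoo.cfg text with maxClientCnxns set to `limit` (hypothetical helper).

    Replaces an existing, uncommented maxClientCnxns line if there is one,
    otherwise appends a new line at the end of the file.
    """
    line = f"maxClientCnxns={limit}"
    # [ \t]* instead of \s* so the match never swallows blank lines above it.
    pattern = re.compile(r"^[ \t]*maxClientCnxns[ \t]*=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        return pattern.sub(line, cfg_text)
    if cfg_text and not cfg_text.endswith("\n"):
        cfg_text += "\n"
    return cfg_text + line + "\n"


print(set_max_client_cnxns("tickTime=2000\nmaxClientCnxns=60\n", 5), end="")
```

Commented-out lines such as `#maxClientCnxns=60` are left alone, so the helper only ever changes the value ZooKeeper will actually read.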



[GitHub] [accumulo] milleruntime edited a comment on issue #1994: Tablets processed by multiple tservers

milleruntime edited a comment on issue #1994:
URL: https://github.com/apache/accumulo/issues/1994#issuecomment-822490385


   After some analysis running the latest Accumulo 2.1 in Uno, it looks like each process holds at least the following number of ZK connections when idle (no activity):
   ZK Admin command - 1
   Manager - 2
   GC - 2
   Tracer - 3
   Monitor - 2
   tserver - 2 each
   scans, shell, clients - ~1 each
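These numbers make it possible to estimate how low `maxClientCnxns` can go before even an idle cluster no longer fits. A rough Python sketch, assuming a single-host setup like Uno (so every process shares one client IP and therefore one `maxClientCnxns` limit) talking to a single ZooKeeper server. The counts are the idle minimums listed above, the "~1" for scans, shell, and clients is treated as exactly 1, and the helper names are hypothetical.

```python
# Idle ZK connection counts per process, as listed in this comment (Accumulo 2.1 on Uno).
IDLE_ZK_CONNECTIONS = {
    "zk_admin_command": 1,
    "manager": 2,
    "gc": 2,
    "tracer": 3,
    "monitor": 2,
    "tserver": 2,
    "client": 1,  # scans, shell, clients: approximately 1 each
}


def connection_budget(process_counts: dict) -> int:
    """Minimum ZK connections needed by the given processes (hypothetical helper)."""
    return sum(IDLE_ZK_CONNECTIONS[name] * n for name, n in process_counts.items())


def fits(process_counts: dict, max_client_cnxns: int) -> bool:
    # maxClientCnxns is enforced per client IP, so on a single host every
    # Accumulo process draws from the same limit.
    return connection_budget(process_counts) <= max_client_cnxns


uno = {"manager": 1, "gc": 1, "tracer": 1, "monitor": 1, "tserver": 1}
print(connection_budget(uno), fits(uno, 10), fits(uno, 60))
```

For example, a Manager, GC, Tracer, Monitor, and one tserver already need 2 + 2 + 3 + 2 + 2 = 11 connections while idle, which exceeds a limit of 10 before any shell or scan connects.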



[GitHub] [accumulo] milleruntime edited a comment on issue #1994: Tablets processed by multiple tservers

milleruntime edited a comment on issue #1994:
URL: https://github.com/apache/accumulo/issues/1994#issuecomment-822490385


   After some analysis running the latest Accumulo 2.1 in Uno, it looks like each process holds at least the following number of ZK connections when idle (no activity):
   ZK Admin command - 1
   Manager - 2
   GC - 2
   Tracer - 3
   Monitor - 2
   tserver - 2 each



[GitHub] [accumulo] milleruntime commented on issue #1994: Tablets processed by multiple tservers

milleruntime commented on issue #1994:
URL: https://github.com/apache/accumulo/issues/1994#issuecomment-822490385


   After some analysis running Uno, it looks like each process holds at least the following number of ZK connections when idle (no activity):
   ZK Admin command - 1
   Manager - 2
   GC - 2
   Tracer - 3
   Monitor - 2
   tserver - 2 each
