Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/11/25 13:34:22 UTC

[GitHub] [accumulo] milleruntime opened a new issue #1809: ZK Watchers remain after table delete

milleruntime opened a new issue #1809:
URL: https://github.com/apache/accumulo/issues/1809


   **Describe the bug**
   When a table is created, Accumulo creates around 20 ZooKeeper watchers associated with that table.  This is with the default configuration for that table.  The more properties and iterators that are configured on the table, the more watchers will exist.  When the table is dropped, anywhere from 8 to 15 watchers persist indefinitely.  The only way to make ZK drop these watchers is to restart the server holding the connections (ZK, tserver or master).  This becomes a problem on a large cluster where many tables are created and deleted, as ZK will eventually become inoperable.  Restarting ZK or the master is not always advisable since it can lead to more problems on an active cluster.
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Affected version(s) of this project: 1.10, 2.0, 2.1
   
   **To Reproduce**
   1. Start up a cluster using [Uno](https://github.com/apache/fluo-uno) and have netcat installed for running ZK four-letter commands.  You will probably have to modify the ZK whitelist in zoo.cfg.  For example, `vi <uno_home>/install/apache-zookeeper-3.6.1-bin/conf/zoo.cfg` and modify the property: `4lw.commands.whitelist=*`.
   2. Create a table.  For example, `accumulo shell -e "createtable test"`
   3. Get the table ID for that table: `accumulo shell -e "tables -l"`
   4. Get a count of the number of watchers associated with that table ID. For table ID=4: 
   `echo wchp | nc localhost 2181 | grep "tables/4"`.  This returned a count = 23
   5. Drop the table: `accumulo shell -e "droptable test -f"`
   6. Get the number of watchers again for that table ID.  Command returned a count = 15
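
The counts in steps 4 and 6 can also be taken with a small script. This is only a sketch: `fetch_four_letter` and `count_table_watches` are names made up here, and the parser assumes `wchp` prints each watched path on its own line starting with `/`, followed by indented session-ID lines. Unlike `grep "tables/4"`, it matches the table ID as a whole path segment, so table `4` is not confused with table `40`.

```python
import socket


def fetch_four_letter(cmd: str, host: str = "localhost", port: int = 2181) -> str:
    """Send a ZooKeeper four-letter command and return the full reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection once the reply is complete.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def count_table_watches(wchp_text: str, table_id: str) -> int:
    """Count watched paths whose tables/<id> segment equals table_id."""
    count = 0
    for line in wchp_text.splitlines():
        # Assumption: path lines start with "/"; session-ID lines are indented.
        if not line.startswith("/"):
            continue
        parts = line.strip().split("/")
        if "tables" in parts:
            idx = parts.index("tables")
            if idx + 1 < len(parts) and parts[idx + 1] == table_id:
                count += 1
    return count
```

Against a running instance, `count_table_watches(fetch_four_letter("wchp"), "4")` would be called once before and once after the `droptable`.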
   
   **Expected behavior**
   ZK Watchers associated with a table should be dropped when the table is deleted.
   
   **Additional context**
   There is a very good chance this will be fixed with the 2.1 change #1454.  But until that change is made, this is a critical bug in 1.10 and 2.0.   
   
   The watchers that persist seem to be associated with table configuration:
   <pre>
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.split.threshold
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.replication
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.balancer
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.groups.enabled
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.compaction.minor.logs.threshold
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/namespace
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.classpath.context
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/tserver.dir.memdump
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.compaction.selector
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.majc.compaction.strategy
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/tserver.walog.max.referenced
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.compaction.dispatcher
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/table.split.endrow.size.max
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/name
   /accumulo/d12e80e5-3008-43af-b050-195094437b44/tables/p/conf/tserver.memory.maps.native.enabled
   </pre>
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion edited a comment on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
dlmarion edited a comment on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-870878410


   Do we have a utility that uses ZKUtil.visitSubTreeDFS() to traverse the `tables` in ZK and for those that don't exist delete the watches in the callback?
   
   Edit: Looks like `visitSubTreeDFS` doesn't exist in ZooKeeper 3.4.14, but `listSubTreeBFS()` does. Also, there is no way to remove the watches in 3.4.14. I'm not sure how you fix this in 1.10.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-735991651


   Here is another ticket for reference: https://github.com/apache/accumulo/issues/1423





[GitHub] [accumulo] EdColeman commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-733772902


   While the issue with a large number of watchers was documented in https://issues.apache.org/jira/browse/ACCUMULO-2757, the resolution of that is still open.  This is a separate issue that aggravates the problem with lots of watches because the tservers (and the master) seem to keep unnecessary watches after a table is deleted.





[GitHub] [accumulo] EdColeman edited a comment on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
EdColeman edited a comment on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-733772902


   While the issue with a large number of watchers was documented in https://issues.apache.org/jira/browse/ACCUMULO-2757, the resolution of that is still open (https://github.com/apache/accumulo/issues/1225).  This is a separate issue that aggravates the problem with lots of watches because the tservers (and the master) seem to keep unnecessary watches after a table is deleted.





[GitHub] [accumulo] ctubbsii commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-870954204


   @dlmarion I don't anticipate us shipping a fix for 1.10, since doing so would likely break forward-compatibility with other 1.10 releases in the way things are persisted to ZK. This has been a constraint for a long time in 1.x, and I don't think it would be super urgent to fix for 1.10. There are some workarounds, such as creating the watched-but-missing entries and then deleting them, so that the delete causes the watches to trigger, or bouncing tservers with excessive watchers. This issue is assigned to @EdColeman and I believe he is making progress in his fork on a more robust solution for 2.1 that, among other changes, involves using significantly fewer ZK nodes to store config.





[GitHub] [accumulo] EdColeman commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-870961440


   One of the issues that blocks a specific fix for this is that watchers are currently created with the expectation that the node might appear, so that the code receives a notification on node creation.  If the property is never set, the node never comes into existence and the watcher that was set remains even when the table is deleted.  To clear the watcher, it would be necessary to create the "missing" nodes and then delete them to trigger the watcher, which hopefully would not then be reset. While this might be a solution, the rework of the properties in 2.1 will make this unnecessary.
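
To make the mechanism concrete, here is a toy in-memory model of one-shot watches. `ToyZooKeeper` and `demo` are made up for illustration and are not the ZooKeeper client API. The only behavior borrowed from ZooKeeper is that a watch set by an exists check on a missing node stays registered until that node is created, and that a fired watch is gone unless the client registers it again.

```python
from collections import defaultdict


class ToyZooKeeper:
    """In-memory stand-in for one-shot watches; not the real client."""

    def __init__(self):
        self.nodes = set()
        self.watches = defaultdict(list)

    def exists(self, path, watcher=None):
        # A watch can be left on a path even when the node is absent.
        if watcher is not None:
            self.watches[path].append(watcher)
        return path in self.nodes

    def _fire(self, path, event):
        # One-shot: drop the registered watchers before notifying them.
        for watcher in self.watches.pop(path, []):
            watcher(event, path)

    def create(self, path):
        self.nodes.add(path)
        self._fire(path, "created")

    def delete(self, path):
        self.nodes.discard(path)
        self._fire(path, "deleted")

    def watch_count(self):
        return sum(len(w) for w in self.watches.values())


def demo():
    zk = ToyZooKeeper()
    events = []
    table_live = {"4": True}
    path = "/tables/4/conf/table.balancer"

    def watcher(event, watched_path):
        events.append(event)
        # A client would only re-arm the watch while the table exists.
        if table_live["4"]:
            zk.exists(watched_path, watcher)

    zk.exists(path, watcher)   # property never set, so the node is absent
    table_live["4"] = False    # table dropped; nothing fires the watch
    before = zk.watch_count()
    zk.create(path)            # fires the stale watch
    zk.delete(path)            # removes the temporary node again
    return before, zk.watch_count(), events
```

In this model the watch survives the table drop (`before` is 1) and disappears only after the create and delete pair, provided the watcher does not re-arm itself.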
   
   I explored a stand-alone utility that gathers the watchers using a ZK four-letter-word command (either wchp or wchc works) and then creates and deletes the nodes with watchers when the corresponding table ID does not exist.  The four-letter commands can be disruptive and there was no interest in the utility.
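
A rough sketch of the selection and ordering logic such a utility might use is below. It is not the utility I explored: `orphaned_watch_paths` and `cleanup_plan` are hypothetical names, the `wchp` layout (path lines starting with `/`, indented session IDs underneath) is an assumption, and the sketch only produces a plan without touching ZK. Whether Accumulo reacts badly to the transient nodes is not addressed here.

```python
import re

# Matches /accumulo/<instance-id>/tables/<table-id>[/...]
TABLE_PATH = re.compile(r"^/accumulo/[^/]+/tables/([^/]+)(?:/|$)")


def orphaned_watch_paths(wchp_text, live_table_ids):
    """Return watched table paths whose table ID is not in live_table_ids."""
    orphans = set()
    for line in wchp_text.splitlines():
        # Indented session-ID lines never match the anchored pattern.
        match = TABLE_PATH.match(line)
        if match and match.group(1) not in live_table_ids:
            orphans.add(line.strip())
    return sorted(orphans)


def cleanup_plan(orphans):
    """Return (op, path) steps: create parents first, delete children first."""
    needed = set()
    for path in orphans:
        match = TABLE_PATH.match(path)
        if not match:
            continue
        # The table root and every intermediate node must exist before
        # the watched leaf can be created.
        node = path[: match.end(1)]
        needed.add(node)
        for part in path[len(node):].split("/"):
            if part:
                node = node + "/" + part
                needed.add(node)
    ordered = sorted(needed, key=lambda p: (p.count("/"), p))
    creates = [("create", p) for p in ordered]
    deletes = [("delete", p) for p in reversed(ordered)]
    return creates + deletes
```

The set of live table IDs would have to come from Accumulo itself (for example the output of `tables -l`), and each step would then be executed with a real ZooKeeper client.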





[GitHub] [accumulo] jzgithub1 commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
jzgithub1 commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-735896613


   Maybe some of the ideas in this old and closed pull request can motivate the ultimate solution to this issue:
   
   https://github.com/apache/accumulo/pull/1443





[GitHub] [accumulo] EdColeman commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-870976336


   If it seemed likely that we would release a 1.10.2, then it might be possible to add some mitigations as @ctubbsii mentioned, but I need to finish the 2.1 approach first.





[GitHub] [accumulo] dlmarion commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-870970227


   Should we remove this from the 1.10.2 project then?





[GitHub] [accumulo] dlmarion commented on issue #1809: ZK Watchers remain after table delete

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #1809:
URL: https://github.com/apache/accumulo/issues/1809#issuecomment-870878410


   Do we have a utility that uses ZKUtil.visitSubTreeDFS() to traverse the `tables` in ZK and for those that don't exist delete the watches in the callback?

