Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/02/09 21:16:58 UTC

[GitHub] [accumulo] milleruntime opened a new issue #1919: Compact FATE undo failed

milleruntime opened a new issue #1919:
URL: https://github.com/apache/accumulo/issues/1919


   While running RW MultiTable jobs for 2.1.0-SNAPSHOT on Uno with 2 Tservers, I saw a few user-initiated compactions run after a table was already being deleted and throw an error while trying to back out of the FATE compaction. Here is a trace of relevant log activity in the Manager:
   <pre>
   2021-02-09T15:17:34,268 [tables.TableManager] DEBUG: Transitioning state for table 6w from ONLINE to DELETING
   2021-02-09T15:17:34,518 [delete.CleanUp] DEBUG: Still waiting for table to be deleted: 6w locationState: 6w;1<@(null,ip-10-113-12-25:10000[10001119e1e0006],ip-10-113-12-25:10000[10001119e1e0006])
   2021-02-09T15:18:38,439 [accumulo.audit] INFO : operation: permitted; user: root; client: 127.0.0.1:33232; action: compactTable; targetTable: 6w; targetNamespace: +default;
   2021-02-09T15:18:40,295 [zookeeper.DistributedReadWriteLock] INFO : Added lock entry 22 userData 6b396a774d5b7118 lockTpye READ
   2021-02-09T15:18:40,311 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   2021-02-09T15:18:40,313 [zookeeper.DistributedReadWriteLock] INFO : Added lock entry 1 userData 6b396a774d5b7118 lockTpye READ
   2021-02-09T15:18:43,910 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   ...
   2021-02-09T15:19:02,911 [delete.CleanUp] DEBUG: Still waiting for table to be deleted: 6w locationState: 6w;2;1@(null,ip-10-113-12-25:9997[10001119e1e0005],ip-10-113-12-25:9997[10001119e1e0005])
   
   2021-02-09T15:19:06,498 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   2021-02-09T15:19:08,056 [delete.CleanUp] DEBUG: Deleted table 6w
   ...
   2021-02-09T15:19:10,094 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) locked for read operation: COMPACT
   2021-02-09T15:19:10,137 [fate.Fate] INFO : Updated status for Repo with FATE[6b396a774d5b7118] to FAILED_IN_PROGRESS
   2021-02-09T15:19:13,032 [tableOps.Utils] INFO : namespace +default (6b396a774d5b7118) unlocked for read
   2021-02-09T15:19:13,037 [tableOps.Utils] INFO : table 6w (6b396a774d5b7118) unlocked for read
   2021-02-09T15:19:13,038 [fate.Fate] WARN : Failed to undo Repo, FATE[6b396a774d5b7118]
   org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/4128397f-66ce-45f3-840f-38924fa0abd7/tables/6w/compact-id
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.6.2.jar:3.6.2]
           at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.6.2.jar:3.6.2]
           at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2358) ~[zookeeper-3.6.2.jar:3.6.2]
           at org.apache.accumulo.fate.zookeeper.ZooReaderWriter.lambda$mutateExisting$6(ZooReaderWriter.java:187) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.zookeeper.ZooReader.retryLoopMutator(ZooReader.java:165) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.zookeeper.ZooReaderWriter.mutateExisting(ZooReaderWriter.java:185) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.compact.CompactRange.removeIterators(CompactRange.java:152) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.compact.CompactRange.undo(CompactRange.java:175) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.compact.CompactRange.undo(CompactRange.java:47) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.manager.tableOps.TraceRepo.undo(TraceRepo.java:64) ~[accumulo-manager-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.Fate$TransactionRunner.undo(Fate.java:203) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.Fate$TransactionRunner.processFailed(Fate.java:179) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:72) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
           at java.lang.Thread.run(Thread.java:834) [?:?]
   </pre>
   
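   From the logs, it appears the FATE transaction was started after the table was already marked for deletion and made it through to the `CompactRange` operation. It looks like it waited there for the table write lock to be released, then failed once the table delete completed. Judging by the stack trace, the undo step (`CompactRange.undo` calling `removeIterators`) then tried to update the table's `compact-id` node in ZooKeeper, which the delete had already removed, so the undo itself failed with `NoNodeException`.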
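
One possible direction, sketched below under stated assumptions, is to have the undo step treat a missing node as "nothing left to clean up" rather than as a failure. This is not the Accumulo code and not a proposed patch: `UndoSketch`, the in-memory `nodes` map, the local `NoNodeException` class, the `/tables/<id>/compact-id` path layout, and the `"id,txid=config"` value format are all hypothetical stand-ins chosen so the example runs without ZooKeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

public class UndoSketch {

  /** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  /** Hypothetical in-memory stand-in for the ZooKeeper tree (path -> data). */
  static final Map<String, String> nodes = new ConcurrentHashMap<>();

  /** Mutates an existing node; throws if the node is absent, like the call in the trace. */
  static void mutateExisting(String path, UnaryOperator<String> mutator)
      throws NoNodeException {
    // computeIfPresent returns null when the key is absent, so the check is atomic.
    String updated = nodes.computeIfPresent(path, (p, value) -> mutator.apply(value));
    if (updated == null) {
      throw new NoNodeException(path);
    }
  }

  /**
   * Undo for a compaction: strips everything after the compaction id from the
   * (made-up) node value. Returns true if the node was cleaned up, false if the
   * table's node was already gone, for example because the table was deleted.
   */
  static boolean undoCompact(String tableId) {
    String path = "/tables/" + tableId + "/compact-id";
    try {
      mutateExisting(path, value -> value.split(",")[0]);
      return true;
    } catch (NoNodeException e) {
      // Assumption: a deleted table needs no compaction cleanup, so do not fail the undo.
      return false;
    }
  }

  public static void main(String[] args) {
    nodes.put("/tables/6w/compact-id", "5,6b396a774d5b7118=iterators");
    System.out.println(undoCompact("6w")); // prints true, node value is now "5"

    nodes.remove("/tables/6w/compact-id"); // simulate the table delete finishing
    System.out.println(undoCompact("6w")); // prints false, no exception escapes
  }
}
```

Whether swallowing the missing node is actually the right behavior, or whether the compaction should instead be rejected before it starts on a table in the DELETING state, is a design question this sketch does not settle.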


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime closed issue #1919: Compact FATE undo failed

Posted by GitBox <gi...@apache.org>.
milleruntime closed issue #1919:
URL: https://github.com/apache/accumulo/issues/1919


   

