Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/11/08 17:36:51 UTC

[GitHub] [accumulo] milleruntime opened a new issue #2350: External compaction error when table is deleted

milleruntime opened a new issue #2350:
URL: https://github.com/apache/accumulo/issues/2350


   I saw a few errors that I did not expect to see after deleting a table that had active running external compactions. Here is what gets printed in the compactor log:
   <pre>
    2021-11-08T12:20:31,706 [compaction.FileCompactor] ERROR: File does not exist: /accumulo/tables/2/t-000002w/C0000115.rf_tmp (inode 17654) Holder DFSClient_NONMAPREDUCE_554270523_18 does not have any open files.
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3050)
           at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:704)
           at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:690)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3094)
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:963)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:639)
           at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
           at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
           at java.base/java.security.AccessController.doPrivileged(Native Method)
           at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)
   </pre>
   <pre>
    org.apache.hadoop.ipc.RemoteException: File does not exist: /accumulo/tables/2/t-000002w/C0000115.rf_tmp (inode 17654) Holder DFSClient_NONMAPREDUCE_554270523_18 does not have any open files.
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3050)
           at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:704)
           at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:690)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3094)
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:963)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:639)
           at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
           at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
           at java.base/java.security.AccessController.doPrivileged(Native Method)
           at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)

            at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.ipc.Client.call(Client.java:1508) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.ipc.Client.call(Client.java:1405) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) ~[hadoop-client-api-3.3.0.jar:?]
           at com.sun.proxy.$Proxy34.complete(Unknown Source) ~[?:?]
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:570) ~[hadoop-client-api-3.3.0.jar:?]
           at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
           at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
           at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
           at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[hadoop-client-api-3.3.0.jar:?]
           at com.sun.proxy.$Proxy35.complete(Unknown Source) ~[?:?]
           at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:957) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:914) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:897) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:852) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101) ~[hadoop-client-api-3.3.0.jar:?]
           at org.apache.accumulo.core.file.streams.RateLimitedOutputStream.close(RateLimitedOutputStream.java:54) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Writer.close(BCFile.java:369) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.core.file.rfile.RFile$Writer.close(RFile.java:635) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.server.compaction.FileCompactor.call(FileCompactor.java:236) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.compactor.Compactor$6.run(Compactor.java:553) ~[accumulo-compactor-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   </pre>
   <pre>
   2021-11-08T12:20:31,706 [threads.AccumuloUncaughtExceptionHandler] ERROR: Caught an Exception in Thread[Compaction job for tablet TKeyExtent(table:32, endRow:31 33 33 33 33 33 33 33 33 33 33 33 33 33 33 35, prevEndRow:30 63 63 63 63 63 63 63 63 63 63 63 63 63 63 65),5,main]. Thread is dead.
   java.lang.RuntimeException: Compaction failed
           at org.apache.accumulo.compactor.Compactor$6.run(Compactor.java:568) ~[accumulo-compactor-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   </pre>
   
   Right before the errors this is what the same compactor was reporting in its log:
   <pre>
    2021-11-08T12:20:30,189 [compactor.Compactor] INFO : Starting up compaction runnable for job: TExternalCompactionJob(externalCompactionId:ECID:cd145bea-6d06-45cc-b2a2-5a9575ab3e42, extent:TKeyExtent(table:32, endRow:31 33 33 33 33 33 33 33 33 33 33 33 33 33 33 35, prevEndRow:30 63 63 63 63 63 63 63 63 63 63 63 63 63 63 65), files:[InputFile(metadataFileEntry:hdfs://localhost:8020/accumulo/tables/2/t-000002w/F00000ym.rf, size:2306065, entries:61204, timestamp:-1), InputFile(metadataFileEntry:hdfs://localhost:8020/accumulo/tables/2/t-000002w/F00000z8.rf, size:2194819, entries:58318, timestamp:-1), InputFile(metadataFileEntry:hdfs://localhost:8020/accumulo/tables/2/t-000002w/F00000zu.rf, size:2325338, entries:61640, timestamp:-1), InputFile(metadataFileEntry:hdfs://localhost:8020/accumulo/tables/2/t-000002w/F000010l.rf, size:2322428, entries:61581, timestamp:-1)], iteratorSettings:IteratorConfig(iterators:[]), outputFile:hdfs://localhost:8020/accumulo/tables/2/t-000002w/C0000115.rf_tmp, propagateDeletes:true, kind:SYSTEM, userCompactionId:0, overrides:{})
   2021-11-08T12:20:30,191 [compactor.Compactor] DEBUG: Progress checks will occur every 1 seconds
    2021-11-08T12:20:31,191 [compactor.Compactor] DEBUG: Updating coordinator with compaction progress: Compaction in progress, read 172032 of 242743 input entries ( 70.87001 % ), written 172032 entries.
   </pre>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2350: External compaction error when table is deleted

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2350:
URL: https://github.com/apache/accumulo/issues/2350#issuecomment-982601335


   @milleruntime - what's the error here? How do you think it should be handled differently? I have reviewed the code and the error originates [here](https://github.com/apache/accumulo/blob/main/server/base/src/main/java/org/apache/accumulo/server/compaction/FileCompactor.java#L238) when trying to close the compaction output file. The error is re-thrown, which affects both the internal and external compaction code. In the internal compaction case, it bubbles up to [here](https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/compactions/InternalCompactionExecutor.java#L100) and a warning is logged. In the external case, the error bubbles up to [here](https://github.com/apache/accumulo/blob/main/server/compactor/src/main/java/org/apache/accumulo/compactor/Compactor.java#L569). I could remove the line that throws a RuntimeException from the thread, which would remove the log entry from the UncaughtExceptionHandler saying the thread is dead. I think the code would be unaffected by doing that.
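   The decision described above — only treat a failed close of the output file as a real error when the compaction was not canceled — can be sketched roughly as follows. This is illustrative only: `shouldRethrow`, `finishCompaction`, and the message text are hypothetical stand-ins, not the actual Accumulo methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: when closing the compaction output file fails, rethrow only if
// the compaction was not already canceled (e.g. because the table was
// deleted). All names here are hypothetical, not Accumulo's real classes.
public class CompactionErrorSketch {

    /** Decide whether a close() failure should be rethrown or just logged. */
    static boolean shouldRethrow(IOException closeFailure, boolean compactionCanceled) {
        // A canceled compaction is expected to lose its *_tmp output file,
        // so a "File does not exist" error is not a real failure.
        return !compactionCanceled;
    }

    static void finishCompaction(boolean closeFails, boolean canceled) {
        if (closeFails) {
            IOException e = new IOException("File does not exist: <output .rf_tmp file>");
            if (shouldRethrow(e, canceled)) {
                // Real failure: propagate so the caller can log and handle it.
                throw new UncheckedIOException(e);
            }
            // Canceled: log quietly instead of killing the thread.
            System.out.println("compaction canceled, ignoring close failure: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        finishCompaction(true, true);      // canceled: failure suppressed
        try {
            finishCompaction(true, false); // not canceled: rethrown
        } catch (UncheckedIOException expected) {
            System.out.println("real failure rethrown as expected");
        }
    }
}
```

   Keeping the classification in one small predicate means both the internal and external compaction paths could share the same policy.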





[GitHub] [accumulo] milleruntime commented on issue #2350: External compaction error when table is deleted

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2350:
URL: https://github.com/apache/accumulo/issues/2350#issuecomment-982755747


   I think the error handling could just be cleaned up a bit. If we know the table is deleted, then don't treat the situation the same as an error.
   
   > I think the solution here might be to remove the code that throws the RuntimeException in the FileCompactor thread and call the checkIfCanceled method after the FileCompactor thread finishes to perform a final check.
   
   That sounds reasonable.
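   The agreed-upon shape — let the compaction thread record its failure instead of throwing, then do one final cancellation check after it finishes — might look something like this sketch. The names (`runCompaction`, `checkIfCanceled` as written here, the `failure` field) are illustrative, not the actual Accumulo code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of "remove the RuntimeException and call checkIfCanceled after the
// FileCompactor thread finishes". Names are hypothetical stand-ins.
public class FinalCheckSketch {

    final AtomicBoolean canceled = new AtomicBoolean(false);
    final AtomicReference<Throwable> failure = new AtomicReference<>();

    void runCompaction(Runnable compactor) {
        // The worker records a failure instead of throwing from the thread,
        // so the UncaughtExceptionHandler never reports a "dead" thread.
        Thread t = new Thread(() -> {
            try {
                compactor.run();
            } catch (Throwable e) {
                failure.set(e);
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        checkIfCanceled(); // one final check after the thread has finished
    }

    void checkIfCanceled() {
        if (canceled.get()) {
            // Table was deleted mid-compaction: report "canceled", not "failed".
            failure.set(null);
        }
    }

    boolean failed() {
        return failure.get() != null;
    }

    public static void main(String[] args) {
        FinalCheckSketch s = new FinalCheckSketch();
        s.canceled.set(true); // simulate a table delete during the compaction
        s.runCompaction(() -> { throw new RuntimeException("close failed"); });
        System.out.println("failed=" + s.failed()); // canceled, so not a failure
    }
}
```

   A genuinely failed compaction (no cancellation) still surfaces through `failure`, so nothing is silently swallowed.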





[GitHub] [accumulo] dlmarion closed issue #2350: External compaction error when table is deleted

Posted by GitBox <gi...@apache.org>.
dlmarion closed issue #2350:
URL: https://github.com/apache/accumulo/issues/2350


   





[GitHub] [accumulo] milleruntime commented on issue #2350: External compaction error when table is deleted

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2350:
URL: https://github.com/apache/accumulo/issues/2350#issuecomment-963399290


   Here is when the Manager deleted the table:
   <pre>
   2021-11-08T12:20:31,099 [tableOps.Utils] INFO : table 2 (dcd003ecac0a015) locked for read operation: DELETE
   2021-11-08T12:20:31,111 [cancel.CancelCompactions] DEBUG: FATE[0dcd003ecac0a015] setting cancel compaction id to 1 for 2
   2021-11-08T12:20:31,114 [zookeeper.DistributedReadWriteLock] DEBUG: Removing lock entry 0 userData 0dcd003ecac0a015 lockType READ
   2021-11-08T12:20:31,116 [tableOps.Utils] INFO : table 2 (dcd003ecac0a015) unlocked for read
   2021-11-08T12:20:31,117 [zookeeper.DistributedReadWriteLock] DEBUG: Removing lock entry 0 userData 0dcd003ecac0a015 lockType READ
   2021-11-08T12:20:31,121 [tableOps.Utils] INFO : namespace +default (dcd003ecac0a015) unlocked for read
    2021-11-08T12:20:31,133 [zookeeper.DistributedReadWriteLock] INFO : Added lock entry 0 userData 0dcd003ecac0a015 lockType READ
   2021-11-08T12:20:31,134 [tableOps.Utils] INFO : namespace +default (dcd003ecac0a015) locked for read operation: DELETE
   2021-11-08T12:20:31,138 [zookeeper.DistributedReadWriteLock] INFO : Added lock entry 0 userData 0dcd003ecac0a015 lockType WRITE
   2021-11-08T12:20:31,139 [tableOps.Utils] INFO : table 2 (dcd003ecac0a015) locked for write operation: DELETE
   2021-11-08T12:20:31,141 [tables.TableManager] DEBUG: Transitioning state for table 2 from ONLINE to DELETING
   2021-11-08T12:20:31,142 [manager.EventCoordinator] INFO : deleting table 2 
    2021-11-08T12:20:31,143 [tables.TableManager] DEBUG: State transition to DELETING @ WatchedEvent state:SyncConnected type:NodeDataChanged path:/accumulo/2ab5ddfb-44ff-4d37-9538-f16e7c37c5c3/tables/2/state
   2021-11-08T12:20:31,143 [manager.EventCoordinator] INFO : Table state in zookeeper changed for 2 to DELETING
   </pre>





[GitHub] [accumulo] dlmarion commented on issue #2350: External compaction error when table is deleted

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2350:
URL: https://github.com/apache/accumulo/issues/2350#issuecomment-982618327


   I should also note that the External Compactor does cancel an external compaction if a table is deleted. It currently checks every 5 seconds, see [here](https://github.com/apache/accumulo/blob/main/server/compactor/src/main/java/org/apache/accumulo/compactor/Compactor.java#L181). I think the solution here might be to remove the code that throws the RuntimeException in the FileCompactor thread and call the `checkIfCanceled` method after the FileCompactor thread finishes to perform a final check.
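   The periodic check described above (the Compactor polling every 5 seconds) can be sketched with a scheduled executor that flips a cancel flag when the table disappears. The 5-second interval comes from the comment; `startWatcher`, `tableStillExists`, and `shouldCancel` are hypothetical names for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Sketch of a compaction-cancellation watcher: poll the table's existence
// on a fixed interval and set a flag the compaction thread can observe.
// All names are illustrative, not Accumulo's actual API.
public class CancelWatcherSketch {

    /** A canceled job is one whose table no longer exists. */
    static boolean shouldCancel(boolean tableStillExists) {
        return !tableStillExists;
    }

    static AtomicBoolean startWatcher(ScheduledExecutorService scheduler,
                                      BooleanSupplier tableStillExists,
                                      long intervalSeconds) {
        AtomicBoolean cancelFlag = new AtomicBoolean(false);
        scheduler.scheduleAtFixedRate(() -> {
            // If the table is gone (e.g. transitioned to DELETING and
            // removed), mark the running job canceled.
            if (shouldCancel(tableStillExists.getAsBoolean())) {
                cancelFlag.set(true);
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
        return cancelFlag;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean deleted = new AtomicBoolean(false);
        AtomicBoolean canceled = startWatcher(scheduler, () -> !deleted.get(), 1);

        deleted.set(true);  // simulate the table being deleted
        Thread.sleep(1500); // give the watcher one tick to notice
        System.out.println("canceled=" + canceled.get());
        scheduler.shutdownNow();
    }
}
```

   Because the poll interval bounds how stale the flag can be, a final check after the compaction thread finishes (as proposed) closes the window where the delete lands between the last poll and the file close.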

