Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/06/24 11:22:28 UTC

[GitHub] [accumulo] milleruntime opened a new issue #2179: Handle old sorted map files

milleruntime opened a new issue #2179:
URL: https://github.com/apache/accumulo/issues/2179


   The changes in #2117 will cause errors if users have old sorted map files left around from before an upgrade to 2.1. I think we can handle this by catching the error or detecting the old files. Then deleting the map files and forcing Accumulo to re-sort is probably the best approach.
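   The "detecting the old files" idea can be sketched roughly as follows. This is a hypothetical helper, not Accumulo code: the class and method names are made up for illustration, and it walks the local filesystem with java.nio as a stand-in for HDFS so it runs without a cluster. It assumes an old sorted WAL part is a directory holding `data` and `index` files, which is the layout shown in the `hdfs dfs -ls` listing later in this thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo. Local filesystem stands in for HDFS.
class OldSortedLogDetector {

  // Assumption: a pre-2.1 sorted part is a directory containing "data" and "index".
  static boolean isOldMapFilePart(Path part) {
    return Files.isDirectory(part)
        && Files.isRegularFile(part.resolve("data"))
        && Files.isRegularFile(part.resolve("index"));
  }

  // Returns each <recoveryDir>/<wal-id> directory that has at least one old-style part.
  static List<Path> findOldSortedLogs(Path recoveryDir) throws IOException {
    List<Path> old = new ArrayList<>();
    if (!Files.isDirectory(recoveryDir)) {
      return old;
    }
    try (DirectoryStream<Path> logs = Files.newDirectoryStream(recoveryDir)) {
      for (Path log : logs) {
        if (!Files.isDirectory(log)) {
          continue;
        }
        // Only entries named part-r-* are inspected.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(log, "part-r-*")) {
          for (Path part : parts) {
            if (isOldMapFilePart(part)) {
              old.add(log);
              break;
            }
          }
        }
      }
    }
    return old;
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny fake recovery directory and report what is detected.
    Path recovery = Files.createTempDirectory("recovery");
    Path part = Files.createDirectories(recovery.resolve("wal-1").resolve("part-r-00000"));
    Files.createFile(part.resolve("data"));
    Files.createFile(part.resolve("index"));
    System.out.println("Old sorted logs: " + findOldSortedLogs(recovery));
  }
}
```

   A real check would run against HDFS and could feed either of the follow-up actions discussed in this issue (delete and re-sort, or warn).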


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-867816943


   Once I deleted the sorted WAL files (everything in /accumulo/recovery) and restarted, Accumulo was able to recover properly. The simplest way to handle these files would be to just delete everything in /accumulo/recovery in `Upgrader9to10`. This should be fine, since the files are intermediate and the data should still exist in the original WAL under /accumulo/wal. Another option would be to print a warning that the files exist and say that the files must be removed to proceed.
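   The two options above (delete everything under the recovery directory, or stop with a warning) might look something like this minimal sketch. It is not the actual `Upgrader9to10` code: `RecoveryDirCleaner`, `Mode`, and `handle` are hypothetical names, and java.nio on the local filesystem stands in for HDFS so the example can run anywhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, not Accumulo code. Local filesystem stands in for HDFS.
class RecoveryDirCleaner {

  enum Mode { DELETE, WARN_AND_STOP }

  // Lists the immediate children of the recovery directory (one per sorted WAL).
  static List<Path> sortedLogs(Path recoveryDir) throws IOException {
    if (!Files.isDirectory(recoveryDir)) {
      return Collections.emptyList();
    }
    try (Stream<Path> children = Files.list(recoveryDir)) {
      return children.collect(Collectors.toList());
    }
  }

  static void deleteRecursively(Path root) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order so children are deleted before their parent directory.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  // Returns the number of sorted logs removed; throws in WARN_AND_STOP mode if any exist.
  static int handle(Path recoveryDir, Mode mode) throws IOException {
    List<Path> logs = sortedLogs(recoveryDir);
    if (logs.isEmpty()) {
      return 0;
    }
    if (mode == Mode.WARN_AND_STOP) {
      throw new IllegalStateException("Sorted WAL files exist in " + recoveryDir
          + " and must be removed before the upgrade can proceed: " + logs);
    }
    for (Path log : logs) {
      deleteRecursively(log);
    }
    return logs.size();
  }

  public static void main(String[] args) throws IOException {
    Path recovery = Files.createTempDirectory("recovery");
    Path part = Files.createDirectories(recovery.resolve("wal-1").resolve("part-r-00000"));
    Files.createFile(part.resolve("data"));
    Files.createFile(part.resolve("index"));
    System.out.println("Removed sorted logs: " + handle(recovery, Mode.DELETE));
  }
}
```

   Deleting is only safe under the assumption stated above, that the sorted files are intermediate and the original WALs under /accumulo/wal are still present to be re-sorted.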





[GitHub] [accumulo] milleruntime edited a comment on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime edited a comment on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-867763969


   I tested an upgrade in Uno with some data and sorted WAL files. I was surprised to not see any errors trying to recover the old map files. I noticed that the GC deleted the files rather quickly after the Upgrader was finished.
   
   <pre>
   11:04:24 {main} ~/workspace/uno$ hdfs dfs -ls -R /accumulo/recovery
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000
   -rw-r--r--   3 mike supergroup     366452 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000/data
   -rw-r--r--   3 mike supergroup        255 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000/index
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001
   -rw-r--r--   3 mike supergroup     366592 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001/data
   -rw-r--r--   3 mike supergroup        224 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001/index
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002
   -rw-r--r--   3 mike supergroup     366727 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002/data
   -rw-r--r--   3 mike supergroup        224 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002/index
   
   11:49:32 {main} ~/workspace/uno/install/logs/accumulo$ grep 980ce402-07dc-4a09-b6ac-e26664db29a3 *
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,957 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing CLOSED WAL hdfs://localhost:8020/accumulo/wal/ip-10-113-12-25+9997/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,965 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing recovery log hdfs://localhost:8020/accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,979 [log.WalStateManager] DEBUG: Removing 980ce402-07dc-4a09-b6ac-e26664db29a3
   </pre>
   
   I am going to try the test again but use CI to make sure there wasn't any data loss.





[GitHub] [accumulo] milleruntime commented on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-867799697


   After repeating the upgrade and using CI, I did see errors and Accumulo was unable to recover properly.
   <pre>
   2021-06-24T12:48:55,826 [log.RecoveryLogsIterator] DEBUG: Opening recovery log dir 0367b0c1-12fb-4d94-8335-061b7c9ac232
   2021-06-24T12:48:55,831 [tserver.AssignmentHandler] WARN : exception trying to assign tablet !0<;~ null
   java.lang.RuntimeException: Error recovering tablet !0<;~ from log files
           at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:407) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.AssignmentHandler.run(AssignmentHandler.java:160) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:63) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) ~[htrace-core-3.2.0-incubating.jar:3.2.0-incubating]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   Caused by: java.io.IOException: java.lang.RuntimeException: java.io.FileNotFoundException: Path is not a file: /accumulo/recovery/0367b0c1-12fb-4d94-8335-061b7c9ac232/part-r-00000
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:90)
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
           at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:458)
           at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
           at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
           at java.base/java.security.AccessController.doPrivileged(Native Method)
           at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)
   
           at org.apache.accumulo.tserver.log.TabletServerLogger.recover(TabletServerLogger.java:540) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.TabletServer.recover(TabletServer.java:1153) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:366) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           ... 6 more
   Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Path is not a file: /accumulo/recovery/0367b0c1-12fb-4d94-8335-061b7c9ac232/part-r-00000
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:90)
           at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
           at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:770)
           at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:458)
           at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
           at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
           at java.base/java.security.AccessController.doPrivileged(Native Method)
           at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)
   
           at org.apache.accumulo.core.client.rfile.RFileScanner.iterator(RFileScanner.java:398) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.log.RecoveryLogsIterator.validateFirstKey(RecoveryLogsIterator.java:156) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.log.RecoveryLogsIterator.<init>(RecoveryLogsIterator.java:77) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.log.SortedLogRecovery.findMaxTabletId(SortedLogRecovery.java:107) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.log.SortedLogRecovery.findLogsThatDefineTablet(SortedLogRecovery.java:147) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.log.SortedLogRecovery.recover(SortedLogRecovery.java:291) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.log.TabletServerLogger.recover(TabletServerLogger.java:538) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.TabletServer.recover(TabletServer.java:1153) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:366) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           ... 6 more
           Suppressed: java.lang.NullPointerException
                   at org.apache.accumulo.core.client.rfile.RFileScannerBuilder$InputArgs.getSources(RFileScannerBuilder.java:64) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.core.client.rfile.RFileScanner.close(RFileScanner.java:405) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.log.RecoveryLogsIterator.validateFirstKey(RecoveryLogsIterator.java:153) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.log.RecoveryLogsIterator.<init>(RecoveryLogsIterator.java:77) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.log.SortedLogRecovery.findMaxTabletId(SortedLogRecovery.java:107) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.log.SortedLogRecovery.findLogsThatDefineTablet(SortedLogRecovery.java:147) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.log.SortedLogRecovery.recover(SortedLogRecovery.java:291) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.log.TabletServerLogger.recover(TabletServerLogger.java:538) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.TabletServer.recover(TabletServer.java:1153) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:366) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.AssignmentHandler.run(AssignmentHandler.java:160) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:63) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
                   at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) ~[htrace-core-3.2.0-incubating.jar:3.2.0-incubating]
                   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
                   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
                   at java.lang.Thread.run(Thread.java:829) [?:?]
   </pre>





[GitHub] [accumulo] milleruntime edited a comment on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime edited a comment on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-867763969


   I tested an upgrade in Uno from 2.0.1 to 2.1.0-SNAPSHOT with some data and sorted WAL files. I was surprised to not see any errors trying to recover the old map files. I noticed that the GC deleted the files rather quickly after the Upgrader was finished.
   
   <pre>
   11:04:24 {main} ~/workspace/uno$ hdfs dfs -ls -R /accumulo/recovery
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000
   -rw-r--r--   3 mike supergroup     366452 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000/data
   -rw-r--r--   3 mike supergroup        255 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000/index
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001
   -rw-r--r--   3 mike supergroup     366592 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001/data
   -rw-r--r--   3 mike supergroup        224 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001/index
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002
   -rw-r--r--   3 mike supergroup     366727 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002/data
   -rw-r--r--   3 mike supergroup        224 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002/index
   
   11:49:32 {main} ~/workspace/uno/install/logs/accumulo$ grep 980ce402-07dc-4a09-b6ac-e26664db29a3 *
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,957 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing CLOSED WAL hdfs://localhost:8020/accumulo/wal/ip-10-113-12-25+9997/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,965 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing recovery log hdfs://localhost:8020/accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,979 [log.WalStateManager] DEBUG: Removing 980ce402-07dc-4a09-b6ac-e26664db29a3
   </pre>
   
   This could have been because the sorted files were no longer referenced. I am going to try the test again but use CI to make sure there wasn't any data loss.





[GitHub] [accumulo] milleruntime commented on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-869692365


   > Option 2 seems similar to the way that FATEs are handled during an upgrade - basically if any FATEs exist, the upgrade stops with a warning. And IIRC, that includes FATEs that have completed with a SUCCESS but not reported back to the client and cleaned up. So, it's not just active FATEs, but any FATE, active, failed or success. I think this is to prevent potential serialization changes between versions causing issues.
   
   It seems like we have a good reason to handle FATE operations that way. I can't think of a reason why we would need to keep around an intermediate sorted WAL. 
   
   


-- 
To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org



[GitHub] [accumulo] milleruntime edited a comment on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime edited a comment on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-867763969


   I tested an upgrade in Uno from 2.0.1 to 2.1.0-SNAPSHOT with some data and sorted WAL files. I was surprised to not see any errors trying to recover the old map files. I noticed that the GC deleted the files rather quickly after the Upgrader was finished.
   
   <pre>
   11:04:24 {main} ~/workspace/uno$ hdfs dfs -ls -R /accumulo/recovery
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000
   -rw-r--r--   3 mike supergroup     366452 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000/data
   -rw-r--r--   3 mike supergroup        255 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00000/index
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001
   -rw-r--r--   3 mike supergroup     366592 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001/data
   -rw-r--r--   3 mike supergroup        224 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00001/index
   drwxr-xr-x   - mike supergroup          0 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002
   -rw-r--r--   3 mike supergroup     366727 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002/data
   -rw-r--r--   3 mike supergroup        224 2021-06-24 11:04 /accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3/part-r-00002/index
   
   11:49:32 {main} ~/workspace/uno/install/logs/accumulo$ grep 980ce402-07dc-4a09-b6ac-e26664db29a3 *
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,957 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing CLOSED WAL hdfs://localhost:8020/accumulo/wal/ip-10-113-12-25+9997/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,965 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing recovery log hdfs://localhost:8020/accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,979 [log.WalStateManager] DEBUG: Removing 980ce402-07dc-4a09-b6ac-e26664db29a3
   </pre>
   
   I am going to try the test again but use CI to make sure there wasn't any data loss.





[GitHub] [accumulo] milleruntime commented on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-867763969


   I tested an upgrade in Uno with some data and sorted WAL files. I was surprised to not see any errors trying to recover the old map files. I noticed that the GC deleted the files rather quickly after the Upgrader was finished.
   
   <pre>
   11:49:32 {main} ~/workspace/uno/install/logs/accumulo$ grep 980ce402-07dc-4a09-b6ac-e26664db29a3 *
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,957 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing CLOSED WAL hdfs://localhost:8020/accumulo/wal/ip-10-113-12-25+9997/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,965 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing recovery log hdfs://localhost:8020/accumulo/recovery/980ce402-07dc-4a09-b6ac-e26664db29a3
   gc_ip-10-113-12-25.log:2021-06-24T11:08:05,979 [log.WalStateManager] DEBUG: Removing 980ce402-07dc-4a09-b6ac-e26664db29a3
   </pre>
   
   I am going to try the test again but use CI to make sure there wasn't any data loss.





[GitHub] [accumulo] EdColeman commented on issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #2179:
URL: https://github.com/apache/accumulo/issues/2179#issuecomment-867825880


   Option 2 seems similar to the way that FATEs are handled during an upgrade - basically if any FATEs exist, the upgrade stops with a warning.  And IIRC, that includes FATEs that have completed with a SUCCESS but not reported back to the client and cleaned up.  So, it's not just active FATEs, but any FATE, active, failed or success.  I think this is to prevent potential serialization changes between versions causing issues.
   
   If they really are intermediate and will be auto-regenerated if needed, then the first option seems to be the most user friendly, but option 2 does have a precedent.





[GitHub] [accumulo] milleruntime closed issue #2179: Handle old sorted map files

Posted by GitBox <gi...@apache.org>.
milleruntime closed issue #2179:
URL: https://github.com/apache/accumulo/issues/2179


   

