You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/05/02 12:47:30 UTC

[GitHub] [accumulo] milleruntime opened a new issue, #2667: Exception during RW Bulk import

milleruntime opened a new issue, #2667:
URL: https://github.com/apache/accumulo/issues/2667

   Steps to reproduce:
   
   1. Start a Uno Cluster with 2 tservers (I used commit b3ac225d43bacf86bb280be48de750c409f45e66)
   2. Run RW Bulk test from [Accumulo Testing](https://github.com/apache/accumulo-testing). I ran the following command 4 times with different log files: 
   `~/workspace/accumulo-testing$ ./bin/rwalk Bulk.xml > /tmp/log4 2>&1 &`
   3. Let tests finish running and maybe see the error
   
   <pre>
   java.lang.RuntimeException: Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23
   	at org.apache.accumulo.tserver.tablet.Tablet.lambda$closeConsistencyCheck$4(Tablet.java:1090)
   	at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:163)
   	at org.apache.accumulo.tserver.tablet.Tablet.closeConsistencyCheck(Tablet.java:1085)
   	at org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:1003)
   	at org.apache.accumulo.tserver.tablet.Tablet.split(Tablet.java:1464)
   	at org.apache.accumulo.tserver.TabletServer.splitTablet(TabletServer.java:491)
   	at org.apache.accumulo.tserver.TabletClientHandler.splitTablet(TabletClientHandler.java:1006)
   	at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:206)
   	at com.sun.proxy.$Proxy39.splitTablet(Unknown Source)
   	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2648)
   	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2627)
   	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
   	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
   	at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:138)
   	at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54)
   	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
   	at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129)
   	at org.apache.thrift.server.Invocation.run(Invocation.java:18)
   	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
   	at java.base/java.lang.Thread.run(Thread.java:829)
   </pre>
   
   From the tserver log:
   <pre>
   2022-05-02T08:16:04,322 [tablet.Tablet] DEBUG: Tablet 5;r13c3b;r0f20c had no dir, creating hdfs://localhost:8020/accumulo/tables/5/t-00000fa
   2022-05-02T08:16:04,333 [tablet.Tablet] DEBUG: Files for low split 5;r0d7fe;r0860c []
   2022-05-02T08:16:04,333 [tablet.Tablet] DEBUG: Files for high split 5;r0f20c;r0d7fe []
   2022-05-02T08:16:04,356 [tablet.Tablet] ERROR: Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23
   2022-05-02T08:16:04,357 [tablet.Tablet] ERROR: Failed to do close consistency check for tablet 5<;r13c3b
   </pre>
   
   Manager log:
   <pre>
   2022-05-02T08:15:59,338 [tableOps.Utils] INFO : table 5 (6043c9b3efe07636) locked for read operation: COMPACT
   2022-05-02T08:15:59,887 [tableOps.Utils] INFO : table 5 (332cb19f408b595b) locked for write operation: MERGE
   2022-05-02T08:16:00,991 [tableOps.Utils] INFO : table 5 (332cb19f408b595b) unlocked for write
   2022-05-02T08:16:01,008 [tableOps.Utils] INFO : table 5 (5b528b34c36ceac6) locked for write operation: MERGE
   2022-05-02T08:16:02,810 [tableOps.Utils] INFO : table 5 (5b528b34c36ceac6) unlocked for write
   2022-05-02T08:16:02,927 [tableOps.Utils] INFO : table 5 (33fb4a18cefb5fa1) locked for read operation: COMPACT
   2022-05-02T08:16:03,659 [tableOps.Utils] INFO : table 5 (56170bb68d50efd6) locked for write operation: MERGE
   2022-05-02T08:16:03,909 [tableOps.Utils] INFO : table 5 (56170bb68d50efd6) unlocked for write
   2022-05-02T08:16:03,910 [tableOps.Utils] INFO : table 5 (71a50c6c0d42b5b0) locked for read operation: COMPACT
   2022-05-02T08:16:03,910 [tableOps.Utils] INFO : table 5 (112da7f68f32e75d) locked for read operation: COMPACT
   2022-05-02T08:16:08,555 [tableOps.Utils] INFO : table 5 (4de2323e0a3772eb) locked for write operation: MERGE
   </pre>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime closed issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime closed issue #2667: lastCompactID is inconsistent with metadata Exception 
URL: https://github.com/apache/accumulo/issues/2667


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1258053465

   Fixed in #2574 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1258602393

   > I was running 4 Bulk Rwalk jobs again.
   
   Any advice if I wanted to try to reproduce this?  How many tservers were you running?  You had 4 random walk processes each running the bulk graph?  Was everything running on a single VM or multiple VMs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1272024544

   > I was able to reproduce this yesterday. It seems like the same situation that was speculated as a cause in an earlier comment, a compaction of an empty tablet followed by a close.
   
   Interesting. I am going to try to write a test to reproduce it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1259993972

   @keith-turner I also dropped the verify and END nodes from Bulk, just so the test would keep running.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1114995545

   I'll take a look at it
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1268958175

   I wonder if this error occurred when a table was compacting and splitting at the same time and the split failed due to the bug I fixed in #2977. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner closed issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner closed issue #2667: lastCompactID is inconsistent with metadata Exception 
URL: https://github.com/apache/accumulo/issues/2667


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1136162226

   @keith-turner Did you make any changes for this race condition bug?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1277381244

   I am continuing to run the bulk import test with the fixes for #3013 and #3014 applied.  I have not seen the error again with the inconsistent id, I suspect I know what the problem is but want to confirm it with the additional logging I added.  On the run last night all of the tservers died with out of memory problems.  I looked at the UNO config and the tserver memory was 768m.  I upped this to 4G and restarted the test.  I have never seen the out of memory problem when running the test, maybe fixing #3013 and #3014 allowed the test to run longer and that caused it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1114918368

   @keith-turner Any thoughts on this consistency check? I am not sure if there was a problem or not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1258159848

   It seems like recent changes (possibly the refactor in #2574) have made this less frequent but I just saw this happen again. I was running 4 Bulk Rwalk jobs again.
   Here is the error I saw:
   <pre>
   2022-09-26T09:45:25,362 [util.FileUtil] DEBUG: Found first and last keys for 0 map files in   0.00 secs
   2022-09-26T09:45:25,363 [tablet.Tablet] DEBUG: Tablet 2;r143fe< had no dir, creating hdfs://localhost:8020/accumulo/tables/2/t-00000rs
   2022-09-26T09:45:25,369 [tablet.Tablet] ERROR: Closed tablet 2;r143fe< lastCompactID is inconsistent with metadata : 115 != 112
   2022-09-26T09:45:25,370 [tablet.Tablet] ERROR: Failed to do close consistency check for tablet 2;r143fe<
   java.lang.RuntimeException: Closed tablet 2;r143fe< lastCompactID is inconsistent with metadata : 115 != 112
           at org.apache.accumulo.tserver.tablet.Tablet.lambda$closeConsistencyCheck$4(Tablet.java:1090) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at java.util.OptionalLong.ifPresent(OptionalLong.java:163) ~[?:?]
           at org.apache.accumulo.tserver.tablet.Tablet.closeConsistencyCheck(Tablet.java:1085) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:1003) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.tablet.Tablet.split(Tablet.java:1465) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.TabletServer.splitTablet(TabletServer.java:489) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.tserver.TabletClientHandler.splitTablet(TabletClientHandler.java:1004) ~[accumulo-tserver-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at jdk.internal.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) ~[?:?]
           at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
           at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
           at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:206) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at com.sun.proxy.$Proxy38.splitTablet(Unknown Source) ~[?:?]
           at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2648) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2627) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.16.0.jar:0.16.0]
           at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) ~[libthrift-0.16.0.jar:0.16.0]
           at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:138) ~[libthrift-0.16.0.jar:0.16.0]
           at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) ~[libthrift-0.16.0.jar:0.16.0]
           at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129) ~[accumulo-server-base-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.16.0.jar:0.16.0]
           at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
           at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
           at java.lang.Thread.run(Thread.java:829) ~[?:?]
   </pre>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1271312449

   I was able to reproduce this yesterday and the situation seemed similar to the one above.  It seems like the same situation that was speculated as a cause in an earlier comment, a compaction of an empty tablet followed by a close.
   
   Saw the following in the tserver logs.
   
   ```
   2022-10-06T19:41:09,701 [tablet.location] DEBUG: Loading 3s<< on localhost:9997[1000325bae30004]
   2022-10-06T19:41:09,728 [tablet.location] DEBUG: Loaded 3s<< on localhost:9997[1000325bae30004]
   2022-10-06T19:41:09,830 [tablet.Tablet] DEBUG: Tablet 3s<< had no dir, creating hdfs://localhost:8020/accumulo/tables/3s/default_tablet
   2022-10-06T19:41:09,837 [tablet.Tablet] ERROR: Closed tablet 3s<< lastCompactID is inconsistent with metadata : 2 != -1
   2022-10-06T19:41:09,839 [tablet.Tablet] ERROR: Failed to do close consistency check for tablet 3s<<
   java.lang.RuntimeException: Closed tablet 3s<< lastCompactID is inconsistent with metadata : 2 != -1
   java.lang.RuntimeException: Failed to do close consistency check for tablet 3s<<
   ```
   
   Saw the following the manager logs
   
   ```
   2022-10-06T19:41:09,520 [security.SecurityOperation] INFO : Granted table permission READ for user root on the table 3s at the request of user !SYSTEM
   2022-10-06T19:41:09,523 [security.SecurityOperation] INFO : Granted table permission WRITE for user root on the table 3s at the request of user !SYSTEM
   2022-10-06T19:41:09,526 [security.SecurityOperation] INFO : Granted table permission BULK_IMPORT for user root on the table 3s at the request of user !SYSTEM
   2022-10-06T19:41:09,528 [security.SecurityOperation] INFO : Granted table permission ALTER_TABLE for user root on the table 3s at the request of user !SYSTEM
   2022-10-06T19:41:09,530 [security.SecurityOperation] INFO : Granted table permission GRANT for user root on the table 3s at the request of user !SYSTEM
   2022-10-06T19:41:09,533 [security.SecurityOperation] INFO : Granted table permission DROP_TABLE for user root on the table 3s at the request of user !SYSTEM
   2022-10-06T19:41:09,537 [security.SecurityOperation] INFO : Granted table permission GET_SUMMARIES for user root on the table 3s at the request of user !SYSTEM
   2022-10-06T19:41:09,550 [tableOps.Utils] INFO : table 3s FATE[3d5c0cbb1211e4a9] locked for write operation: CREATE
   2022-10-06T19:41:09,550 [tables.TableManager] DEBUG: Creating ZooKeeper entries for new table bulk_dev_uk_2030907_1665085269455 (ID: 3s) in namespace (ID: +default)
   2022-10-06T19:41:09,560 [manager.EventCoordinator] INFO : Table state in zookeeper changed for 3s to NEW
   2022-10-06T19:41:09,581 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.core.spi.balancer.SimpleLoadBalancer for table 3s
   2022-10-06T19:41:09,603 [tables.TableManager] DEBUG: Transitioning state for table 3s from NEW to ONLINE
   2022-10-06T19:41:09,604 [manager.EventCoordinator] INFO : Table state in zookeeper changed for 3s to ONLINE
   2022-10-06T19:41:09,613 [tableOps.Utils] INFO : table 3s FATE[3d5c0cbb1211e4a9] unlocked for write
   2022-10-06T19:41:09,680 [accumulo.audit] INFO : operation: permitted; user: root; client: 127.0.0.1:53720; action: compactTable; targetTable: 3s; targetNamespace: +default;
   2022-10-06T19:41:09,685 [accumulo.audit] INFO : operation: permitted; user: root; client: 127.0.0.1:53712; action: compactTable; targetTable: 3s; targetNamespace: +default;
   2022-10-06T19:41:09,700 [tablet.location] DEBUG: Assigned 3s<< to localhost:9997[1000325bae30004]
   2022-10-06T19:41:09,730 [manager.EventCoordinator] INFO : tablet 3s<< was loaded on localhost:9997
   2022-10-06T19:41:09,731 [tableOps.Utils] INFO : table 3s FATE[68a7eab3013b5aaa] locked for read operation: COMPACT
   2022-10-06T19:41:09,751 [tableOps.Utils] INFO : table 3s FATE[71a8f902cf66da2a] locked for read operation: COMPACT
   2022-10-06T19:41:11,854 [schema.LinkingIterator] DEBUG: First tablet seen provides evidence of earlier tablet in range, retrying 3s;r0e773 [3s; : [] 9223372036854775807 false,3s<%00; : [] 9223372036854775807 false) 
   2022-10-06T19:41:12,124 [manager.EventCoordinator] INFO : localhost:9997 reported split 3s;r0e773<, 3s<;r0e773
   2022-10-06T19:41:13,897 [schema.LinkingIterator] DEBUG: First tablet seen provides evidence of earlier tablet in range, retrying 3s;r057b6 [3s; : [] 9223372036854775807 false,3s<%00; : [] 9223372036854775807 false) 
   2022-10-06T19:41:14,123 [manager.EventCoordinator] INFO : localhost:9997 reported split 3s;r057b6<, 3s;r0e773;r057b6
   2022-10-06T19:41:14,905 [manager.EventCoordinator] INFO : localhost:9997 reported split 3s;r11bf7;r0e773, 3s<;r11bf7
   2022-10-06T19:41:16,423 [manager.EventCoordinator] INFO : localhost:9997 reported split 3s;r0af73;r057b6, 3s;r0e773;r0af73
   2022-10-06T19:41:16,902 [schema.LinkingIterator] DEBUG: Tablets end row and prev end row not equals 3s;r11bf7;r0e773 3s<;r1391e 
   2022-10-06T19:41:17,425 [manager.EventCoordinator] INFO : localhost:9997 reported split 3s;r1391e;r11bf7, 3s<;r1391e
   2022-10-06T19:41:18,168 [accumulo.audit] INFO : operation: permitted; user: root; client: 127.0.0.1:53712; action: mergeTable; targetTable: 3s; targetNamespace: +default;
   2022-10-06T19:41:18,168 [manager.Manager] DEBUG: Creating merge op: 3s from startRow: -inf to endRow: +inf
   2022-10-06T19:41:18,237 [tableOps.Utils] INFO : table 3s FATE[57d4471744875c60] locked for write operation: MERGE
   2022-10-06T19:41:18,242 [manager.EventCoordinator] INFO : Merge state of 3s<< set to STARTED
   2022-10-06T19:41:18,382 [manager.Manager] DEBUG: mergeInfo overlaps: 3s;r057b6< true
   ```
   
   I added some logging to the tserver that will help verify the hypothesis and kicked the test off again.  Didn't see the problem again, but did see #3006.  Also noticed some of the bulk import random walk test are stuck this morning i have not had a chance to track that down. Going to keep trying to reproduce the problem with improved logging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1269905344

   > I wonder if this error occurred when a table was compacting and splitting at the same time and the split failed due to the bug I fixed in https://github.com/apache/accumulo/issues/2977.
   
   I will look into that.  I have been trying to recreate your scenario with a slight modification to the RW test graph. Instead of not ever verifiying and therefore not ending.  I changed my graph locally to start over after verify.  So doing the following locally.
   
   ```
   $ git diff
   diff --git a/src/main/resources/randomwalk/modules/Bulk.xml b/src/main/resources/randomwalk/modules/Bulk.xml
   index 26004b1..1880944 100644
   --- a/src/main/resources/randomwalk/modules/Bulk.xml
   +++ b/src/main/resources/randomwalk/modules/Bulk.xml
   @@ -55,7 +55,7 @@
    </node>
    
    <node id="bulk.Verify">
   -  <edge id="END" weight="1"/>
   +  <edge id="bulk.Setup" weight="1"/>
    </node>
    
    </module>
   ```
   
   I have been bumping into a few little issues with the test when trying to verify so i have not have not had a super long run yet.  But in the runs so far I don't think I saw any problems.  I will take a look at the split thing you mentioned.  I also found I needed to add nohup to your little script so that the processes would not terminate.
   
   ```
   for (( i=1; i<5; i++)); do nohup bin/rwalk Bulk.xml > rwalk"$i".log 2>&1 & done
   ```
   
   I think I am close to getting a test that could run for days on end going.  I think this a good to test to run even if this issue may be fixed by the change you mentioned.
   
   @milleruntime when you did see the error on the tserver, did it cause the RW test to hang or error?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1269941212

   @milleruntime when you did see this error how long did you run test before seeing it?  How many times did you see it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1115139592

   I think I have found one possible cause to this.  Before doing the consistency check on close the following code will wait for there the be zero non-external compactions by looking at the `runningJobs` set.
   
   https://github.com/apache/accumulo/blob/b3ac225d43bacf86bb280be48de750c409f45e66/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/CompactableImpl.java#L1504-L1505
   
   The following code is called when a compaction completes and it removes itself from `runningJobs` and then calls `selectFiles()`
   
   https://github.com/apache/accumulo/blob/b3ac225d43bacf86bb280be48de750c409f45e66/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/CompactableImpl.java#L1220-L1233
   
   The call to `selectFiles()` eventaully calls the the following code which updates the metadata table and then updates the tablets in memory last compact ID.  The following code can run concurrently with the close consistency check.
   
   https://github.com/apache/accumulo/blob/b3ac225d43bacf86bb280be48de750c409f45e66/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/CompactableUtils.java#L439-L444
   
   Looking at the error message `Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23`
   
   * 25 is from metadata table
   * 23 is from tablet instance variable lastCompactID
   
   Which is consistent with the metadata table being updated first followed by the lastCompactID instance variable.
   
   I think this code with the race conditions only executes when an empty tablet is forced to compact, so all it does is write the compact id to the metadata table to signify its done w/o any files changes.  There was a forced compaction in the manager logs a bit before this split that closed the tablet.
   
   I am going to look into this some more and see if I can figure out if there is a problem with reordering the code to avoid the race condition.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1282120602

   I opened #3028, I think it may fix this issue.  I saw the problem happen multiple times while running the bulk RW test.  With additional logging added I confirmed the problem happened under the following conditions all happening to the same tablet at around the same time.
   
    * Tablet with no files
    * User compaction initiated
    * Tablet being closed
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on issue #2667: lastCompactID is inconsistent with metadata Exception

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2667:
URL: https://github.com/apache/accumulo/issues/2667#issuecomment-1259410321

   > > I was running 4 Bulk Rwalk jobs again.
   > 
   > Any advice if I wanted to try to reproduce this? How many tservers were you running? You had 4 random walk processes each running the bulk graph? Was everything running on a single VM or multiple VMs?
   
   I was able to produce this on a single EC2 Ubuntu instance, running Uno. I installed Snaphot and set `export NUM_TSERVERS=2`. Then once I had Uno running, I ran the following command to fire off 4 local RW processes:
   <pre>
   for (( i=1; i<5; i++)); do ~/workspace/accumulo-testing/bin/rwalk Bulk.xml > /tmp/rwalk"$i".log 2>&1 & done
   </pre>
   FYI I found a bug in the Bulk RW test so make sure you get the fix from accumulo-testing first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org