Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/12/29 23:36:34 UTC

[GitHub] [accumulo] keith-turner opened a new issue, #3144: Scan server not properly handling not finding tablet for batch scan

keith-turner opened a new issue, #3144:
URL: https://github.com/apache/accumulo/issues/3144

   **Describe the bug**
   
   While working on #3143 and running [these tests](https://gist.github.com/keith-turner/f2159111b025e600a6e0abbaba1d92f3), I saw the following exception in the scan server that caused a scan to fail with a server side error.
   
   ```
   2022-12-29T15:03:16,443 [tserver.ScanServer] INFO : RFFS 169171 extent not found in metadata table 1;000139510<
   2022-12-29T15:03:16,443 [tserver.ScanServer] ERROR: Error starting scan
   org.apache.accumulo.core.tabletserver.thrift.NotServingTabletException: null
   	at org.apache.accumulo.tserver.ScanServer.reserveFilesInner(ScanServer.java:503) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.accumulo.tserver.ScanServer.reserveFiles(ScanServer.java:641) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.accumulo.tserver.ScanServer.startMultiScan(ScanServer.java:884) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
   	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
   	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
   	at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:202) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at com.sun.proxy.$Proxy34.startMultiScan(Unknown Source) ~[?:?]
   	at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:855) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:831) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
   	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at java.lang.Thread.run(Thread.java:829) ~[?:?]
   2022-12-29T15:03:16,443 [thrift.ProcessFunction] ERROR: Internal error processing startMultiScan
   org.apache.accumulo.core.tabletserver.thrift.NotServingTabletException: null
   	at org.apache.accumulo.tserver.ScanServer.reserveFilesInner(ScanServer.java:503) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.accumulo.tserver.ScanServer.reserveFiles(ScanServer.java:641) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.accumulo.tserver.ScanServer.startMultiScan(ScanServer.java:884) ~[accumulo-tserver-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
   	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
   	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
   	at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:202) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at com.sun.proxy.$Proxy34.startMultiScan(Unknown Source) ~[?:?]
   	at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:855) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.accumulo.core.tabletserver.thrift.TabletScanClientService$Processor$startMultiScan.getResult(TabletScanClientService.java:831) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:40) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:40) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:147) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:492) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129) ~[accumulo-server-base-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at org.apache.thrift.server.Invocation.run(Invocation.java:18) ~[libthrift-0.17.0.jar:0.17.0]
   	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
   	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
   	at java.lang.Thread.run(Thread.java:829) ~[?:?]
   ``` 
   
   Looking into this, the ScanServer does not properly handle a tablet missing from the metadata table during a batch scan.  For normal scans, when a tablet is not found the thrift RPC throws a NotServingTabletException.  Batch scans process multiple tablets in a single RPC and do not throw this exception.  A batch scan RPC can process a subset of the requested tablets, so the RPC returns a list of the extents that it did not process.  The scan server does not do this; it throws the NotServingTabletException, which is not declared on the startMultiScan RPC and therefore ends up looking like a server side error to the client.  This is where the [problem happens](https://github.com/apache/accumulo/blob/f7a62bf46bff438c1f77c62d6261f33fa9c0beb3/server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java#L884) in the scan server code.  For batch scans, the scan server needs to include the tablets it could not find in the returned list of failed tablets.
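   
   To illustrate the expected behavior, here is a minimal sketch of a batch scan start that collects missing tablets into a failures list instead of throwing.  This is not the actual ScanServer code: `MultiScanSketch`, `Result`, the string extents, and the set standing in for the metadata table are all hypothetical names made up for this example.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;
   
   // Hypothetical sketch; none of these types are Accumulo's real API.
   final class MultiScanSketch {
   
     /** Outcome of starting a batch scan: extents accepted plus extents the client should retry. */
     static final class Result {
       final List<String> scanning = new ArrayList<>();
       final List<String> failures = new ArrayList<>();
     }
   
     /**
      * Starts a batch scan over the requested extents. An extent that is not
      * present in the (stand-in) metadata set is added to the failures list
      * rather than aborting the whole RPC with an exception.
      */
     static Result startMultiScan(List<String> requested, Set<String> metadataExtents) {
       Result result = new Result();
       for (String extent : requested) {
         if (metadataExtents.contains(extent)) {
           result.scanning.add(extent);
         } else {
           // A single-tablet scan would throw NotServingTabletException here.
           result.failures.add(extent);
         }
       }
       return result;
     }
   
     public static void main(String[] args) {
       // "1;000139510<" is the extent from the log above; the other extent is made up.
       Set<String> metadataExtents = Set.of("1;000100000<");
       Result r = startMultiScan(List.of("1;000100000<", "1;000139510<"), metadataExtents);
       System.out.println("scanning=" + r.scanning + " failures=" + r.failures);
     }
   }
   ```
   
   With this shape the client gets back the extents that were not processed and can look them up again and retry, instead of seeing an undeclared exception as a server side error.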
   
   **To Reproduce**
   
   Run the tests mentioned earlier for a while.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on issue #3144: Scan server not properly handling not finding tablet for batch scan

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #3144:
URL: https://github.com/apache/accumulo/issues/3144#issuecomment-1370437539

   > Is this something that should be addressed in 2.1.1?
   
   I suspect 2.1.1 is not properly handling a table splitting while batch scanning using scan servers.  I am not sure whether it will fail in the same way as was seen in 3.0.  If it fails in a way that disrupts the scan in 2.1.1, it would be good to fix.




[GitHub] [accumulo] dlmarion commented on issue #3144: Scan server not properly handling not finding tablet for batch scan

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #3144:
URL: https://github.com/apache/accumulo/issues/3144#issuecomment-1369685153

   Is this something that should be addressed in 2.1.1?




[GitHub] [accumulo] dlmarion closed issue #3144: Scan server not properly handling not finding tablet for batch scan

Posted by GitBox <gi...@apache.org>.
dlmarion closed issue #3144: Scan server not properly handling not finding tablet for batch scan
URL: https://github.com/apache/accumulo/issues/3144

