Posted to issues@phoenix.apache.org by "Chinmay Kulkarni (Jira)" <ji...@apache.org> on 2020/06/05 07:07:00 UTC

[jira] [Commented] (PHOENIX-5940) Pre-4.15 client cannot connect to 4.15+ server after SYSTEM.CATALOG region has split

    [ https://issues.apache.org/jira/browse/PHOENIX-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126462#comment-17126462 ] 

Chinmay Kulkarni commented on PHOENIX-5940:
-------------------------------------------

Technically, this should have been a blocker for 4.15, but I guess we didn't catch it :(. Let's aim to get this into 4.16 at least.

I also think this calls for a review of all our SYSTEM table coprocessor invocations to ensure the existing client passes valid start and end keys. We should also consider checking older clients (the last 2 minor versions?) for similar issues. Besides being a huge availability issue, as this Jira outlines, unbounded invocations also cause avoidable RPCs. The same problem may exist for other invocations on the SYSTEM.CATALOG coprocessor too.
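The review suggested above comes down to one question per invocation: given the start and end keys a client passes, which regions does the endpoint call reach? The Java sketch below is a hypothetical model, not HBase or Phoenix code. `CoprocessorFanOut`, `Region`, the split point "T", and the header row key layout (empty tenant ID, then schema and table name, separated by zero bytes) are all assumptions. It treats both bounds as inclusive and a null bound as unbounded, which is our reading of how `HTable#coprocessorService` selects regions, and it counts one endpoint RPC per region reached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not HBase API) of which regions an endpoint call reaches.
final class CoprocessorFanOut {

    /** A region covering [startKey, endKey); "" marks an unbounded edge. */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /**
     * Regions reached by an invocation over [startKey, endKey], both inclusive.
     * A null bound means "unbounded", as when null is passed to coprocessorService.
     * Keys are ASCII Strings, so compareTo matches HBase's unsigned byte order.
     */
    static List<Region> regionsInRange(List<Region> regions, String startKey, String endKey) {
        List<Region> reached = new ArrayList<>();
        for (Region r : regions) {
            // The region ends at or before the first requested key.
            boolean endsBeforeStart = startKey != null && !r.endKey.isEmpty()
                    && r.endKey.compareTo(startKey) <= 0;
            // The region starts after the last requested key.
            boolean startsAfterEnd = endKey != null && r.startKey.compareTo(endKey) > 0;
            if (!endsBeforeStart && !startsAfterEnd) {
                reached.add(r); // one endpoint RPC per reached region
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        // Hypothetical SYSTEM.CATALOG after a single split at "T".
        List<Region> catalog = List.of(new Region("r1", "", "T"), new Region("r2", "T", ""));
        // Assumed header row key: empty tenant ID, then schema and table, zero-byte separated.
        String header = "\u0000SYSTEM\u0000CATALOG";

        System.out.println("unbounded: " + regionsInRange(catalog, null, null).size() + " RPC(s)");
        System.out.println("header-bounded: "
                + regionsInRange(catalog, header, header).size() + " RPC(s)");
    }
}
```

With one split, the unbounded call (what a pre-4.15 client does in checkClientServerCompatibility, per the issue description) reaches both regions, while a call bounded to the header row key reaches only the region holding it. The same helper could be pointed at other SYSTEM table invocations during the review.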

FYI [~yanxinyi] 

> Pre-4.15 client cannot connect to 4.15+ server after SYSTEM.CATALOG region has split
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5940
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5940
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.3
>            Reporter: Chinmay Kulkarni
>            Priority: Blocker
>             Fix For: 4.16.0
>
>
> Steps to repro:
>  # Start the server with 4.15 or 4.16-SNAPSHOT (head of 4.x) with the default setting for splitting SYSTEM.CATALOG, i.e. phoenix.system.catalog.splittable=true
>  # Connect with a 4.15+ client and create enough tables/views/indices to cause the SYSTEM.CATALOG region to split
>  # Now connect with any pre-4.15 client, such as 4.14.3. Getting a connection will fail with the following stack trace:
> {noformat}
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2013 (INT15): ERROR 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on region tableName=SYSTEM.CATALOG SYSTEM.CATALOG
> 	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:114)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3214)
> 	at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on region tableName=SYSTEM.CATALOG
> 	at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
> 	at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
> 	... 9 more
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1275)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:35542)
> 	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1702)
> 	... 13 more
> 20/06/04 19:14:18 WARN client.HTable: Error calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row
> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2013 (INT15): ERROR 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on region tableName=SYSTEM.CATALOG SYSTEM.CATALOG
> 	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:114)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3214)
> 	at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on region tableName=SYSTEM.CATALOG
> 	at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
> 	at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
> 	... 9 more
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1775)
> 	at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1731)
> 	at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:1350)
> 	at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1239)
> 	at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1576)
> 	at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2731)
> 	at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:1115)
> 	at org.apache.phoenix.compile.CreateTableCompiler$1.execute(CreateTableCompiler.java:192)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:410)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:393)
> 	at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:392)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:380)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1810)
> 	at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2623)
> 	at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2586)
> 	at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
> 	at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2586)
> 	at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
> 	at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
> 	at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
> 	at sqlline.DatabaseConnection.connect(DatabaseConnection.java:157)
> 	at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:203)
> 	at sqlline.Commands.connect(Commands.java:1064)
> 	at sqlline.Commands.connect(Commands.java:996)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
> 	at sqlline.SqlLine.dispatch(SqlLine.java:809)
> 	at sqlline.SqlLine.initArgs(SqlLine.java:588)
> 	at sqlline.SqlLine.begin(SqlLine.java:661)
> 	at sqlline.SqlLine.start(SqlLine.java:398)
> 	at sqlline.SqlLine.main(SqlLine.java:291)
> {noformat}
> RS logs for the region throwing the error:
> {noformat}
> 2020-06-04 19:14:18,655 ERROR [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=56704] coprocessor.MetaDataEndpointImpl: loading system catalog table inside getVersion failed
> java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on region tableName=SYSTEM.CATALOG
> 	at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
> 	at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
> 	at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> 2020-06-04 19:14:18,656 DEBUG [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=56704] ipc.RpcServer: RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=56704: callId: 7 service: ClientService methodName: ExecService size: 131 connection: 10.3.4.181:57305
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2013 (INT15): ERROR 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on region tableName=SYSTEM.CATALOG SYSTEM.CATALOG
> 	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:114)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3214)
> 	at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on region tableName=SYSTEM.CATALOG
> 	at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
> 	at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
> 	at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
> 	... 9 more
> {noformat}
> This happens because, in a pre-4.15 client, CQSI.checkClientServerCompatibility invokes getVersion on MetaDataEndpointImpl over *all SYSTEM.CATALOG regions* (we pass in null for startKey and endKey); see [this|https://github.com/apache/phoenix/blob/e2993552dc88cb7fc59fc0dfdaa2876ac260886c/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L1350].
> Inside MetaDataEndpointImpl#getVersion, we [call doGetTable|https://github.com/apache/phoenix/blob/77c6cb32fce04b912b7c502dc170d86af8293fe6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L3224].
> Now, if SYSTEM.CATALOG has split, this call is also invoked on a region that does not contain the header row for SYSTEM.CATALOG, causing it to fail in MetaDataEndpointImpl#doGetTable [here|https://github.com/apache/phoenix/blob/77c6cb32fce04b912b7c502dc170d86af8293fe6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L2928-L2933].
> This is avoided in 4.15+ clients since we have restricted the getVersion invocation to the region containing the header row for SYSTEM.CATALOG (see [this|https://github.com/apache/phoenix/blob/77c6cb32fce04b912b7c502dc170d86af8293fe6/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L1517]).
> Inside MetaDataEndpointImpl, we need to add a special condition for pre-4.15 clients before propagating this error back to the client.
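
One way to read the special condition proposed in the issue is as a fallback in getVersion keyed on the client's version. The Java sketch below models it with plain types only. `GetVersionSketch`, `encodeVersion`, its major/minor/patch bit packing, and the assumption that the server can see the client's version on the request are all hypothetical, not Phoenix API. A region holding the SYSTEM.CATALOG header row answers normally; a region without it answers a pre-4.15 client with the server version instead of ERROR 2013 (INT15), while 4.15+ clients, which should never reach such a region, still get the error. The client side is modeled sequentially for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed getVersion special case; not Phoenix API.
final class GetVersionSketch {

    /** Packs major/minor/patch into one int; an assumed encoding for this sketch. */
    static int encodeVersion(int major, int minor, int patch) {
        return (major << 16) | (minor << 8) | patch;
    }

    /** First client line that bounds getVersion to the header-row region. */
    static final int FIRST_BOUNDED_CLIENT = encodeVersion(4, 15, 0);
    /** Version this modeled server reports. */
    static final int SERVER_VERSION = encodeVersion(4, 16, 0);

    /** Server side of getVersion on one region: returns the version or throws. */
    static int getVersion(boolean regionHasHeaderRow, int clientVersion) {
        if (regionHasHeaderRow) {
            // Normal path: the SYSTEM.CATALOG header row can be loaded on this region.
            return SERVER_VERSION;
        }
        if (clientVersion < FIRST_BOUNDED_CLIENT) {
            // Special case: pre-4.15 clients broadcast to every region, so answer
            // with the version instead of failing their connection attempt.
            return SERVER_VERSION;
        }
        // 4.15+ clients should only reach the header-row region; keep the error.
        throw new IllegalStateException(
                "ERROR 2013 (INT15): doGetTable called for table not present on region");
    }

    /** Client side of a pre-4.15 check: one getVersion call per region, in order. */
    static Map<String, Integer> broadcast(Map<String, Boolean> regionHasHeader, int clientVersion) {
        Map<String, Integer> versions = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> region : regionHasHeader.entrySet()) {
            versions.put(region.getKey(), getVersion(region.getValue(), clientVersion));
        }
        return versions;
    }

    public static void main(String[] args) {
        Map<String, Boolean> catalog = new LinkedHashMap<>();
        catalog.put("r1", true);  // region holding the SYSTEM.CATALOG header row
        catalog.put("r2", false); // region created by the split, without it
        System.out.println(broadcast(catalog, encodeVersion(4, 14, 3)));
    }
}
```

The design choice is to relax the check only for clients known to broadcast, so a 4.15+ client that wrongly reaches a non-header region still surfaces the error instead of silently succeeding.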



--
This message was sent by Atlassian Jira
(v8.3.4#803005)