Posted to issues-all@impala.apache.org by "Andrew Sherman (Jira)" <ji...@apache.org> on 2022/07/28 23:54:00 UTC

[jira] [Comment Edited] (IMPALA-11330) Handle missing Iceberg data/metadata gracefully

    [ https://issues.apache.org/jira/browse/IMPALA-11330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572698#comment-17572698 ] 

Andrew Sherman edited comment on IMPALA-11330 at 7/28/22 11:53 PM:
-------------------------------------------------------------------

I did the following:
{code:java}
create table abcde3 (i int) stored as iceberg;
insert into  abcde3 values(5);
describe extended abcde3;
{code}
Then drop the metadata directory:
{code:java}
$ hdfs dfs -ls  /test-warehouse/abcde3/metadata
Found 7 items
-rw-r--r--   3 asherman supergroup       1196 2022-07-28 14:05 /test-warehouse/abcde3/metadata/00000-90ce0b96-d885-4240-bbd3-8dcc3564bf1c.metadata.json
-rw-r--r--   3 asherman supergroup       2235 2022-07-28 16:38 /test-warehouse/abcde3/metadata/00001-ff61a30b-ab3c-44d9-89a6-f3af8e1aa542.metadata.json
-rw-r--r--   3 asherman supergroup       3189 2022-07-28 16:38 /test-warehouse/abcde3/metadata/00002-e3d06ba2-8f4b-40d6-b42e-b895f14303d0.metadata.json
-rw-r--r--   3 asherman supergroup       5708 2022-07-28 16:38 /test-warehouse/abcde3/metadata/5cb43482-e090-4561-a9d5-5eccfe52a007-m0.avro
-rw-r--r--   3 asherman supergroup       5709 2022-07-28 16:38 /test-warehouse/abcde3/metadata/ec597beb-e6d9-4dae-9f04-48dff5a619cc-m0.avro
-rw-r--r--   3 asherman supergroup       3765 2022-07-28 16:38 /test-warehouse/abcde3/metadata/snap-1471986745302219132-1-5cb43482-e090-4561-a9d5-5eccfe52a007.avro
-rw-r--r--   3 asherman supergroup       3831 2022-07-28 16:38 /test-warehouse/abcde3/metadata/snap-4906186431705617426-1-ec597beb-e6d9-4dae-9f04-48dff5a619cc.avro

$ hdfs dfs -rm -r  /test-warehouse/abcde3/metadata
22/07/28 16:38:54 INFO fs.TrashPolicyDefault: Moved: 'hdfs://localhost:20500/test-warehouse/abcde3/metadata' to trash at: hdfs://localhost:20500/user/asherman/.Trash/Current/test-warehouse/abcde3/metadata

$ hdfs dfs -ls  /test-warehouse/abcde3
Found 1 items
drwxr-xr-x   - asherman supergroup          0 2022-07-28 16:38 /test-warehouse/abcde3/data
{code}
Now if I run a select, it fails:
{code:java}
[localhost:21050] default> select * from abcde3;
Query: select * from abcde3
Query submitted at: 2022-07-28 16:40:59 (Coordinator: http://andrew-desktop:25000)
ERROR: AnalysisException: Failed to load metadata for table: 'abcde3'
CAUSED BY: TableLoadingException: IcebergTableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/abcde3
CAUSED BY: TableLoadingException: Failed to load Iceberg table with id: default.abcde3
CAUSED BY: NotFoundException: Failed to open input stream for file: hdfs://localhost:20500/test-warehouse/abcde3/metadata/00002-e3d06ba2-8f4b-40d6-b42e-b895f14303d0.metadata.json
CAUSED BY: FileNotFoundException: File does not exist: /test-warehouse/abcde3/metadata/00002-e3d06ba2-8f4b-40d6-b42e-b895f14303d0.metadata.json
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)

CAUSED BY: RemoteException: File does not exist: /test-warehouse/abcde3/metadata/00002-e3d06ba2-8f4b-40d6-b42e-b895f14303d0.metadata.json
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
{code}
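For reference, the innermost NotFoundException in that chain appears to come from Iceberg's Hadoop file I/O when it tries to open the current metadata.json. A standalone sketch like the one below (just an illustration using the Iceberg HadoopInputFile API and the path deleted above, not Impala code) should hit the same exception:
{code:java}
// Standalone sketch: opening the (now deleted) metadata file through Iceberg's
// Hadoop file I/O produces the same NotFoundException that appears at the
// bottom of the query error above.
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopInputFile;
import org.apache.iceberg.io.InputFile;

public class MissingMetadataRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    InputFile metadataFile = HadoopInputFile.fromLocation(
        "hdfs://localhost:20500/test-warehouse/abcde3/metadata/"
            + "00002-e3d06ba2-8f4b-40d6-b42e-b895f14303d0.metadata.json", conf);
    // newStream() wraps the underlying FileNotFoundException in
    // org.apache.iceberg.exceptions.NotFoundException
    // ("Failed to open input stream for file: ...").
    metadataFile.newStream().close();
  }
}
{code}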
If I drop the table, the drop succeeds:
{code:java}
[localhost:21050] default> drop table abcde3;
Query: drop table abcde3
Table has been dropped.
Fetched 1 row(s) in 0.26s
[localhost:21050] default> describe extended abcde3;
Query: describe extended abcde3
ERROR: AnalysisException: Could not resolve path: 'abcde3'
{code}
and the data is gone:
{code:java}
$ hdfs dfs -ls  /test-warehouse/abcde3
ls: `/test-warehouse/abcde3': No such file or directory
{code}
So it seems like DROP TABLE works OK even with the metadata missing.
The SELECT does fail, but I am not sure what better behavior would look like, as the table is in an inconsistent state.
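One option, sketched below, would be for the table-loading path to catch Iceberg's NotFoundException explicitly and report a single, actionable message (the metadata is missing, the table can still be dropped) instead of the nested NameNode stack trace. This is only an illustration of the idea, not Impala's actual frontend code; the loader interface and the local IcebergTableLoadingException class are made up to keep the sketch self-contained.
{code:java}
// A minimal sketch: catch Iceberg's NotFoundException while loading table
// metadata and surface one concise, actionable error instead of the nested
// HDFS stack trace shown above.
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.NotFoundException;

public class GracefulIcebergLoad {

  /** Hypothetical loader interface, only here to keep the sketch self-contained. */
  public interface IcebergLoader {
    Table load(TableIdentifier id);
  }

  /**
   * Stand-in for the IcebergTableLoadingException named in the error output;
   * the real class lives in Impala's catalog code.
   */
  public static class IcebergTableLoadingException extends Exception {
    public IcebergTableLoadingException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  public static Table loadOrExplain(TableIdentifier id, IcebergLoader loader)
      throws IcebergTableLoadingException {
    try {
      return loader.load(id);
    } catch (NotFoundException e) {
      // Iceberg throws NotFoundException when the current metadata.json (or a
      // manifest it references) cannot be opened, as in the repro above.
      throw new IcebergTableLoadingException(String.format(
          "Iceberg metadata for table %s is missing or unreadable (%s). "
              + "The metadata directory may have been deleted or moved; "
              + "the table can still be dropped with DROP TABLE.",
          id, e.getMessage()), e);
    }
  }
}
{code}
At minimum something like this would avoid surprising users with an HDFS stack trace, even if the query itself still has to fail.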


> Handle missing Iceberg data/metadata gracefully
> -----------------------------------------------
>
>                 Key: IMPALA-11330
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11330
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.1.0
>            Reporter: Tamas Mate
>            Assignee: Andrew Sherman
>            Priority: Major
>              Labels: impala-iceberg
>
> If the data/metadata directory is not available for an Iceberg table, queries fail with a NotFoundException, see below. This also affects DROP TABLE, which means that an Iceberg table can get stuck in the system if the administrator moves the data.
> {code:none}
> ERROR: NotFoundException: Failed to open input stream for file: hdfs://localhost:20500/test-warehouse/test2/metadata/00001-398886ba-f6eb-4b72-b755-f1be10ac99c5.metadata.json
> CAUSED BY: FileNotFoundException: File does not exist: /test-warehouse/test2/metadata/00001-398886ba-f6eb-4b72-b755-f1be10ac99c5.metadata.json
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> CAUSED BY: RemoteException: File does not exist: /test-warehouse/test2/metadata/00001-398886ba-f6eb-4b72-b755-f1be10ac99c5.metadata.json
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
> 	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2035)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> {code}


