Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/29 11:12:26 UTC

[GitHub] [iceberg] wg1026688210 opened a new issue #1688: iceberg flink sink job can't restart due to metadata location not found

wg1026688210 opened a new issue #1688:
URL: https://github.com/apache/iceberg/issues/1688


   ### Flink sink job failure
   ##### We found the job failed due to a Hive Metastore server timeout
   ```
   java.lang.RuntimeException: operation failed for flink_origin_db.iceberg_test_1504_10,localtion viewfs://AutoLfCluster/team/db/hive_db/flink_origin_db/iceberg_test_1504_10/metadata/00077-ca38eacc-c44f-4ea1-b702-de53de28154b.metadata.json=>viewfs://AutoLfCluster/team/db/hive_db/flink_origin_db/iceberg_test_1504_10/metadata/00078-399ca6c3-badf-4f84-9a9a-b58d5f5b02ef.metadata.json
   	at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:198) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:118) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:293) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:213) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:197) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:275) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitOperation(IcebergFilesCommitter.java:231) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.flink.sink.IcebergFilesCommitter.append(IcebergFilesCommitter.java:220) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitUpToCheckpoint(IcebergFilesCommitter.java:192) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.flink.sink.IcebergFilesCommitter.notifyCheckpointComplete(IcebergFilesCommitter.java:173) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:107) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:283) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:987) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$10(StreamTask.java:958) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$12(StreamTask.java:974) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:282) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:190) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:566) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:536) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_92]
   Caused by: java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing alter_table_with_environment_context
   	at org.apache.iceberg.relocated.com.google.common.base.Throwables.propagate(Throwables.java:241) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:80) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.HiveTableOperations.lambda$persistTable$1(HiveTableOperations.java:226) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.ClientPool.run(ClientPool.java:54) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.HiveTableOperations.persistTable(HiveTableOperations.java:222) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:176) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	... 26 more
   Caused by: org.apache.thrift.TApplicationException: Internal error processing alter_table_with_environment_context
   	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.0.0.jar:2.0.0]
   	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.0.0.jar:2.0.0]
   	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1409) ~[hive-exec-2.0.0.jar:2.0.0]
   	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1393) ~[hive-exec-2.0.0.jar:2.0.0]
   	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:377) ~[hive-exec-2.0.0.jar:2.0.0]
   	at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) ~[?:?]
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_92]
   	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_92]
   	at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:65) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:77) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.HiveTableOperations.lambda$persistTable$1(HiveTableOperations.java:226) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.ClientPool.run(ClientPool.java:54) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.HiveTableOperations.persistTable(HiveTableOperations.java:222) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:176) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	... 26 more
   ``` 
   
   
   ### Error log in Hive
   ```
   Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
           at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
           at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
           at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
           at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
           at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
           at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
           at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
           at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
           at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
           at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
           at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
           at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1409)
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1393)
           at com.autohome.server.HMSFederationHandler.alter_table_with_environment_context(HMSFederationHandler.java:255)
           ... 26 common frames omitted
   Caused by: java.net.SocketTimeoutException: Read timed out
           at java.net.SocketInputStream.socketRead0(Native Method)
           at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
           at java.net.SocketInputStream.read(SocketInputStream.java:170)
           at java.net.SocketInputStream.read(SocketInputStream.java:141)
           ... 40 common frames omitted
           at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
           at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
           at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
           at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
           at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
           at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
           at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
           at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
           at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
           at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
           at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
           at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1409)
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1393)
           at com.autohome.server.HMSFederationHandler.alter_table_with_environment_context(HMSFederationHandler.java:255)
           at sun.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at com.autohome.server.HMSFederationHandlerProxy.invoke(HMSFederationHandlerProxy.java:36)
           at com.sun.proxy.$Proxy90.alter_table_with_environment_context(Unknown Source)
           at sun.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
           at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
           at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
           at com.sun.proxy.$Proxy90.alter_table_with_environment_context(Unknown Source)
           at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
           at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
           at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
           at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:703)
           at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:698)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
           at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:698)
           at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:745)
   Caused by: java.net.SocketTimeoutException: Read timed out
           at java.net.SocketInputStream.socketRead0(Native Method)
           at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
           at java.net.SocketInputStream.read(SocketInputStream.java:170)
           at java.net.SocketInputStream.read(SocketInputStream.java:141)
           at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
           at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
           at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
           at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
           at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
           at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
           at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
           at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
           at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
           at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
           at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
           at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
           at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1409)
           at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1393)
           at com.autohome.server.HMSFederationHandler.alter_table_with_environment_context(HMSFederationHandler.java:255)
           at sun.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at com.autohome.server.HMSFederationHandlerProxy.invoke(HMSFederationHandlerProxy.java:36)
           at com.sun.proxy.$Proxy90.alter_table_with_environment_context(Unknown Source)
           at sun.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
           at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
           at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
           at com.sun.proxy.$Proxy90.alter_table_with_environment_context(Unknown Source)
           at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
           at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
           at org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
           at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:703)
           at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:698)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
           at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:698)
           at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:745)
   Caused by: java.net.SocketTimeoutException: Read timed out
           at java.net.SocketInputStream.socketRead0(Native Method)
           at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
           at java.net.SocketInputStream.read(SocketInputStream.java:170)
           at java.net.SocketInputStream.read(SocketInputStream.java:141)
           at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
           at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
   ```
   
   
   
   ### Job cannot restart from checkpoint
   
   The job then cannot restart from the checkpoint automatically, because the metadata location is not found when the Hive catalog loads the Iceberg table:
   
   ```
   2020-10-28 14:05:36,665 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - IcebergFilesCommitter -> Sink: IcebergSink iceberg_catalog.flink_origin_db.iceberg_test_1504_10 (1/1) (f78a61f9b308af32c673641b23fe9c0f) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@16add4e5.
   org.apache.iceberg.exceptions.NotFoundException: Failed to open input stream for file: viewfs://AutoLfCluster/team/db/hive_db/flink_origin_db/iceberg_test_1504_10/metadata/00078-399ca6c3-badf-4f84-9a9a-b58d5f5b02ef.metadata.json
   	at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:159) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:238) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:233) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:167) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:213) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:197) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:166) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:148) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:138) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:86) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:69) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:102) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.flink.TableLoader$CatalogTableLoader.loadTable(TableLoader.java:113) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.iceberg.flink.sink.IcebergFilesCommitter.initializeState(IcebergFilesCommitter.java:115) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:106) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:258) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) ~[flink-dist_2.11-1.11.2-auto1.1-SNAPSHOT.jar:1.11.2-auto1.1-SNAPSHOT]
   	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_92]
   Caused by: java.io.FileNotFoundException: File does not exist: /team/db/hive_db/flink_origin_db/iceberg_test_1504_10/metadata/00078-399ca6c3-badf-4f84-9a9a-b58d5f5b02ef.metadata.json
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
   	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
   	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
   
   	at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source) ~[?:?]
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_92]
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_92]
   	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1228) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.open(ChRootedFileSystem.java:261) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.fs.viewfs.ViewFileSystem.open(ViewFileSystem.java:459) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:157) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	... 25 more
   Caused by: org.apache.hadoop.ipc.RemoteException: File does not exist: /team/db/hive_db/flink_origin_db/iceberg_test_1504_10/metadata/00078-399ca6c3-badf-4f84-9a9a-b58d5f5b02ef.metadata.json
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
   	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
   	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
   	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
   
   	at org.apache.hadoop.ipc.Client.call(Client.java:1475) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.ipc.Client.call(Client.java:1412) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) ~[hadoop-common-2.7.2.jar:?]
   	at com.sun.proxy.$Proxy26.getBlockLocations(Unknown Source) ~[?:?]
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255) ~[hadoop-hdfs-2.7.2.jar:?]
   	at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) ~[?:?]
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_92]
   	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_92]
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[hadoop-common-2.7.2.jar:?]
   	at com.sun.proxy.$Proxy27.getBlockLocations(Unknown Source) ~[?:?]
   	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312) ~[hadoop-hdfs-2.7.2.jar:?]
   	at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.open(ChRootedFileSystem.java:261) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.fs.viewfs.ViewFileSystem.open(ViewFileSystem.java:459) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767) ~[hadoop-common-2.7.2.jar:?]
   	at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:157) ~[iceberg-flink-runtime-3d308d3.dirty.jar-1603859992059.jar:?]
   	... 25 more
   ```
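   
   For clarity, here is a minimal sketch of the restore-time load path that fails above (hypothetical interfaces, not the actual Iceberg classes): the Hive catalog reads the `metadata_location` table parameter from HMS and then opens that JSON file, so a pointer to a deleted file makes every load attempt throw.
   
   ```java
   import java.io.FileNotFoundException;
   
   // Minimal sketch (hypothetical types, not the real Iceberg classes) of the
   // restore-time load path shown in the stack trace above.
   class RestoreLoadSketch {
       // Stand-ins for the HMS client and the file system.
       interface Metastore { String getParameter(String table, String key); }
       interface FileIO { String readJson(String location) throws FileNotFoundException; }
   
       static String loadCurrentMetadata(Metastore hms, FileIO io, String table)
               throws FileNotFoundException {
           // 1. The Hive catalog reads the metadata_location pointer from the table.
           String metadataLocation = hms.getParameter(table, "metadata_location");
           // 2. It then opens that metadata JSON. If the failed commit's cleanup
           //    deleted the file, this throws, IcebergFilesCommitter cannot
           //    initialize, and the job cannot restore from the checkpoint.
           return io.readJson(metadataLocation);
       }
   }
   ```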
   
   
   
   
   We found that the new metadata file is deleted when the alter-table operation against HMS throws an exception (here, a timeout). However, the alter operation can still succeed on the server side even though the Hive Metastore API returned ```Internal error processing alter_table_with_environment_context```, leaving the Hive table pointing at the deleted file.
   
   https://github.com/apache/iceberg/blob/4ba48be69963fded08bcd9d0a28e14202191585d/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L215
   
   https://github.com/apache/iceberg/blob/4ba48be69963fded08bcd9d0a28e14202191585d/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L201-L204
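   
   The sequence is easiest to see as a small, self-contained simulation (hypothetical names; this is not the real `HiveTableOperations` code): the server-side alter succeeds, the client's read of the response times out, and the cleanup path then deletes the metadata file that HMS now points to.
   
   ```java
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;
   
   // Simulates the ambiguous commit: HMS applies the alter, the client times out,
   // and cleanup deletes the new metadata file anyway.
   class AmbiguousCommitSketch {
       final Map<String, String> hmsPointer = new HashMap<>(); // table -> metadata_location
       final Set<String> files = new HashSet<>();              // files that exist on HDFS
   
       void commit(String table, String newMetadataLocation) {
           files.add(newMetadataLocation); // the new metadata JSON is written first
           try {
               hmsPointer.put(table, newMetadataLocation);   // server applies the alter...
               throw new RuntimeException("Read timed out"); // ...but the ack never arrives
           } catch (RuntimeException e) {
               // Cleanup assumes the commit failed; unsafe when the outcome is ambiguous.
               files.remove(newMetadataLocation);
               throw e;
           }
       }
   
       public static void main(String[] args) {
           AmbiguousCommitSketch s = new AmbiguousCommitSketch();
           try {
               s.commit("flink_origin_db.iceberg_test", "metadata/00078.metadata.json");
           } catch (RuntimeException ignored) {
               // The commit fails here, as in the first stack trace.
           }
           String pointer = s.hmsPointer.get("flink_origin_db.iceberg_test");
           // Prints: pointer=metadata/00078.metadata.json exists=false -> restart fails.
           System.out.println("pointer=" + pointer + " exists=" + s.files.contains(pointer));
       }
   }
   ```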
   
   
   So we added an improvement to table loading in Iceberg's `HiveTableOperations` to make it fault tolerant.
   If `metadataLocation` is not found:
   1. Try to use `previousMetadataLocation` instead.
   2. Set the `previousMetadataLocation` path as the `metadataLocation` property of the Hive table, in case both locations are unavailable the next time the job restarts from the checkpoint.
   
   This way the table stays loadable even if the same error occurs again; a sketch of the idea follows.
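   
   A rough sketch of the idea, with hypothetical helper interfaces rather than a real patch against `HiveTableOperations`:
   
   ```java
   import java.io.FileNotFoundException;
   
   // Sketch of the proposed fault-tolerant load (hypothetical interfaces).
   class FallbackRefreshSketch {
       interface Hms {
           String getParameter(String key);             // read a table parameter from HMS
           void setParameter(String key, String value); // write a table parameter back
       }
       interface FileIO { String readJson(String location) throws FileNotFoundException; }
   
       static String refresh(Hms hms, FileIO io) throws FileNotFoundException {
           String current = hms.getParameter("metadata_location");
           try {
               return io.readJson(current);
           } catch (FileNotFoundException e) {
               // Step 1: fall back to the previous metadata location.
               String previous = hms.getParameter("previous_metadata_location");
               String metadata = io.readJson(previous); // may still throw if both are gone
               // Step 2: repoint the Hive table, so a later restart from the
               // checkpoint does not depend on the dangling metadata_location.
               hms.setParameter("metadata_location", previous);
               return metadata;
           }
       }
   }
   ```
   
   Falling back effectively re-points the table at the previous snapshot, so the restarted job can load the table again instead of failing on the dangling `metadata_location`.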
   
   
   



[GitHub] [iceberg] KarlManong commented on issue #1688: iceberg flink sink job can't restart due to metadata location not found

Posted by GitBox <gi...@apache.org>.
KarlManong commented on issue #1688:
URL: https://github.com/apache/iceberg/issues/1688#issuecomment-1084223547


   I think #3717 can fix this

