Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/31 06:29:05 UTC

[GitHub] [hudi] Aload opened a new issue, #5720: When performing RemoteHoodieTableFileSystemView. Refresh, appear HoodieRemoteException: IP:46302 failed to respond

Aload opened a new issue, #5720:
URL: https://github.com/apache/hudi/issues/5720

   
   
   I use Flink to consume from Kafka in real time and write to a Hudi MOR table. The configuration is as follows:
   
   `  .option(FlinkOptions.READ_AS_STREAMING, true)
       .option(FlinkOptions.TABLE_TYPE, HoodieTableType.MERGE_ON_READ)
       //    .option(FlinkOptions.INSERT_CLUSTER, true)
       .option(FlinkOptions.READ_STREAMING_CHECK_INTERVAL, 60)
       //    .option(FlinkOptions.OPERATION, "insert")
       .option(FlinkOptions.BUCKET_ASSIGN_TASKS, taskParallelism)
       .option(FlinkOptions.WRITE_COMMIT_ACK_TIMEOUT, 20000L)
       .option(FlinkOptions.WRITE_PARQUET_MAX_FILE_SIZE, 128)
       .option(FlinkOptions.COMPACTION_MAX_MEMORY, 512)
       .option(FlinkOptions.WRITE_MERGE_MAX_MEMORY, 1024)
       .option(FlinkOptions.WRITE_TASK_MAX_SIZE, 2048D)
       .option(FlinkOptions.COMPACTION_MAX_MEMORY, 1024)
       .option(FlinkOptions.RETRY_INTERVAL_MS, 5000L)
       .option(FlinkOptions.RETRY_TIMES, 5)
       .option(FlinkOptions.WRITE_TASKS, 3 * taskParallelism)
       .option(FlinkOptions.READ_TASKS,4)
       .option(FlinkOptions.COMPACTION_TASKS, taskParallelism)
       .option(FlinkOptions.METADATA_ENABLED, false)
       .option(FlinkOptions.WRITE_RATE_LIMIT, 5000) // enable write rate limiting to prevent traffic jitter and keep writes smooth
       .option(FlinkOptions.COMPACTION_ASYNC_ENABLED, true)
       .option(FlinkOptions.COMPACTION_DELTA_COMMITS, "5")
       .option(FlinkOptions.COMPACTION_TRIGGER_STRATEGY, "num_commits")
       .option(FlinkOptions.COMPACTION_TIMEOUT_SECONDS, 5 * 60 * 60)
       .option(FlinkOptions.HIVE_SYNC_DB, sinkDbName)
       .option(FlinkOptions.HIVE_SYNC_TABLE, sinkTableName)
       .option(FlinkOptions.HIVE_SYNC_ENABLED, true)
       .option(FlinkOptions.HIVE_SYNC_MODE, "HMS")
       .option(FlinkOptions.HIVE_SYNC_METASTORE_URIS, "thrift://hdp05:9083")
       .option(FlinkOptions.HIVE_SYNC_JDBC_URL, "jdbc:hive2://hdp05:10000")
       .option(FlinkOptions.HIVE_SYNC_SKIP_RO_SUFFIX, true)`
   ![image](https://user-images.githubusercontent.com/13082598/171106910-49d99849-573a-4b5b-9f14-a70bbac6c86a.png)
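
The option chain above sets `FlinkOptions.COMPACTION_MAX_MEMORY` twice (512, then 1024). If the builder keeps options in a map keyed by option name (an assumption, not confirmed in this thread), the later call silently replaces the earlier one. Below is a minimal sketch of how such overrides could be detected. `OptionChain` is a hypothetical stand-in, not the Hudi builder, and the key strings are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a map-backed option builder; not the Hudi API.
final class OptionChain {
    private final Map<String, Object> options = new LinkedHashMap<>();
    private final List<String> overridden = new ArrayList<>();

    // Stores the value and remembers the key if an earlier value was replaced.
    OptionChain option(String key, Object value) {
        if (options.containsKey(key)) {
            overridden.add(key);
        }
        options.put(key, value);
        return this;
    }

    // Returns the effective (last written) value for the key, or null if unset.
    Object get(String key) {
        return options.get(key);
    }

    // Keys that were set more than once, in the order the overrides happened.
    List<String> overriddenKeys() {
        return overridden;
    }

    public static void main(String[] args) {
        // Illustrative key strings (assumed), mirroring the duplicated option above.
        OptionChain chain = new OptionChain()
            .option("compaction.max_memory", 512)
            .option("write.merge.max_memory", 1024)
            .option("compaction.max_memory", 1024);
        System.out.println("effective compaction.max_memory = " + chain.get("compaction.max_memory"));
        System.out.println("overridden keys = " + chain.overriddenKeys());
    }
}
```

Listing overridden keys before submitting the job makes it easier to see which value actually applies.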
   
   
   **Environment Description**
   
   * Hudi version : 0.11.0
   
   * Spark version : spark3.2.1
   
   * Hive version :2.3.7
   
   * Hadoop version :2.7.3
   
   * Storage (HDFS/S3/GCS..) :HDFS
   
   * Running on Docker? (yes/no) :no
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Aload closed issue #5720: When performing RemoteHoodieTableFileSystemView. Refresh, appear HoodieRemoteException: IP:46302 failed to respond

Posted by GitBox <gi...@apache.org>.
Aload closed issue #5720: When performing RemoteHoodieTableFileSystemView. Refresh, appear HoodieRemoteException: IP:46302 failed to respond
URL: https://github.com/apache/hudi/issues/5720




[GitHub] [hudi] Aload commented on issue #5720: When performing RemoteHoodieTableFileSystemView. Refresh, appear HoodieRemoteException: IP:46302 failed to respond

Posted by GitBox <gi...@apache.org>.
Aload commented on issue #5720:
URL: https://github.com/apache/hudi/issues/5720#issuecomment-1143228555

   > When I use Spark to write data to Hudi with hoodie.metadata.enable=true, the same problem occurs.
   > 
   > After each restart, the error is reported once the program has completed about 11 commits, which seems to be when the metadata table has been cleaned up once.
   > 
   > When I set hoodie.metadata.enable=false, the program runs normally for a long time.
   > 
   > Versions: spark-3.1.1, hudi-0.11.0, hadoop3.3.0
   
   In Flink, metadata.enable defaults to false, but the setting does not take effect.
   References:
   https://github.com/apache/hudi/pull/5617
   https://github.com/apache/hudi/pull/5640
   https://github.com/apache/hudi/pull/5716
   This has been resolved.




[GitHub] [hudi] Aload commented on issue #5720: When performing RemoteHoodieTableFileSystemView. Refresh, appear HoodieRemoteException: IP:46302 failed to respond

Posted by GitBox <gi...@apache.org>.
Aload commented on issue #5720:
URL: https://github.com/apache/hudi/issues/5720#issuecomment-1141724366
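
The trace in this comment nests four exceptions: `java.io.IOException`, `CheckpointException`, `HoodieRemoteException`, and finally `org.apache.http.NoHttpResponseException`, which is the innermost cause. Below is a minimal sketch of walking such a chain to its root using only the JDK. The class name `RootCause` and the stand-in exceptions are hypothetical and only mirror the nesting seen in the trace.

```java
import java.io.IOException;

// Hypothetical helper; uses only java.lang.Throwable from the JDK.
final class RootCause {
    // Follows getCause() until the innermost exception is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in exceptions mirroring the nesting in the trace; the real classes are not needed here.
        Throwable http = new IOException("10.0.20.53:46302 failed to respond");
        Throwable remote = new RuntimeException("HoodieRemoteException stand-in", http);
        Throwable checkpoint = new Exception("Checkpoint was declined.", remote);
        Throwable top = new IOException("Could not perform checkpoint 1", checkpoint);
        System.out.println(rootCause(top).getMessage());
    }
}
```

Logging the root cause alongside the top-level message can make it clearer that the checkpoint failure originates from an HTTP request that received no response.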

   java.io.IOException: Could not perform checkpoint 1 for operator stream_write (3/12)#0.
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1274) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:147) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierTracker.triggerCheckpointOnAligned(CheckpointBarrierTracker.java:301) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:141) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:181) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:159) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) ~[anso-process-0.0.1.jar:?]
   	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
   Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 1 for operator stream_write (3/12)#0. Failure reason: Checkpoint was declined.
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:265) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:170) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:233) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:206) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:186) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:605) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1329) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1258) ~[anso-process-0.0.1.jar:?]
   	... 16 more
   Caused by: org.apache.hudi.exception.HoodieRemoteException: 10.0.20.53:46302 failed to respond
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:420) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.sync(RemoteHoodieTableFileSystemView.java:484) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.sync(PriorityBasedFileSystemView.java:257) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1470) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1496) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:140) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$1(StreamWriteFunction.java:184) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$flushRemaining$7(StreamWriteFunction.java:461) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608) ~[?:1.8.0_181]
   	at org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:454) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.snapshotState(StreamWriteFunction.java:131) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.common.AbstractStreamWriteFunction.snapshotState(AbstractStreamWriteFunction.java:157) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:87) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:219) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:170) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:233) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:206) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:186) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:605) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1329) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1258) ~[anso-process-0.0.1.jar:?]
   	... 16 more
   Caused by: org.apache.http.NoHttpResponseException: 10.0.20.53:46302 failed to respond
   	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.client.fluent.Request.execute(Request.java:151) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:176) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:418) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.sync(RemoteHoodieTableFileSystemView.java:484) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.sync(PriorityBasedFileSystemView.java:257) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1470) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1496) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:140) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$1(StreamWriteFunction.java:184) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$flushRemaining$7(StreamWriteFunction.java:461) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608) ~[?:1.8.0_181]
   	at org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:454) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.snapshotState(StreamWriteFunction.java:131) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.common.AbstractStreamWriteFunction.snapshotState(AbstractStreamWriteFunction.java:157) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:87) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:219) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:170) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:233) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:206) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:186) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:605) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1329) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1258) ~[anso-process-0.0.1.jar:?]
   	... 16 more
   2022-05-31 14:03:23,689 WARN  org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Reset the event for task [2]
   java.io.IOException: Could not perform checkpoint 1 for operator stream_write (3/12)#0.
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1274) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:147) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierTracker.triggerCheckpointOnAligned(CheckpointBarrierTracker.java:301) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:141) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:181) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:159) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) ~[anso-process-0.0.1.jar:?]
   	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
   Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 1 for operator stream_write (3/12)#0. Failure reason: Checkpoint was declined.
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:265) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:170) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:233) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:206) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:186) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:605) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1329) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1258) ~[anso-process-0.0.1.jar:?]
   	... 16 more
   Caused by: org.apache.hudi.exception.HoodieRemoteException: 10.0.20.53:46302 failed to respond
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:420) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.sync(RemoteHoodieTableFileSystemView.java:484) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.sync(PriorityBasedFileSystemView.java:257) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1470) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1496) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:140) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$1(StreamWriteFunction.java:184) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$flushRemaining$7(StreamWriteFunction.java:461) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608) ~[?:1.8.0_181]
   	at org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:454) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.snapshotState(StreamWriteFunction.java:131) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.common.AbstractStreamWriteFunction.snapshotState(AbstractStreamWriteFunction.java:157) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:87) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:219) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:170) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:233) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:206) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:186) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:605) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1329) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1258) ~[anso-process-0.0.1.jar:?]
   	... 16 more
   Caused by: org.apache.http.NoHttpResponseException: 10.0.20.53:46302 failed to respond
   	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[anso-process-0.0.1.jar:?]
   	at org.apache.http.client.fluent.Request.execute(Request.java:151) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:176) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:418) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.sync(RemoteHoodieTableFileSystemView.java:484) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.sync(PriorityBasedFileSystemView.java:257) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1470) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1496) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:140) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$1(StreamWriteFunction.java:184) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.lambda$flushRemaining$7(StreamWriteFunction.java:461) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608) ~[?:1.8.0_181]
   	at org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:454) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.StreamWriteFunction.snapshotState(StreamWriteFunction.java:131) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.hudi.sink.common.AbstractStreamWriteFunction.snapshotState(AbstractStreamWriteFunction.java:157) ~[hudi-flink1.14-bundle_2.12-0.11.0.jar:0.11.0]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:87) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:219) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:170) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:233) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:206) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:186) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:605) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1329) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1315) ~[anso-process-0.0.1.jar:?]
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1258) ~[anso-process-0.0.1.jar:?]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] eric9204 commented on issue #5720: When performing RemoteHoodieTableFileSystemView. Refresh, appear HoodieRemoteException: IP:46302 failed to respond

Posted by GitBox <gi...@apache.org>.
eric9204 commented on issue #5720:
URL: https://github.com/apache/hudi/issues/5720#issuecomment-1143087391

   When I use Spark to write data to Hudi with hoodie.metadata.enable=true, the same problem occurs.
   
   Each time the program is restarted, the error is reported after about 11 commits have completed, which appears to be the point at which the metadata table has been cleaned once.
   
   Versions: Spark 3.1.1, Hudi 0.11.0, Hadoop 3.3.0
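   
   For reference, a rough sketch of the kind of Spark write options this setup implies. Only hoodie.metadata.enable=true is taken from the report; the table name, field names, and operation are hypothetical placeholders, not the actual job configuration.
   
   ```scala
   // Sketch only: option keys are standard Hudi write configs, but every
   // value except "hoodie.metadata.enable" is an assumed placeholder.
   val hudiOptions: Map[String, String] = Map(
     "hoodie.table.name" -> "example_table",                   // hypothetical table name
     "hoodie.datasource.write.recordkey.field" -> "id",        // hypothetical key field
     "hoodie.datasource.write.precombine.field" -> "ts",       // hypothetical ordering field
     "hoodie.datasource.write.operation" -> "upsert",          // assumed operation
     "hoodie.metadata.enable" -> "true"                        // the setting named in the report
   )
   ```
   
   Such a map would typically be passed to the writer via df.write.format("hudi").options(hudiOptions), with the DataFrame and target path supplied by the job.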
   
   

