Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/29 05:58:19 UTC

[GitHub] [hudi] gubinjie opened a new issue, #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond

gubinjie opened a new issue, #6825:
URL: https://github.com/apache/hudi/issues/6825

   Hudi: 0.10.1
   Flink: 1.13.3
   
   When using Flink SQL to write to Hudi, Hudi throws an exception that causes the task to exit abnormally. I don't know what causes this. Is there a solution?
   
   
   
   `2022-09-29 13:38:51,316 INFO  org.apache.hudi.table.action.clean.CleanPlanner              [] - Incremental Cleaning mode is enabled. Looking up partition-paths that have since changed since last cleaned at 20220929130332153. New Instant to retain : Option{val=[20220929130632162__commit__COMPLETED]}
   2022-09-29 13:38:51,320 INFO  org.apache.hudi.client.HoodieFlinkWriteClient                [] - Cleaner has been spawned already. Waiting for it to finish
   2022-09-29 13:38:51,320 INFO  org.apache.hudi.client.AsyncCleanerService                   [] - Waiting for async cleaner to finish
   2022-09-29 13:38:51,320 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[20220929133815434__rollback__COMPLETED]}
   2022-09-29 13:38:51,322 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[20220929133815434__rollback__COMPLETED]}
   2022-09-29 13:38:51,322 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - No compaction plan for checkpoint 206
   2022-09-29 13:38:51,322 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[20220929133815434__rollback__COMPLETED]}
   2022-09-29 13:38:51,322 INFO  org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView [] - Sending request : (http://172.16.7.55:37568/v1/hoodie/view/refresh/?basepath=hdfs%3A%2F%2Fpaat-dev%2Fuser%2Fhudi%2Fwarehouse%2Fpaat_ods_hudi.db&lastinstantts=20220929133815434&timelinehash=349d0f08ef10a04fc26f6567771fd2ba51f211bcbfe748b039706ea116397cf2)
   2022-09-29 13:38:51,326 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] - Closed RocksDB State Backend. Cleaning up RocksDB working directory /tmp/flink-io-90c278ae-7432-43cf-8ed9-0a2b1e517719/job_e91eef4dfcd5908c56bb1a4978a1f564_op_KeyedProcessOperator_86011fb501be3971273edc8de4ea854a__1_2__uuid_7e0588fb-a70f-47c5-b91e-a04ea88f4de0.
   2022-09-29 13:38:51,327 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[20220929133815434__rollback__COMPLETED]}
   2022-09-29 13:38:51,327 INFO  org.apache.hudi.table.action.clean.CleanPlanner              [] - Total Partitions to clean : 1, with policy KEEP_LATEST_COMMITS
   2022-09-29 13:38:51,327 INFO  org.apache.hudi.table.action.clean.CleanPlanner              [] - Using cleanerParallelism: 1
   2022-09-29 13:38:51,327 INFO  org.apache.hudi.table.action.clean.CleanPlanner              [] - Cleaning , retaining latest 10 commits. 
   2022-09-29 13:38:51,327 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView [] - Building file system view for partition ()
   2022-09-29 13:38:51,328 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - bucket_assigner (1/2)#5 (1316413b42cf9d8a6ea5be1da909a000) switched from RUNNING to FAILED with failure cause: org.apache.hudi.exception.HoodieRemoteException: 172.16.7.55:37568 failed to respond
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:420)
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.sync(RemoteHoodieTableFileSystemView.java:484)
   	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.sync(PriorityBasedFileSystemView.java:257)
   	at org.apache.hudi.sink.partitioner.profile.WriteProfile.reload(WriteProfile.java:252)
   	at org.apache.hudi.sink.partitioner.BucketAssigner.reload(BucketAssigner.java:211)
   	at org.apache.hudi.sink.partitioner.BucketAssignFunction.notifyCheckpointComplete(BucketAssignFunction.java:234)
   	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
   	at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:99)
   	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:334)
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:1171)
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$10(StreamTask.java:1136)
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$12(StreamTask.java:1159)
   	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
   	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:344)
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:330)
   	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:202)
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
   	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
   	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
   	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.http.NoHttpResponseException: 172.16.7.55:37568 failed to respond
   	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
   	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
   	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
   	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
   	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
   	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
   	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
   	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
   	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
   	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
   	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
   	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
   	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
   	at org.apache.http.client.fluent.Request.execute(Request.java:151)
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:176)
   	at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:418)
   	... 23 more
   
   2022-09-29 13:38:51,328 INFO  org.apache.flink.runtime.taskmanager.Task                    [] - Freeing task resources for bucket_assigner (1/2)#5 (1316413b42cf9d8a6ea5be1da909a000).
   2022-09-29 13:38:51,328 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[20220929133815434__rollback__COMPLETED]}`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond

Posted by GitBox <gi...@apache.org>.
codope commented on issue #6825:
URL: https://github.com/apache/hudi/issues/6825#issuecomment-1263231014

   Yes, from the stack trace it looks like the timeline server crashed. To find the root cause, we will have to look into the driver's resource monitors. In the upcoming release, we have added a retry mechanism based on exponential backoff to avoid any throttling issues: https://github.com/apache/hudi/commit/c597eb5d206b528d201a47194e144b8aed732c0e
   In the meantime, however, you could disable the timeline server and the remote filesystem view using
   ```
   hoodie.embed.timeline.server=false
   hoodie.filesystem.view.type=MEMORY
   ```
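
For readers unfamiliar with the retry approach mentioned above, here is a minimal sketch of retrying a request with exponential backoff. It is not Hudi's implementation: the function name `retry_with_backoff`, its parameters, and the default values are all hypothetical and chosen only for illustration, and `ConnectionError` merely stands in for a "failed to respond" error.

```python
import time


def retry_with_backoff(call, max_retries=3, initial_delay_s=0.1,
                       max_delay_s=2.0, retry_on=(ConnectionError,),
                       sleep=time.sleep):
    """Run call() and retry on failure, doubling the wait after each attempt.

    Hypothetical helper for illustration only; it does not mirror any Hudi API.
    """
    delay = initial_delay_s
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            # Out of retries: surface the original error to the caller.
            if attempt == max_retries:
                raise
            # Wait before the next attempt, never longer than max_delay_s.
            sleep(min(delay, max_delay_s))
            delay *= 2
```

With such a wrapper, a single dropped response from the server would trigger a short pause and a retry rather than failing the task right away. The `sleep` argument is injectable so the waiting behaviour can be checked without real delays.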




[GitHub] [hudi] gubinjie commented on issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond

Posted by GitBox <gi...@apache.org>.
gubinjie commented on issue #6825:
URL: https://github.com/apache/hudi/issues/6825#issuecomment-1272269358

   TH




[GitHub] [hudi] gubinjie closed issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond

Posted by GitBox <gi...@apache.org>.
gubinjie closed issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond
URL: https://github.com/apache/hudi/issues/6825




[GitHub] [hudi] nsivabalan commented on issue #6825: [SUPPORT]org.apache.hudi.exception.HoodieRemoteException: *****:37568 failed to respond

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6825:
URL: https://github.com/apache/hudi/issues/6825#issuecomment-1263120280

   I guess the timeline server crashed for some reason.
   CC @yihua, any thoughts?
   

