Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2022/06/20 09:09:00 UTC
[jira] [Commented] (HUDI-4258) HoodieTable removes data file before the end of Flink job
[ https://issues.apache.org/jira/browse/HUDI-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556285#comment-17556285 ]
Danny Chen commented on HUDI-4258:
----------------------------------
Fixed on the master branch in commit f1103281d2aebf177231b6c3b6df5cce299957cf
> HoodieTable removes data file before the end of Flink job
> ---------------------------------------------------------
>
> Key: HUDI-4258
> URL: https://issues.apache.org/jira/browse/HUDI-4258
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Affects Versions: 0.11.0, 0.11.1
> Reporter: Alexander Trushev
> Assignee: Alexander Trushev
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.12.0
>
>
> h3. How to reproduce
> Flink SQL:
> {code:sql}
> create table t1 (uuid string, name string) with (
>   'connector' = 'hudi',
>   'path' = '/tmp/hudi',
>   'write.batch.size' = '0.0000000001' -- trigger flush after each tuple
> );
> insert into t1
>   select cast(uuid as string), cast(name as string)
>   from (values ('id1', 'Julian')) as v (uuid, name); -- alias added so the column names resolve
> select * from t1;
> {code}
> Expected result: (id1, Julian)
> Actual result: none
> *Note:* this result is very rare, but possible. To make it more likely, pause the thread before the write event is handled in org.apache.hudi.sink.StreamWriteOperatorCoordinator#handleEventFromOperator:
> {code:java}
> if (event.isBootstrap()) {
>   handleBootstrapEvent(event);
> } else {
>   Thread.sleep(100); // <-- artificial pause that widens the race window
>   handleWriteMetaEvent(event);
> }
> {code}
> h3. Details
> This happens because the following sequence of operations is valid:
> # CoordinatorThread: executor.execute(() -> handleWriteMetaEvent(WriteEvent(id1, Julian)))
> # CoordinatorThread: handleEndInputEvent(EndInputEvent) // WriteEvent(id1, Julian) has not been processed by ExecutorThread yet
> # CoordinatorThread: HoodieTable.finalizeWrite() // removes data file
> # ExecutorThread: handleWriteMetaEvent(WriteEvent(id1, Julian))
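
The ordering above can be replayed outside of Hudi with a plain single-threaded executor. The following is only a minimal sketch under stated assumptions, not Hudi code: the class CoordinatorRaceSketch, the method replay and the flag queueEndInput are hypothetical names, the strings merely label the steps, and a latch with a 100 ms timeout stands in for the Thread.sleep(100) from the reproduction. The "queued" variant shows one possible way to preserve ordering (submitting the end-input step to the same executor); it is not necessarily what the fix commit does.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CoordinatorRaceSketch {

    /**
     * Replays the event sequence and returns the order in which the steps ran.
     *
     * @param queueEndInput if true, the finalize step is queued on the same
     *                      single-threaded executor as the write event (hypothetical remedy);
     *                      if false, it runs directly on the calling thread, as in the report
     */
    static List<String> replay(boolean queueEndInput) throws InterruptedException {
        List<String> order = new CopyOnWriteArrayList<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Stands in for the Thread.sleep(100) in the reproduction: the write event
        // is held back until finalize has run or the timeout expires.
        CountDownLatch finalizeDone = new CountDownLatch(1);

        Runnable finalizeWrite = () -> {
            order.add("finalizeWrite");
            finalizeDone.countDown();
        };

        // Step 1: the calling ("coordinator") thread hands the write event to the executor.
        executor.execute(() -> {
            try {
                finalizeDone.await(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            order.add("handleWriteMetaEvent");
        });

        // Steps 2-3: end of input arrives and triggers finalization.
        if (queueEndInput) {
            executor.execute(finalizeWrite); // FIFO queue: runs after the write event
        } else {
            finalizeWrite.run();             // runs right away on the calling thread
        }

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("direct: " + replay(false));
        System.out.println("queued: " + replay(true));
    }
}
```

In the "direct" run the finalize step is expected to be recorded before the write event, which mirrors steps 3 and 4 above; in the "queued" run the single-threaded executor processes the two tasks in submission order, so the write event comes first.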