Posted to commits@hudi.apache.org by "Alexander Trushev (Jira)" <ji...@apache.org> on 2022/06/15 09:33:00 UTC
[jira] [Created] (HUDI-4258) HoodieTable removes data file before the end of Flink job
Alexander Trushev created HUDI-4258:
---------------------------------------
Summary: HoodieTable removes data file before the end of Flink job
Key: HUDI-4258
URL: https://issues.apache.org/jira/browse/HUDI-4258
Project: Apache Hudi
Issue Type: Bug
Components: flink
Affects Versions: 0.11.0
Reporter: Alexander Trushev
Assignee: Alexander Trushev
h3. How to reproduce
Flink SQL:
{code:sql}
create table t1 (uuid string, name string) with (
'connector' = 'hudi',
'path' = '/tmp/hudi',
'write.batch.size' = '0.0000000001' -- trigger flush after each tuple
);
insert into t1
select cast(uuid as string), cast(name as string)
from (values ('id1', 'Julian')) as v(uuid, name);
select * from t1;
{code}
Expected result: (id1, Julian)
Actual result: none
*Note:* this outcome is rare but possible. To make it more likely, pause the thread before the write event is handled in org.apache.hudi.sink.StreamWriteOperatorCoordinator#handleEventFromOperator:
{code:java}
if (event.isBootstrap()) {
  handleBootstrapEvent(event);
} else {
  Thread.sleep(100); // injected delay: widens the window in which end-input handling can overtake this write event
  handleWriteMetaEvent(event);
}
{code}
h3. Details
This happens because the following interleaving of operations is possible:
# CoordinatorThread: executor.execute(() -> handleWriteMetaEvent(WriteEvent(id1, Julian)))
# CoordinatorThread: handleEndInputEvent(EndInputEvent) // WriteEvent(id1, Julian) has not been processed by ExecutorThread yet
# CoordinatorThread: HoodieTable.finalizeWrite() // removes the data file, because its write status has not been collected yet
# ExecutorThread: handleWriteMetaEvent(WriteEvent(id1, Julian)) // too late: the data file is already gone
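To make the interleaving above concrete, here is a heavily simplified Java model of it. It is not Hudi code: the class CoordinatorRaceModel and all of its methods are hypothetical, a single-thread ExecutorService stands in for ExecutorThread, a CountDownLatch replaces the injected Thread.sleep(100) so the bad ordering is deterministic, and finalizeWrite() is reduced to "delete every data file without a collected write status". The second scenario queues end-input handling on the same executor; that ordering is an assumption for illustration, not a confirmed fix.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, heavily simplified model of the race in HUDI-4258; not Hudi code.
 * A single-thread executor stands in for the coordinator's ExecutorThread.
 */
public class CoordinatorRaceModel {
    // Data files on storage (the write task flushes them before sending its event).
    private final Set<String> dataFiles = ConcurrentHashMap.newKeySet();
    // Write statuses the coordinator has collected from write events so far.
    private final Set<String> collectedStatuses = ConcurrentHashMap.newKeySet();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Write event: handled asynchronously on the executor thread.
    // The latch plays the role of the Thread.sleep(100) injected in the note above.
    private void onWriteEvent(String file, CountDownLatch delay) {
        executor.execute(() -> {
            try {
                delay.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            collectedStatuses.add(file);
        });
    }

    // Simplified stand-in for HoodieTable.finalizeWrite() in step 3:
    // deletes every data file that has no collected write status.
    private void finalizeWrite() {
        dataFiles.removeIf(file -> !collectedStatuses.contains(file));
    }

    // Waits for all queued events to finish, then returns the surviving files.
    private Set<String> drainAndGetFiles() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new HashSet<>(dataFiles);
    }

    // Interleaving from the issue: end input is handled on the calling thread.
    public static Set<String> runSynchronousEndInput() {
        CoordinatorRaceModel m = new CoordinatorRaceModel();
        CountDownLatch delay = new CountDownLatch(1);
        m.dataFiles.add("f1");       // write task flushed (id1, Julian) into f1
        m.onWriteEvent("f1", delay); // step 1: queued, not yet processed
        m.finalizeWrite();           // steps 2-3: f1 has no status yet, so it is deleted
        delay.countDown();           // step 4: the write event is processed too late
        return m.drainAndGetFiles();
    }

    // Alternative ordering (assumption): end input is queued on the same FIFO
    // executor, so it runs only after every earlier write event.
    public static Set<String> runOrderedEndInput() {
        CoordinatorRaceModel m = new CoordinatorRaceModel();
        CountDownLatch delay = new CountDownLatch(1);
        m.dataFiles.add("f1");
        m.onWriteEvent("f1", delay);
        m.executor.execute(m::finalizeWrite); // waits behind the delayed write event
        delay.countDown();
        return m.drainAndGetFiles();
    }

    public static void main(String[] args) {
        System.out.println("synchronous end input: " + runSynchronousEndInput()); // []
        System.out.println("ordered end input: " + runOrderedEndInput());         // [f1]
    }
}
```

Running main prints "synchronous end input: []" and then "ordered end input: [f1]". In the first scenario the data file is deleted before its write status arrives, which matches the empty query result in the reproduction; in the second, the single-thread executor's FIFO order guarantees the write status is collected before finalizeWrite() runs.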