Posted to commits@hudi.apache.org by "Alexander Trushev (Jira)" <ji...@apache.org> on 2022/06/15 09:33:00 UTC

[jira] [Created] (HUDI-4258) HoodieTable removes data file before the end of Flink job

Alexander Trushev created HUDI-4258:
---------------------------------------

             Summary: HoodieTable removes data file before the end of Flink job
                 Key: HUDI-4258
                 URL: https://issues.apache.org/jira/browse/HUDI-4258
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink
    Affects Versions: 0.11.0
            Reporter: Alexander Trushev
            Assignee: Alexander Trushev


h3. How to reproduce

Flink SQL:
{code:sql}
create table t1 (uuid string, name string) with (
  'connector' = 'hudi',
  'path' = '/tmp/hudi',
  'write.batch.size' = '0.0000000001' -- trigger flush after each tuple
);

insert into t1
  select cast(uuid as string), cast(name as string)
  from (values ('id1', 'Julian')) as t (uuid, name);

select * from t1;
{code}
Expected result: (id1, Julian)
Actual result: none

*Note:* this result is very rare but possible. To make it more likely, pause the thread before the write event is handled in org.apache.hudi.sink.StreamWriteOperatorCoordinator#handleEventFromOperator:
{code:java}
if (event.isBootstrap()) {
  handleBootstrapEvent(event);
} else {
  Thread.sleep(100); // added for reproduction only: delays the write event to widen the race window
  handleWriteMetaEvent(event);
}
{code}
h3. Details

This happens because the following sequence of operations is valid:
 # CoordinatorThread: executor.execute(() -> handleWriteMetaEvent(WriteEvent(id1, Julian)))
 # CoordinatorThread: handleEndInputEvent(EndInputEvent) // WriteEvent(id1, Julian) has not been handled by ExecutorThread yet
 # CoordinatorThread: HoodieTable.finalizeWrite() // removes the data file
 # ExecutorThread: handleWriteMetaEvent(WriteEvent(id1, Julian)) // too late, the data file is already gone
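
The ordering above can be illustrated with a small standalone sketch. It is not Hudi code: the class CoordinatorRaceSketch, the two sets, and the simplified finalizeWrite() (which drops every file that has no reported write metadata) are hypothetical stand-ins, and the latch plays the role of the Thread.sleep(100) delay. The waitForPendingEvents flag only shows a contrasting ordering for comparison; it is an assumption and not necessarily how the issue is fixed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CoordinatorRaceSketch {
    // Hypothetical stand-in for the data files present on storage.
    static final Set<String> filesOnDisk = ConcurrentHashMap.newKeySet();
    // Hypothetical stand-in for the files the coordinator has write metadata for.
    static final Set<String> reportedFiles = ConcurrentHashMap.newKeySet();

    // Simplified model of handling a write metadata event.
    static void handleWriteMetaEvent(String file) {
        reportedFiles.add(file);
    }

    // Simplified model of finalization: files without reported metadata are removed.
    static void finalizeWrite() {
        filesOnDisk.retainAll(reportedFiles);
    }

    /** Replays the sequence and returns true if the data file survives finalization. */
    static boolean run(boolean waitForPendingEvents) {
        filesOnDisk.clear();
        reportedFiles.clear();
        filesOnDisk.add("data-file-1"); // the writer has already flushed (id1, Julian)

        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch delay = new CountDownLatch(1);
        try {
            // Step 1: the coordinator thread only enqueues the write event.
            Future<?> pending = executor.submit(() -> {
                try {
                    delay.await(); // models the artificial delay before handling the event
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                handleWriteMetaEvent("data-file-1");
            });

            if (waitForPendingEvents) {
                // Contrasting ordering: let the queued event finish before finalizing.
                delay.countDown();
                pending.get(5, TimeUnit.SECONDS);
            }

            // Steps 2-3: end of input is handled and the write is finalized.
            finalizeWrite();

            // Step 4: the executor thread finally handles the write event.
            delay.countDown();
            pending.get(5, TimeUnit.SECONDS);

            return filesOnDisk.contains("data-file-1");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("racy ordering, file survives: " + run(false));    // false
        System.out.println("drained first, file survives: " + run(true));     // true
    }
}
```

In this model the file is lost only when finalization runs while the write event is still queued, which matches the empty result of the select above.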
