Posted to commits@hudi.apache.org by "Alexander Trushev (Jira)" <ji...@apache.org> on 2022/06/15 09:33:00 UTC

[jira] [Updated] (HUDI-4258) HoodieTable removes data file before the end of Flink job

     [ https://issues.apache.org/jira/browse/HUDI-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Trushev updated HUDI-4258:
------------------------------------
    Status: In Progress  (was: Open)

> HoodieTable removes data file before the end of Flink job
> ---------------------------------------------------------
>
>                 Key: HUDI-4258
>                 URL: https://issues.apache.org/jira/browse/HUDI-4258
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 0.11.0
>            Reporter: Alexander Trushev
>            Assignee: Alexander Trushev
>            Priority: Major
>
> h3. How to reproduce
> Flink SQL:
> {code:sql}
> create table t1 (uuid string, name string) with (
>   'connector' = 'hudi',
>   'path' = '/tmp/hudi',
>   'write.batch.size' = '0.0000000001' -- trigger flush after each tuple
> );
> insert into t1
>   select cast(uuid as string), cast(name as string)
>   from (values ('id1', 'Julian')) as t(uuid, name);
> select * from t1;
> {code}
> Expected result: (id1, Julian)
> Actual result: none
> *Note:* This result is rare but possible. To make it more likely, pause the thread before the write event is handled in org.apache.hudi.sink.StreamWriteOperatorCoordinator#handleEventFromOperator:
> {code:java}
> if (event.isBootstrap()) {
>   handleBootstrapEvent(event);
> } else {
>   Thread.sleep(100); // artificial delay added to widen the race window
>   handleWriteMetaEvent(event);
> }
> {code}
> h3. Details
> This happens because the following sequence of operations is valid:
>  # CoordinatorThread: executor.execute(() -> handleWriteMetaEvent(WriteEvent(id1, Julian)))
>  # CoordinatorThread: handleEndInputEvent(EndInputEvent) // WriteEvent(id1, Julian) has not been executed by ExecutorThread yet
>  # CoordinatorThread: HoodieTable.finalizeWrite() // removes data file
>  # ExecutorThread: handleWriteMetaEvent(WriteEvent(id1, Julian))
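
The four steps above can be replayed outside Hudi with a minimal sketch. It is not Hudi code: the class name CoordinatorRaceSketch, the method replay, and the flag waitForPendingWrites are hypothetical, and the handlers are reduced to labels appended to a list. A latch stands in for the Thread.sleep(100) pause so that the interleaving is deterministic. The waitForPendingWrites branch only illustrates one possible ordering constraint and is not claimed to be the fix adopted for HUDI-4258.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the coordinator; none of these names exist in Hudi.
public class CoordinatorRaceSketch {

    /** Replays the reported sequence and returns the order in which the handlers ran. */
    static List<String> replay(boolean waitForPendingWrites) throws InterruptedException {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Holds the executor thread back, standing in for the Thread.sleep(100) pause.
        CountDownLatch gate = new CountDownLatch(1);
        CountDownLatch writeDone = new CountDownLatch(1);

        // Step 1: the write event is handed to the executor thread.
        executor.execute(() -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            order.add("handleWriteMetaEvent"); // step 4 in the racy ordering
            writeDone.countDown();
        });

        if (waitForPendingWrites) {
            // Assumed ordering constraint: let the queued write finish first.
            gate.countDown();
            writeDone.await();
        }

        // Steps 2 and 3: end-input is handled synchronously on the calling thread.
        order.add("handleEndInputEvent+finalizeWrite");

        gate.countDown(); // release the executor if it is still waiting
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(replay(false)); // finalize runs before the write is handled
        System.out.println(replay(true));  // write is handled before finalize
    }
}
```

With replay(false) the finalize label is recorded before the write label, which corresponds to the data file being removed before the write metadata is processed. With replay(true) the order is reversed.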



--
This message was sent by Atlassian Jira
(v8.20.7#820007)