Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2022/06/20 09:09:00 UTC

[jira] [Commented] (HUDI-4258) HoodieTable removes data file before the end of Flink job

    [ https://issues.apache.org/jira/browse/HUDI-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556285#comment-17556285 ] 

Danny Chen commented on HUDI-4258:
----------------------------------

Fixed via master branch: f1103281d2aebf177231b6c3b6df5cce299957cf

> HoodieTable removes data file before the end of Flink job
> ---------------------------------------------------------
>
>                 Key: HUDI-4258
>                 URL: https://issues.apache.org/jira/browse/HUDI-4258
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 0.11.0, 0.11.1
>            Reporter: Alexander Trushev
>            Assignee: Alexander Trushev
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
>
> h3. How to reproduce
> Flink SQL:
> {code:sql}
> create table t1 (uuid string, name string) with (
>   'connector' = 'hudi',
>   'path' = '/tmp/hudi',
>   'write.batch.size' = '0.0000000001' -- trigger flush after each tuple
> );
> insert into t1
>   select cast(uuid as string), cast(name as string)
>   from (values ('id1', 'Julian'));
> select * from t1;
> {code}
> Expected result: (id1, Julian)
> Actual result: none
> *Note:* this result is rare but possible. To make it more likely, pause the thread before the write event is handled in org.apache.hudi.sink.StreamWriteOperatorCoordinator#handleEventFromOperator:
> {code:java}
> if (event.isBootstrap()) {
>   handleBootstrapEvent(event);
> } else {
>   Thread.sleep(100); // <-------------
>   handleWriteMetaEvent(event);
> }
> {code}
> h3. Details
> This happens because the following sequence of operations is valid:
>  # CoordinatorThread: executor.execute(() -> handleWriteMetaEvent(WriteEvent(id1, Julian)))
>  # CoordinatorThread: handleEndInputEvent(EndInputEvent) // WriteEvent(id1, Julian) is not performed by ExecutorThread yet
>  # CoordinatorThread: HoodieTable.finalizeWrite() // removes data file
>  # ExecutorThread: handleWriteMetaEvent(WriteEvent(id1, Julian))
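
The sequence above can be reproduced outside Hudi with a small stand-alone sketch. Everything in it is hypothetical: the class CoordinatorRaceSketch, the run method and the latch are illustration only, and the strings merely borrow the method names from the list. It is not Hudi code. A latch holds the executor thread back so that the reordering happens every time, which plays the role of the Thread.sleep above. The waitForPendingWrite branch shows one possible way to avoid the reordering, by waiting for the queued task before finalizing; this is an assumption for illustration and not necessarily what the fix commit does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the coordinator; it only records the order of the two steps.
public class CoordinatorRaceSketch {

    static List<String> run(boolean waitForPendingWrite) {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // The latch keeps the executor thread from running the write task too early,
        // which forces the interleaving that is rare in a real job.
        CountDownLatch gate = new CountDownLatch(1);
        try {
            // Step 1: the coordinator thread hands the write event to the executor.
            Future<?> pending = executor.submit(() -> {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                order.add("handleWriteMetaEvent"); // step 4 in the list above
            });
            if (waitForPendingWrite) {
                // Assumed remedy: let the queued task finish before finalizing.
                gate.countDown();
                pending.get(5, TimeUnit.SECONDS);
            }
            // Steps 2 and 3: end of input arrives and the coordinator thread finalizes.
            order.add("finalizeWrite");
            gate.countDown();
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
        return new ArrayList<>(order);
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

With run(false) the finalize step is recorded before the write event is handled, matching the problematic order in the list. With run(true) the write event is handled first.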



--
This message was sent by Atlassian Jira
(v8.20.7#820007)