Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/21 12:37:42 UTC

[GitHub] [hudi] awpengfei opened a new issue #4863: [SUPPORT] Compaction and rollback with Flink cause data loss

awpengfei opened a new issue #4863:
URL: https://github.com/apache/hudi/issues/4863


   **Describe the problem you faced**
   
   * At instant time `20220221085407453`, Flink sent a compaction request to merge the delta log files into the base parquet files.
   ```
   2022-02-21 08:58:50,410 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Create new file for toInstant ?hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/.hoodie/20220221085407453.compaction.inflight
   2022-02-21 08:58:50,583 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Execute compaction plan for instant 20220221085407453 as 3 file groups
   ```
   * At time `2022-02-21 09:00:07,398`, an exception occurred in task `hoodie_stream_write` that caused the job to restart.
   ```
   ……
   Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/.5ce039d0-5080-41c2-a2b4-aaae3b92ea36_20220221085407453.log.1_0-1-118
   Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
   ……
   ```
   * After the job restarted, Flink sent a rollback request, and then the compaction at instant time `20220221085407453` finished.
   ```
   2022-02-21 09:00:08,879 INFO  org.apache.hudi.table.action.rollback.BaseRollbackPlanActionExecutor [] - Requesting Rollback with instant time [==>20220221090008627__rollback__REQUESTED]
   2022-02-21 09:00:08,947 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Create new file for toInstant ?hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/.hoodie/20220221085407453.commit
   2022-02-21 09:00:08,947 INFO  org.apache.hudi.client.HoodieFlinkWriteClient                [] - Compacted successfully on commit 20220221085407453
   ```
   * Then the rollback request at instant time `20220221090008627` began to roll back the compaction commit at instant time `20220221085407453`. It deleted the base parquet files with instant time `20220221085407453`.
   ```
   2022-02-21 09:00:09,155 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Create new file for toInstant ?hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/.hoodie/20220221090008627.rollback.inflight
   2022-02-21 09:00:09,156 INFO  org.apache.hudi.table.action.rollback.MergeOnReadRollbackActionExecutor [] - Rolling back instant [==>20220221085407453__compaction__INFLIGHT]
   2022-02-21 09:00:09,156 INFO  org.apache.hudi.table.action.rollback.MergeOnReadRollbackActionExecutor [] - Unpublished [==>20220221085407453__compaction__INFLIGHT]
   2022-02-21 09:00:09,205 WARN  org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor [] - Rollback finished without deleting inflight instant file. Instant=[==>20220221085407453__compaction__INFLIGHT]
   2022-02-21 09:00:09,205 INFO  org.apache.hudi.table.action.rollback.MergeOnReadRollbackActionExecutor [] - Time(in ms) taken to finish rollback 49
   2022-02-21 09:00:09,205 INFO  org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor [] - Rolled back inflight instant 20220221085407453
   2022-02-21 09:00:09,206 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Checking for file exists ?hdfs://da-hdfs/user/hive/warehouse/mysql.db/user_auth_hudi/.hoodie/20220221090008627.rollback.inflight
   2022-02-21 09:00:09,313 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Create new file for toInstant ?hdfs://da-hdfs/user/hive/warehouse/mysql.db/user_auth_hudi/.hoodie/20220221090008627.rollback
   2022-02-21 09:00:09,313 INFO  org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor [] - Rollback of Commits [20220221085407453] is complete
   2022-02-21 09:00:09,326 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[20220221090008627__rollback__COMPLETED]}
   ```
   ```
   compaction show --instant 20220221085407453
   ╔════════════════╤══════════════════════════════════════╤═══════════════════╤════════════════════════════════════════════════════════════════════════╤═══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
   ║ Partition Path │ FileId                               │ Base-Instant      │ Data File Path                                                         │ Total Delta Files │ getMetrics                                                                                                             ║
   ╠════════════════╪══════════════════════════════════════╪═══════════════════╪════════════════════════════════════════════════════════════════════════╪═══════════════════╪════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
   ║                │ 4ec1bad9-e941-4748-a82d-04461975b3dc │ 20220221084228229 │ 4ec1bad9-e941-4748-a82d-04461975b3dc_0-1-117_20220221084228229.parquet │ 1                 │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=46.0, TOTAL_LOG_FILES_SIZE=7709317.0, TOTAL_IO_WRITE_MB=39.0, TOTAL_IO_MB=85.0} ║
   ╟────────────────┼──────────────────────────────────────┼───────────────────┼────────────────────────────────────────────────────────────────────────┼───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
   ║                │ 5ce039d0-5080-41c2-a2b4-aaae3b92ea36 │ 20220221084228229 │ 5ce039d0-5080-41c2-a2b4-aaae3b92ea36_0-1-117_20220221084228229.parquet │ 1                 │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=46.0, TOTAL_LOG_FILES_SIZE=7688548.0, TOTAL_IO_WRITE_MB=39.0, TOTAL_IO_MB=85.0} ║
   ╟────────────────┼──────────────────────────────────────┼───────────────────┼────────────────────────────────────────────────────────────────────────┼───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
   ║                │ 88609801-c541-4dd3-8996-d5588b85fd03 │ 20220221084228229 │ 88609801-c541-4dd3-8996-d5588b85fd03_0-1-117_20220221084228229.parquet │ 2                 │ {TOTAL_LOG_FILES=2.0, TOTAL_IO_READ_MB=44.0, TOTAL_LOG_FILES_SIZE=6908476.0, TOTAL_IO_WRITE_MB=38.0, TOTAL_IO_MB=82.0} ║
   ╚════════════════╧══════════════════════════════════════╧═══════════════════╧════════════════════════════════════════════════════════════════════════╧═══════════════════╧════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
   ```
   ```
   show rollback --instant 20220221090008627
   ╔═══════════════════╤═════════════════════╤═══════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╤═══════════╗
   ║ Instant           │ Rolledback Instant  │ Partition │ Deleted File                                                                                                                      │ Succeeded ║
   ╠═══════════════════╪═════════════════════╪═══════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪═══════════╣
   ║ 20220221090008627 │ [20220221085407453] │           │ hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/4ec1bad9-e941-4748-a82d-04461975b3dc_0-1-118_20220221085407453.parquet    │ true      ║
   ╟───────────────────┼─────────────────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────╢
   ║ 20220221090008627 │ [20220221085407453] │           │ hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/88609801-c541-4dd3-8996-d5588b85fd03_0-1-118_20220221085407453.parquet    │ true      ║
   ╟───────────────────┼─────────────────────┼───────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────╢
   ║ 20220221090008627 │ [20220221085407453] │           │ hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/5ce039d0-5080-41c2-a2b4-aaae3b92ea36_0-1-118_20220221085407453.parquet    │ true      ║
   ╚═══════════════════╧═════════════════════╧═══════════╧═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╧═══════════╝
   ```
   * By the time of the next compaction at instant time `20220221090858111`, the base parquet files for instant time `20220221085407453` had still not been regenerated. As a result, the compaction at instant time `20220221090858111` does not contain the data written before the compaction at instant time `20220221085407453`.
   ```
   compaction show --instant 20220221090858111
   ╔════════════════╤══════════════════════════════════════╤═══════════════════╤════════════════╤═══════════════════╤═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
   ║ Partition Path │ FileId                               │ Base-Instant      │ Data File Path │ Total Delta Files │ getMetrics                                                                                                              ║
   ╠════════════════╪══════════════════════════════════════╪═══════════════════╪════════════════╪═══════════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
   ║                │ 5ce039d0-5080-41c2-a2b4-aaae3b92ea36 │ 20220221085407453 │ null           │ 2                 │ {TOTAL_LOG_FILES=2.0, TOTAL_IO_READ_MB=9.0, TOTAL_LOG_FILES_SIZE=9751919.0, TOTAL_IO_WRITE_MB=120.0, TOTAL_IO_MB=129.0} ║
   ╟────────────────┼──────────────────────────────────────┼───────────────────┼────────────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
   ║                │ 4ec1bad9-e941-4748-a82d-04461975b3dc │ 20220221085407453 │ null           │ 1                 │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=9.0, TOTAL_LOG_FILES_SIZE=9673812.0, TOTAL_IO_WRITE_MB=120.0, TOTAL_IO_MB=129.0} ║
   ╟────────────────┼──────────────────────────────────────┼───────────────────┼────────────────┼───────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
   ║                │ 88609801-c541-4dd3-8996-d5588b85fd03 │ 20220221085407453 │ null           │ 1                 │ {TOTAL_LOG_FILES=1.0, TOTAL_IO_READ_MB=8.0, TOTAL_LOG_FILES_SIZE=9316325.0, TOTAL_IO_WRITE_MB=120.0, TOTAL_IO_MB=128.0} ║
   ╚════════════════╧══════════════════════════════════════╧═══════════════════╧════════════════╧═══════════════════╧═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
   ```
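The tables above can be cross-checked mechanically. The following is a minimal sketch, not part of Hudi, that assumes base file names follow the layout `<fileId>_<writeToken>_<instantTime>.parquet` seen in the CLI output. The helper `parse_base_file` and the variable names are hypothetical. It checks that every file deleted by the rollback belongs to the compaction instant, and that the deleted file IDs are the same file groups whose `Data File Path` is `null` in the next compaction plan.

```python
import re
from typing import NamedTuple


class BaseFile(NamedTuple):
    file_id: str
    write_token: str
    instant: str


# Assumed layout, inferred from the CLI output above:
#   <fileId>_<writeToken>_<instantTime>.parquet
BASE_FILE_RE = re.compile(
    r"(?P<file_id>[0-9a-f-]{36})_(?P<write_token>\d+-\d+-\d+)_(?P<instant>\d{17})\.parquet$"
)


def parse_base_file(path: str) -> BaseFile:
    """Split a base file path into file ID, write token, and instant time."""
    m = BASE_FILE_RE.search(path)
    if m is None:
        raise ValueError(f"not a recognised base file name: {path}")
    return BaseFile(m["file_id"], m["write_token"], m["instant"])


COMPACTION_INSTANT = "20220221085407453"
TABLE_PATH = "hdfs://da-hdfs/user/hive/warehouse/default.db/hudi_test/"

# "Deleted File" column of `show rollback --instant 20220221090008627`.
deleted = [
    TABLE_PATH + "4ec1bad9-e941-4748-a82d-04461975b3dc_0-1-118_20220221085407453.parquet",
    TABLE_PATH + "88609801-c541-4dd3-8996-d5588b85fd03_0-1-118_20220221085407453.parquet",
    TABLE_PATH + "5ce039d0-5080-41c2-a2b4-aaae3b92ea36_0-1-118_20220221085407453.parquet",
]

# "FileId" column of `compaction show --instant 20220221090858111`,
# where every row has a null data file path.
next_plan_file_ids = {
    "5ce039d0-5080-41c2-a2b4-aaae3b92ea36",
    "4ec1bad9-e941-4748-a82d-04461975b3dc",
    "88609801-c541-4dd3-8996-d5588b85fd03",
}

parsed = [parse_base_file(p) for p in deleted]
all_from_compaction = all(f.instant == COMPACTION_INSTANT for f in parsed)
deleted_ids = {f.file_id for f in parsed}

print(all_from_compaction)
print(deleted_ids == next_plan_file_ids)
```

Both checks print `True` for the values copied from the tables, which is consistent with the rollback having removed exactly the base files that the compaction had just written.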
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Hadoop version : 3.3.1
   
   * Flink version : 1.13.5
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] danny0405 edited a comment on issue #4863: [SUPPORT] Compaction and rollback with Flink cause data loss

Posted by GitBox <gi...@apache.org>.
danny0405 edited a comment on issue #4863:
URL: https://github.com/apache/hudi/issues/4863#issuecomment-1047504055


   Do you mean that the coordinator rolled back the compaction instant? That is not expected, because we have excluded the compaction and clustering instants, see: https://github.com/apache/hudi/blob/4d1f74ebeaee857380f69d7c596eaaf0135ca59e/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java#L973 and https://github.com/apache/hudi/blob/4d1f74ebeaee857380f69d7c596eaaf0135ca59e/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java#L1028. The compaction rollback actually happens in `CompactionPlanOperator`: https://github.com/apache/hudi/blob/4d1f74ebeaee857380f69d7c596eaaf0135ca59e/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionPlanOperator.java#L73
   
   But I believe something is wrong here, so let's dig deeper into this.
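The exclusion described above can be illustrated with a small model. This is a sketch under stated assumptions, not Hudi's implementation: the `Instant` class, the `rollback_candidates` function, and the second timeline entry are hypothetical, and the action names `compaction`, `replacecommit` (assumed here to stand for clustering) and `deltacommit` are used only as labels. It shows that when compaction and clustering instants are filtered out, the inflight compaction instant from the logs would not be picked for rollback.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    timestamp: str
    action: str
    state: str


# Assumption: these action labels mark compaction and clustering instants.
EXCLUDED_ACTIONS = {"compaction", "replacecommit"}


def rollback_candidates(timeline: List[Instant]) -> List[Instant]:
    """Return inflight instants that a failed-write rollback may touch."""
    return [
        i for i in timeline
        if i.state == "INFLIGHT" and i.action not in EXCLUDED_ACTIONS
    ]


timeline = [
    # Taken from the logs: the compaction that was rolled back.
    Instant("20220221085407453", "compaction", "INFLIGHT"),
    # Hypothetical failed write, with a made-up timestamp, for contrast.
    Instant("20220221085900000", "deltacommit", "INFLIGHT"),
]

candidates = rollback_candidates(timeline)
print([i.timestamp for i in candidates])
```

With this filter only the hypothetical `deltacommit` instant is returned. The logs in the report show `20220221085407453__compaction__INFLIGHT` being rolled back anyway, which is the unexpected part.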








[GitHub] [hudi] nsivabalan commented on issue #4863: [SUPPORT] Compaction and rollback with Flink cause data loss

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4863:
URL: https://github.com/apache/hudi/issues/4863#issuecomment-1047218084


   @danny0405 @leesf: could you please loop in someone to assist here?

