Posted to commits@hudi.apache.org by "danny0405 (via GitHub)" <gi...@apache.org> on 2023/03/08 04:20:56 UTC

[GitHub] [hudi] danny0405 commented on a diff in pull request #8115: [HUDI-5864] Adding file system view refresh regression to our release page

danny0405 commented on code in PR #8115:
URL: https://github.com/apache/hudi/pull/8115#discussion_r1128958955


##########
website/releases/release-0.12.0.md:
##########
@@ -200,6 +200,29 @@ getting duplicate records in your pipeline:
 - Making sure that the [fix](https://github.com/apache/hudi/pull/6883) is
   included in your custom artifacts (if you're building and using your own)
 
+
+We also found another regression related to the interplay of the metadata table and timeline server with streaming ingestion pipelines.
+
+The FileSystemView that Hudi maintains internally could go out of sync due to an occasional race condition when table services (compaction, clustering)
+are involved, which could result in updates and deletes being routed to older file versions and hence in missed updates and deletes.
+
+Here are the user flows that could potentially be impacted by this:
+
+- This impacts pipelines using Deltastreamer in **continuous mode** (sync-once is not impacted), Spark streaming, or pipelines that directly use the
+  write client across batches/commits instead of the standard ways to write to Hudi. In other words, batch writes should not be impacted.
+- Among these write models, this could have an impact only when table services are enabled.
+    - COW: clustering enabled (inline or async)
+    - MOR: compaction enabled (by default, inline or async)
+- Also, the impact applies only when both the metadata table and the timeline server are enabled (both are the defaults as of 0.12.0).
+
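+To make the impacted configuration concrete, here is a minimal, illustrative sketch (not part of the fix itself) of a Spark Structured Streaming
+ingestion job that matches the conditions above: streaming writes to an MOR table with compaction left at its default, and the metadata table and
+embedded timeline server enabled (also the 0.12.0 defaults). The source, table name, fields, and paths are placeholders.
+
+```scala
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.functions.{col, to_date}
+import org.apache.spark.sql.streaming.Trigger
+
+val spark = SparkSession.builder().appName("hudi-streaming-sketch").getOrCreate()
+
+// Placeholder source: the built-in rate source emits `timestamp` and `value` columns.
+val input = spark.readStream.format("rate").load()
+  .withColumn("dt", to_date(col("timestamp")))
+
+input.writeStream
+  .format("hudi")
+  .option("hoodie.table.name", "example_table")
+  // MOR table; compaction is enabled by default (inline or async), one of the impacted setups.
+  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
+  .option("hoodie.datasource.write.recordkey.field", "value")
+  .option("hoodie.datasource.write.precombine.field", "timestamp")
+  .option("hoodie.datasource.write.partitionpath.field", "dt")
+  // Both default to true as of 0.12.0; the regression requires both to be enabled.
+  .option("hoodie.metadata.enable", "true")
+  .option("hoodie.embed.timeline.server", "true")
+  .option("checkpointLocation", "/tmp/hudi_sketch_checkpoint")
+  .trigger(Trigger.ProcessingTime("60 seconds"))
+  .start("/tmp/hudi_sketch_table")
+```
+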
+Based on some production data, we expect this issue to cause roughly < 1% of updates to be missed, since it is a race condition
+and table services are generally scheduled once every N commits. The percentage of missed updates could be even lower if the
+frequency of table services is lower.
+
+[Here](https://issues.apache.org/jira/browse/HUDI-5863) is the JIRA tracking this issue; the fix has already landed in master.

Review Comment:
   Recently, we found another critical regression in the Flink metadata sync: https://github.com/apache/hudi/pull/8050. It causes an object reference leak and carries a risk of OOM for long-running streaming jobs. Can we also address it in the 0.12.x and 0.13.x releases?
   
   The job seems to crash after running continuously for about 2 weeks.


