Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/09/22 00:50:00 UTC
[jira] [Created] (HUDI-4893) More than 1 splits are created for a single log file for MOR table
sivabalan narayanan created HUDI-4893:
-----------------------------------------
Summary: More than 1 splits are created for a single log file for MOR table
Key: HUDI-4893
URL: https://issues.apache.org/jira/browse/HUDI-4893
Project: Apache Hudi
Issue Type: Bug
Components: reader-core
Reporter: sivabalan narayanan
While debugging a flaky test, I realized that we are generating more than one split for a single log file. I root-caused it to isSplitable(), which returns true for a HoodieRealtimePath even when the path points to a log file.
[https://github.com/apache/hudi/blob/6dbe2960f2eaf0408dc0ef544991cad0190050a9/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java#L91]
I made a quick fix locally and verified that only one split is generated per log file.
{code:java}
git diff hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
diff --git a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
index bba44d5c66..d09dfdf753 100644
--- a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
+++ b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java
@@ -89,7 +89,7 @@ public class HoodieRealtimePath extends Path {
}
public boolean isSplitable() {
- return !toString().isEmpty() && !includeBootstrapFilePath();
+ return !toString().contains(".log") && !includeBootstrapFilePath();
}
public PathWithBootstrapFileStatus getPathWithBootstrapFileStatus() {
{code}
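To make the behavior change in the diff easier to see, here is a minimal standalone sketch of the two predicates. It is not Hudi code: the class SplitableSketch, its method names, and the sample file paths are hypothetical, and the bootstrap check is reduced to a plain boolean argument. It only mirrors the two return expressions shown in the diff.

```java
// Illustrative sketch only. SplitableSketch, its methods, and the sample paths
// are hypothetical and exist just to compare the two expressions from the diff.
final class SplitableSketch {
    private SplitableSketch() {}

    // Mirrors the original expression: any non-empty path without a bootstrap
    // file path is reported as splitable, log files included.
    static boolean isSplitableBefore(String path, boolean includesBootstrapFilePath) {
        return !path.isEmpty() && !includesBootstrapFilePath;
    }

    // Mirrors the quick local fix: a path containing ".log" is never splitable.
    static boolean isSplitableAfter(String path, boolean includesBootstrapFilePath) {
        return !path.contains(".log") && !includesBootstrapFilePath;
    }

    public static void main(String[] args) {
        // Assumed example names; real file names depend on the table layout.
        String logFile = "/tbl/2022/09/22/.f1_20220922005000.log.1_0-1-0";
        String baseFile = "/tbl/2022/09/22/f1_0-1-0_20220922005000.parquet";

        System.out.println("log file, before fix:  " + isSplitableBefore(logFile, false));
        System.out.println("log file, after fix:   " + isSplitableAfter(logFile, false));
        System.out.println("base file, before fix: " + isSplitableBefore(baseFile, false));
        System.out.println("base file, after fix:  " + isSplitableAfter(baseFile, false));
    }
}
```

One caveat with the quick fix as written: contains(".log") is a substring match on the whole path string, so it would also mark a file as non-splitable if any parent directory name happens to contain ".log". Checking only the file name might be safer, but that is a suggestion and not something verified here.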