Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/17 15:52:59 UTC

[GitHub] [hudi] nsivabalan commented on a change in pull request #2902: [HUDI-1800] Exclude file slices in pending compaction when performing small file sizing

nsivabalan commented on a change in pull request #2902:
URL: https://github.com/apache/hudi/pull/2902#discussion_r633650139



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/commit/TestUpsertPartitioner.java
##########
@@ -306,6 +314,37 @@ public void testUpsertPartitionerWithSmallInsertHandling() throws Exception {
     assertInsertBuckets(weights, cumulativeWeights, insertBuckets);
   }
 
+  @Test
+  public void testUpsertPartitionerWithSmallFileHandlingWithInflightCompactionWithCanIndexLogFiles() throws Exception {
+    // Note this is used because it is same partition path used in CompactionTestUtils.createCompactionPlan()
+    final String testPartitionPath = DEFAULT_PARTITION_PATHS[0];
+
+    HoodieWriteConfig config = makeHoodieClientConfigBuilder()
+            .withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(1024).build())
+            .withIndexConfig(HoodieIndexConfig.newBuilder()
+                    .withIndexType(HoodieIndex.IndexType.HBASE)
+                    .withHBaseIndexConfig(HoodieHBaseIndexConfig.newBuilder().build())
+                    .build())
+            .build();
+
+    // This will generate initial commits and create a compaction plan which includes file groups created as part of this
+    HoodieCompactionPlan plan = CompactionTestUtils.createCompactionPlan(metaClient, "001", "002", 1, true, false);
+    FileCreateUtils.createRequestedCompactionCommit(basePath, "002", plan);
+    // Simulate one more commit so that inflight compaction is considered when building file groups in file system view
+    //
+    FileCreateUtils.createLogFile(basePath, testPartitionPath, "003", "2", 1);
+    FileCreateUtils.createCommit(basePath, "003");
+
+    // Partitioner will attempt to assign inserts to file groups including base file created by inflight compaction
+    metaClient = HoodieTableMetaClient.reload(metaClient);
+    HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator(new String[] {testPartitionPath});
+    List<HoodieRecord> insertRecords = dataGenerator.generateInserts("004", 100);
+    WorkloadProfile profile = new WorkloadProfile(buildProfile(jsc.parallelize(insertRecords)));
+
+    HoodieSparkTable table = HoodieSparkTable.create(config, context, metaClient);
+    SparkUpsertDeltaCommitPartitioner partitioner = new SparkUpsertDeltaCommitPartitioner(profile, context, table, config);

Review comment:
       Sorry, I don't see any assertions on the partitioner. Did you forget to update the patch by any chance? If not, it would be good to assert that records are routed to the file groups of interest. I believe UpsertPartitioner has public APIs to assist with this assertion; a minimal sketch follows below.
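   
   A minimal sketch of the kind of assertion being asked for, assuming UpsertPartitioner's public numPartitions()/getBucketInfo(int) methods and BucketInfo's bucketType/fileIdPrefix fields (names may differ on the PR branch); expectedSmallFileId is a hypothetical variable that would come from the file groups set up earlier in the test:
   
       // Hypothetical assertions appended after constructing the partitioner above.
       assertEquals(1, partitioner.numPartitions());
       BucketInfo bucket = partitioner.getBucketInfo(0);
       // Inserts should be routed as an UPDATE into the existing small file group,
       // not into a brand new INSERT bucket.
       assertEquals(BucketType.UPDATE, bucket.bucketType);
       assertEquals(expectedSmallFileId, bucket.fileIdPrefix);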

Review comment:
       Also, I am fairly certain that this will not miss new file groups that have only log files without any base files, but can we add a test to cover this scenario as well, just to make the coverage comprehensive? A rough sketch follows below.
   
   Scenario:
   Create 1 or 2 delta commits to a new file group (FG1).
   Also have a few other file groups with base files and some delta files.
   Ingest a new batch; inserts should be routed to file group FG1 as well, as long as it adheres to small file sizing.
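   
   A rough sketch of such a test, reusing the helpers already used in this file; FileCreateUtils.createDeltaCommit, the "fg-1" file id, and the test method name are illustrative assumptions, and the final routing assertion is left as a placeholder. The HBase index config mirrors the test above so that log-file-only file groups can be indexed:
   
     @Test
     public void testUpsertPartitionerWithSmallFileHandlingForLogFileOnlyFileGroup() throws Exception {
       final String testPartitionPath = DEFAULT_PARTITION_PATHS[0];
       HoodieWriteConfig config = makeHoodieClientConfigBuilder()
           .withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(1024).build())
           .withIndexConfig(HoodieIndexConfig.newBuilder()
               .withIndexType(HoodieIndex.IndexType.HBASE)
               .withHBaseIndexConfig(HoodieHBaseIndexConfig.newBuilder().build())
               .build())
           .build();
   
       // FG1: a new file group with only log files and no base file yet ("fg-1" is an illustrative file id)
       FileCreateUtils.createLogFile(basePath, testPartitionPath, "001", "fg-1", 1);
       FileCreateUtils.createLogFile(basePath, testPartitionPath, "002", "fg-1", 2);
       FileCreateUtils.createDeltaCommit(basePath, "001");
       FileCreateUtils.createDeltaCommit(basePath, "002");
   
       // Ingest a new batch; while FG1 stays under the small file size threshold,
       // inserts should be routed to it as well.
       metaClient = HoodieTableMetaClient.reload(metaClient);
       HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator(new String[] {testPartitionPath});
       List<HoodieRecord> insertRecords = dataGenerator.generateInserts("003", 100);
       WorkloadProfile profile = new WorkloadProfile(buildProfile(jsc.parallelize(insertRecords)));
       HoodieSparkTable table = HoodieSparkTable.create(config, context, metaClient);
       SparkUpsertDeltaCommitPartitioner partitioner =
           new SparkUpsertDeltaCommitPartitioner(profile, context, table, config);
   
       // Assert that at least one bucket is an UPDATE bucket targeting "fg-1"
       // (via partitioner.getBucketInfo(...)), along the lines of the sketch in the earlier comment.
     }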
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org