Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/14 20:32:21 UTC

[GitHub] [hudi] yihua commented on a change in pull request #4787: [HUDI-2189] Adding delete partitions support to DeltaStreamer

yihua commented on a change in pull request #4787:
URL: https://github.com/apache/hudi/pull/4787#discussion_r806215809



##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1944,6 +1990,18 @@ public Dataset apply(JavaSparkContext jsc, SparkSession sparkSession, Dataset<Ro
     }
   }
 
+  public static class TestSpecificPartitionTransformer implements Transformer {
+
+    @Override
+    public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession, Dataset<Row> rowDataset,
+                              TypedProperties properties) {
+      //List<Row> row0 = rowDataset.collectAsList();

Review comment:
       nit: same here — remove the commented-out code?

##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -258,6 +258,20 @@ static void assertRecordCount(long expected, String tablePath, SQLContext sqlCon
       assertEquals(expected, recordCount);
     }
 
+    static Map<String, Long> getPartitionRecordCount(String basePath, SQLContext sqlContext) {
+      sqlContext.clearCache();
+      //List<Row> rows0 = sqlContext.read().format("org.apache.hudi").load(basePath).collectAsList();

Review comment:
       nit: remove unused code?

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -555,6 +555,10 @@ public void refreshTimeline() throws IOException {
       case INSERT_OVERWRITE_TABLE:
         writeStatusRDD = writeClient.insertOverwriteTable(records, instantTime).getWriteStatuses();
         break;
+      case DELETE_PARTITION:
+        List<String> partitions = records.map(record -> record.getPartitionPath()).distinct().collect();

Review comment:
       The logic here takes the partition list from the input records and does nothing else with the records themselves.  This workflow looks strange to me.  Shouldn't the input be the partition list directly?  It's counter-intuitive for users to provide a list of records just so the corresponding partitions can be deleted.
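
The contrast the reviewer draws can be sketched without Spark. This is a minimal, hypothetical illustration: `RecordStub` stands in for Hudi's `HoodieRecord` (only `getPartitionPath()` matters here), and the comma-separated-config alternative, including any property name it would use, is an invented example of "the input is the partition list directly", not an API that exists in Hudi:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeletePartitionInput {

    // Hypothetical stand-in for HoodieRecord; only the partition path is relevant.
    static class RecordStub {
        private final String partitionPath;
        RecordStub(String partitionPath) { this.partitionPath = partitionPath; }
        String getPartitionPath() { return partitionPath; }
    }

    // What the reviewed diff does: derive the distinct partitions from the
    // incoming records (records.map(...).distinct().collect() in Spark).
    static List<String> partitionsFromRecords(List<RecordStub> records) {
        return records.stream()
                .map(RecordStub::getPartitionPath)
                .distinct()
                .collect(Collectors.toList());
    }

    // The reviewer's suggestion: accept the partition list directly, e.g. as a
    // comma-separated config value supplied by the user.
    static List<String> partitionsFromConfig(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RecordStub> records = Arrays.asList(
                new RecordStub("2022/02/14"),
                new RecordStub("2022/02/15"),
                new RecordStub("2022/02/14"));
        // Both paths yield the same partition list, but the second one does not
        // force the user to materialize records for partitions they want gone.
        System.out.println(partitionsFromRecords(records));
        System.out.println(partitionsFromConfig("2022/02/14, 2022/02/15"));
    }
}
```

Either way the downstream delete-partitions call only needs the `List<String>`; the sketch just makes explicit that the record-based route carries redundant payload.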




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org