Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/15 22:00:57 UTC

[GitHub] [hudi] umehrot2 commented on a change in pull request #1768: [HUDI-1054][Peformance] Several performance fixes during finalizing writes

umehrot2 commented on a change in pull request #1768:
URL: https://github.com/apache/hudi/pull/1768#discussion_r455390259



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -386,13 +389,26 @@ public void finalizeWrite(JavaSparkContext jsc, String instantTs, List<HoodieWri
    *
    * @param instantTs Instant Time
    */
-  public void deleteMarkerDir(String instantTs) {
+  public void deleteMarkerDir(JavaSparkContext jsc, String instantTs) {
     try {
       FileSystem fs = getMetaClient().getFs();
       Path markerDir = new Path(metaClient.getMarkerFolderPath(instantTs));
+
       if (fs.exists(markerDir)) {
         // For append only case, we do not write to marker dir. Hence, the above check
-        LOG.info("Removing marker directory=" + markerDir);
+        LOG.info("Removing marker directory = " + markerDir);
+
+        FileStatus[] fileStatuses = fs.listStatus(markerDir);

Review comment:
       @vinothchandar I don't think there is any difference between `EmrFS which uses S3` and `HDFS` w.r.t. the RPC calls made here. EmrFS just implements the HDFS interface, but the internals, like the RPC calls to the namenode, remain the same.



