Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/30 17:10:22 UTC

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6862: [HUDI-4989] Fixing deltastreamer init failures

nsivabalan commented on code in PR #6862:
URL: https://github.com/apache/hudi/pull/6862#discussion_r1008897155


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -255,44 +255,72 @@ public DeltaSync(HoodieDeltaStreamer.Config cfg, SparkSession sparkSession, Sche
    */
   public void refreshTimeline() throws IOException {
     if (fs.exists(new Path(cfg.targetBasePath))) {
-      HoodieTableMetaClient meta = HoodieTableMetaClient.builder().setConf(new Configuration(fs.getConf())).setBasePath(cfg.targetBasePath).setPayloadClassName(cfg.payloadClassName).build();
-      switch (meta.getTableType()) {
-        case COPY_ON_WRITE:
-          this.commitTimelineOpt = Option.of(meta.getActiveTimeline().getCommitTimeline().filterCompletedInstants());
-          this.allCommitsTimelineOpt = Option.of(meta.getActiveTimeline().getAllCommitsTimeline());
-          break;
-        case MERGE_ON_READ:
-          this.commitTimelineOpt = Option.of(meta.getActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants());
-          this.allCommitsTimelineOpt = Option.of(meta.getActiveTimeline().getAllCommitsTimeline());
-          break;
-        default:
-          throw new HoodieException("Unsupported table type :" + meta.getTableType());
+      try {
+        HoodieTableMetaClient meta = HoodieTableMetaClient.builder().setConf(new Configuration(fs.getConf())).setBasePath(cfg.targetBasePath).setPayloadClassName(cfg.payloadClassName).build();
+        switch (meta.getTableType()) {
+          case COPY_ON_WRITE:
+            this.commitTimelineOpt = Option.of(meta.getActiveTimeline().getCommitTimeline().filterCompletedInstants());
+            this.allCommitsTimelineOpt = Option.of(meta.getActiveTimeline().getAllCommitsTimeline());
+            break;
+          case MERGE_ON_READ:
+            this.commitTimelineOpt = Option.of(meta.getActiveTimeline().getDeltaCommitTimeline().filterCompletedInstants());
+            this.allCommitsTimelineOpt = Option.of(meta.getActiveTimeline().getAllCommitsTimeline());
+            break;
+          default:
+            throw new HoodieException("Unsupported table type :" + meta.getTableType());
+        }
+      } catch (HoodieIOException e) {
+        LOG.warn("Full exception msg " + e.getMessage());
+        if (e.getMessage().contains("Could not load Hoodie properties") && e.getMessage().contains(HoodieTableConfig.HOODIE_PROPERTIES_FILE)) {
+          String basePathWithForwardSlash = cfg.targetBasePath.endsWith("/") ? cfg.targetBasePath : String.format("%s/", cfg.targetBasePath);
+          boolean hoodiePropertiesExists = fs.exists(new Path(basePathWithForwardSlash))
+              && fs.exists(new Path(String.format("%s%s/%s", basePathWithForwardSlash, HoodieTableMetaClient.METAFOLDER_NAME, HoodieTableConfig.HOODIE_PROPERTIES_FILE)))
+              && fs.exists(new Path(String.format("%s%s/%s", basePathWithForwardSlash, HoodieTableMetaClient.METAFOLDER_NAME, HoodieTableConfig.HOODIE_PROPERTIES_FILE_BACKUP)));

Review Comment:
   Here is my take after thinking about it for some time. In the regular happy path (L257), we don't want to make multiple .exists() calls, so we probably should not check for both hoodie.properties and hoodie.properties.backup there. But if the happy path fails, we can definitely make more direct fs calls within the catch block and decide how to proceed.
   
   If you agree, maybe we can make some minor adjustments. In L257, instead of calling fs.exists on the base path, we can call fs.exists on the hoodie.properties file. Within the catch block, though, we still need to check for the backup file as well; we can't avoid that.
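
A rough sketch of the flow suggested above, not the PR's actual code. To stay self-contained it uses java.nio.file in place of Hadoop's FileSystem, and the class, enum, and method names (TableStateProbe, State, propertiesExist, probeAfterLoadFailure) are hypothetical. The three string constants restate what HoodieTableMetaClient.METAFOLDER_NAME and the two HoodieTableConfig file-name constants are assumed to resolve to. What the caller does with each state is deliberately left open.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Illustration only: java.nio.file stands in for Hadoop's FileSystem,
 * and every name in this class is hypothetical.
 */
final class TableStateProbe {
  // Assumed values of the Hudi constants referenced in the diff.
  static final String META_FOLDER = ".hoodie";
  static final String PROPERTIES_FILE = "hoodie.properties";
  static final String PROPERTIES_BACKUP_FILE = "hoodie.properties.backup";

  enum State { PROPERTIES_PRESENT, ONLY_BACKUP_PRESENT, NOT_INITIALIZED }

  /** Happy path: a single existence check, made on hoodie.properties itself. */
  static boolean propertiesExist(Path basePath) {
    return Files.exists(basePath.resolve(META_FOLDER).resolve(PROPERTIES_FILE));
  }

  /**
   * Failure path: meant to be called only after loading the table has failed,
   * so the extra existence check on the backup file is acceptable here.
   */
  static State probeAfterLoadFailure(Path basePath) {
    Path metaDir = basePath.resolve(META_FOLDER);
    if (Files.exists(metaDir.resolve(PROPERTIES_FILE))) {
      return State.PROPERTIES_PRESENT;
    }
    if (Files.exists(metaDir.resolve(PROPERTIES_BACKUP_FILE))) {
      return State.ONLY_BACKUP_PRESENT;
    }
    return State.NOT_INITIALIZED;
  }

  private TableStateProbe() {
  }
}
```

In this sketch the guard in refreshTimeline() would call propertiesExist(...) once, and only the catch block would call probeAfterLoadFailure(...), which keeps the extra filesystem calls off the regular path.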
   


