Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/08 07:57:43 UTC

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5791: [MINOR] follow up HUDI-4178 automatically enable schema evolution when read hoodie table.

xiarixiaoyao commented on code in PR #5791:
URL: https://github.com/apache/hudi/pull/5791#discussion_r892040473


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/io/FileBasedInternalSchemaStorageManager.java:
##########
@@ -131,6 +131,27 @@ private List<String> getValidInstants() {
         .filterCompletedInstants().getInstants().map(f -> f.getTimestamp()).collect(Collectors.toList());
   }
 
+  /**
+   * Return whether an available historySchema file exist in schema folder or not.
+   */
+  public boolean isValidHistorySchemaExist() {
+    try {
+      List<String> validateCommits = getValidInstants();
+      FileSystem fs = FSUtils.getFs(baseSchemaPath.toString(), conf);
+      if (fs.exists(baseSchemaPath)) {
+        List<String> validaSchemaFiles = Arrays.stream(fs.listStatus(baseSchemaPath))
+            .filter(f -> f.isFile() && f.getPath().getName().endsWith(SCHEMA_COMMIT_ACTION))
+            .map(file -> file.getPath().getName()).filter(f -> validateCommits.contains(f.split("\\.")[0])).sorted().collect(Collectors.toList());

Review Comment:
   Good question.
   1) If schema evolution has happened, schema files will exist in the schema folder. We check for those files to enable schema evolution automatically. This check should run only once: afterwards we call sparkSession.sessionState.conf.setConfString(DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key, result.toString) so that this function is not called repeatedly. See [HoodieBaseRelation.scala](https://github.com/apache/hudi/pull/5791/files#diff-b95f9369e8ae90c511e1cff0863c8207d61c9e3dc2345350552a74d3a068bd31) line 517.
   2) If no schema evolution has happened, no schema files exist and the schema folder should be empty. Calling fs.listStatus() on an empty folder should be very fast.
   
   Finally, the schema folder never holds many schema files: there are at most 10 files in this directory.
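   
   A minimal sketch of the check-once-then-cache idea described above, in plain Java with no Hudi or Spark dependencies. Everything in it is illustrative rather than the PR's actual code: the class name, `hasValidSchemaFile`, `resolveEnabled`, the `.schemacommit` suffix, and the `example.schema.evolution.enabled` key are assumptions, and a `Map` stands in for the Spark session conf.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

public class SchemaEvolutionAutoDetect {
  // Assumed file suffix; stands in for SCHEMA_COMMIT_ACTION in the diff.
  static final String SCHEMA_FILE_SUFFIX = ".schemacommit";
  // Hypothetical key; stands in for DataSourceReadOptions.SCHEMA_EVOLUTION_ENABLED.key.
  static final String ENABLED_KEY = "example.schema.evolution.enabled";

  /** True if any listed file is a schema file whose instant prefix is a valid instant. */
  static boolean hasValidSchemaFile(List<String> fileNames, Set<String> validInstants) {
    return fileNames.stream()
        .filter(name -> name.endsWith(SCHEMA_FILE_SUFFIX))
        .anyMatch(name -> validInstants.contains(name.split("\\.")[0]));
  }

  /** Runs the (potentially slow) check only when no result is cached in the conf. */
  static boolean resolveEnabled(Map<String, String> sessionConf, BooleanSupplier check) {
    String cached = sessionConf.get(ENABLED_KEY);
    if (cached != null) {
      return Boolean.parseBoolean(cached);
    }
    boolean result = check.getAsBoolean();
    sessionConf.put(ENABLED_KEY, Boolean.toString(result));
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> sessionConf = new HashMap<>();
    // Pretend directory listing: one schema file for a completed instant, one unrelated file.
    List<String> listing = List.of("20220608075743.schemacommit", "README.txt");
    Set<String> validInstants = Set.of("20220608075743");

    int[] listingCalls = {0};
    BooleanSupplier check = () -> {
      listingCalls[0]++;
      return hasValidSchemaFile(listing, validInstants);
    };

    System.out.println(resolveEnabled(sessionConf, check));
    System.out.println(resolveEnabled(sessionConf, check)); // served from the cached conf value
    System.out.println("listing calls: " + listingCalls[0]);
  }
}
```

   The second `resolveEnabled` call never reaches the listing, which is the point of writing the result back to the session conf. With an empty listing the check returns false without matching anything, corresponding to case 2) above.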



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.