Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/06 07:20:20 UTC

[GitHub] [hudi] xushiyan commented on a change in pull request #2903: [HUDI-1850] Fixing read of an empty table but with a failed write

xushiyan commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r779342696



##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##########
@@ -336,7 +337,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
     }
   }
 
-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {
+    initSparkContext("test_read_table_with_one_failed_write")
+    val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    try {
+      val hoodieFooTableName = "hoodie_foo_tbl"
+      val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+        HoodieWriteConfig.TABLE_NAME.key() -> hoodieFooTableName,
+        DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key() -> "_row_key",
+        DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key() -> "partition")
+
+      val fooTableParams = HoodieWriterUtils.parametersWithWriteDefaults(fooTableModifier)
+      val props = new Properties()
+      fooTableParams.foreach(entry => props.setProperty(entry._1, entry._2))
+      val metaClient = HoodieTableMetaClient.initTableAndGetMetaClient(spark.sparkContext.hadoopConfiguration, path.toAbsolutePath.toString, props)
+
+      val partitionAndFileId = new util.HashMap[String, String]()
+      partitionAndFileId.put(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH, "file-1")
+
+      HoodieTestTable.of(metaClient).withPartitionMetaFiles(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH)
+        .addInflightCommit("001")
+        .withBaseFilesInPartitions(partitionAndFileId)
+
+      val snapshotDF1 = spark.read.format("org.apache.hudi")
+        .load(path.toAbsolutePath.toString + "/*/*/*/*")
+      snapshotDF1.count()
+      assertFalse(true) // should not reach here; the read above is expected to throw
+    } catch {
+      case e: InvalidTableException =>
+        assertTrue(e.getMessage.contains("Invalid Hoodie Table"))

Review comment:
       Looks like we should fix `InvalidTableException` to say `Hudi` instead of `Hoodie`, since this is a user-facing message.
   
   ```suggestion
           assertTrue(e.getMessage.contains("Invalid Hudi Table"))
   ```

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##########
@@ -106,14 +106,16 @@ class DefaultSource extends RelationProvider
     val metaClient = HoodieTableMetaClient.builder().setConf(fs.getConf).setBasePath(tablePath).build()
     val isBootstrappedTable = metaClient.getTableConfig.getBootstrapBasePath.isPresent
     val tableType = metaClient.getTableType
-
     // First check if ConfigUtils.IS_QUERY_AS_RO_TABLE has been set by HiveSyncTool,
     // or else use the query type from QUERY_TYPE_OPT_KEY.
     val queryType = parameters.get(ConfigUtils.IS_QUERY_AS_RO_TABLE)
       .map(is => if (is.toBoolean) QUERY_TYPE_READ_OPTIMIZED_OPT_VAL else QUERY_TYPE_SNAPSHOT_OPT_VAL)
       .getOrElse(parameters.getOrElse(QUERY_TYPE_OPT_KEY.key, QUERY_TYPE_OPT_KEY.defaultValue()))
 
     log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType, queryType is: $queryType")
+    if (metaClient.getCommitsTimeline.filterCompletedInstants.empty()) {
+      throw new InvalidTableException("No valid commits found in the given path " + metaClient.getBasePath)
+    }

Review comment:
       @nsivabalan since this is a validation check, can we move it to L107, just below the `metaClient` creation?
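   
       Roughly, the reordered check might look like this. A minimal sketch, assuming the surrounding lines match the hunk above; not a verbatim patch of `DefaultSource.scala`:
   
   ```scala
   // Sketch: build the meta client, then validate immediately,
   // before deriving anything else (table type, query type) from it.
   val metaClient = HoodieTableMetaClient.builder()
     .setConf(fs.getConf).setBasePath(tablePath).build()
   if (metaClient.getCommitsTimeline.filterCompletedInstants.empty()) {
     throw new InvalidTableException("No valid commits found in the given path " + metaClient.getBasePath)
   }
   val isBootstrappedTable = metaClient.getTableConfig.getBootstrapBasePath.isPresent
   val tableType = metaClient.getTableType
   ```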

##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##########
@@ -336,7 +337,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
     }
   }
 
-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {

Review comment:
       I don't think we need a functional test for this; we should be cognizant of over-testing, which drives up test time and maintenance effort. The simple logic flow can be properly covered by a unit test on `createRelation()` that asserts the exception.
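   
       A minimal sketch of such a unit test, for illustration. It borrows `spark`, the `Properties`-based table init, and the assertion style from the suite above; the exact shape (test name, table name, message check) is an assumption, not code from this PR:
   
   ```scala
   // Hypothetical UT: a table that was initialized but has no completed
   // commits should make DefaultSource.createRelation() throw.
   test("createRelation throws on a table with no completed commits") {
     val path = java.nio.file.Files.createTempDirectory("hoodie_ut_path")
     val props = new Properties()
     props.setProperty(HoodieWriteConfig.TABLE_NAME.key(), "hoodie_ut_tbl")
     HoodieTableMetaClient.initTableAndGetMetaClient(
       spark.sparkContext.hadoopConfiguration, path.toAbsolutePath.toString, props)
     // No commit ever completes, so the completed-instants timeline stays empty.
     val e = intercept[InvalidTableException] {
       new DefaultSource().createRelation(spark.sqlContext,
         Map("path" -> path.toAbsolutePath.toString))
     }
     assertTrue(e.getMessage.contains("Invalid Hoodie Table")) // "Hudi" after the rename suggested above
   }
   ```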




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org