Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/06 23:16:47 UTC

[GitHub] [hudi] jonvex opened a new pull request, #6615: [HUDI-4758] Add validations to java spark examples

jonvex opened a new pull request, #6615:
URL: https://github.com/apache/hudi/pull/6615

   ### Change Logs
   
   The Java Spark examples never verify that the operations they run produce the expected results. This change adds validations after each operation.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1239988078

   ## CI report:
   
   * 61214015c3aed029c00882f121e6ec0333767e7f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11199) 
   * 3b37307093cf2c6eb20a4e5f738f8bac38f1dba7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11230) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on a diff in pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #6615:
URL: https://github.com/apache/hudi/pull/6615#discussion_r972355172


##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,47 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    String snapshotQuery = "SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table";
+
+    Dataset<Row> insertDf = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertDf.except(spark.sql(snapshotQuery)).count() == 0;
 
-    updateData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> snapshotBeforeUpdate = spark.sql(snapshotQuery);
+    Dataset<Row> updateDf = updateData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql(snapshotQuery).intersect(updateDf).count() == updateDf.count();
+    assert spark.sql(snapshotQuery).except(updateDf).except(snapshotBeforeUpdate).count() == 0;
 
     incrementalQuery(spark, tablePath, tableName);
     pointInTimeQuery(spark, tablePath, tableName);
 
-    delete(spark, tablePath, tableName);
+    Dataset<Row> snapshotBeforeDelete = spark.sql(snapshotQuery);
+    Dataset<Row> deleteDf = delete(spark, tablePath, tableName);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql(snapshotQuery).intersect(deleteDf).count() == 0;
+    assert snapshotBeforeDelete.except(deleteDf).except(spark.sql(snapshotQuery)).count() == 0;

Review Comment:
   Wherever `spark.sql(snapshotQuery)` is called more than twice, let's save the result to a Dataset, cache it, and reuse it. Otherwise, every call triggers the computation again.
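
A minimal plain-Java sketch of the point above, not code from the PR. `spark.sql(...)` returns a lazily evaluated Dataset, so each action such as `count()` recomputes it unless the Dataset is cached (for example with `Dataset.cache()`). The class below models that behavior with a `Supplier`; `SnapshotReuseSketch`, `memoize`, and `evaluationsFor` are hypothetical names, and the counter stands in for the query execution.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Plain-Java model of "run the snapshot query once and reuse the result". */
final class SnapshotReuseSketch {

  /** Wraps a supplier so the underlying computation runs at most once (not thread-safe). */
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private T value;
      private boolean computed;

      @Override
      public T get() {
        if (!computed) {
          value = delegate.get();
          computed = true;
        }
        return value;
      }
    };
  }

  /** Returns how many times the stand-in query ran across three reads. */
  static int evaluationsFor(boolean cached) {
    AtomicInteger runs = new AtomicInteger();
    // Hypothetical stand-in for spark.sql(snapshotQuery): counts every evaluation.
    Supplier<Set<String>> query = () -> {
      runs.incrementAndGet();
      return new HashSet<>(Arrays.asList("row-1", "row-2"));
    };
    Supplier<Set<String>> snapshot = cached ? memoize(query) : query;
    snapshot.get();
    snapshot.get();
    snapshot.get();
    return runs.get();
  }

  public static void main(String[] args) {
    System.out.println("uncached evaluations: " + evaluationsFor(false));
    System.out.println("cached evaluations:   " + evaluationsFor(true));
  }
}
```

In the quickstart, the analogous change would presumably be to assign `spark.sql(snapshotQuery)` to a variable, call `cache()` on it, and take a new snapshot after each write, since a snapshot cached before a write may not reflect it.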





[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1248656791

   ## CI report:
   
   * 3b37307093cf2c6eb20a4e5f738f8bac38f1dba7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11230) 
   * d675d338c90b09abbdbcc84003873cf05c40f871 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1238899373

   ## CI report:
   
   * 61214015c3aed029c00882f121e6ec0333767e7f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11199) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1248661000

   ## CI report:
   
   * 3b37307093cf2c6eb20a4e5f738f8bac38f1dba7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11230) 
   * d675d338c90b09abbdbcc84003873cf05c40f871 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11395) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1240143195

   ## CI report:
   
   * 3b37307093cf2c6eb20a4e5f738f8bac38f1dba7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11230) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1238761811

   ## CI report:
   
   * 61214015c3aed029c00882f121e6ec0333767e7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11199) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #6615:
URL: https://github.com/apache/hudi/pull/6615#discussion_r965133110


##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);

Review Comment:
   Rename this to `insertDf`.



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertQueryDataIn.except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;
 
-    updateData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> updateQueryDataIn = updateData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table").except(insertQueryDataIn).except(updateQueryDataIn).count() == 0;
 
     incrementalQuery(spark, tablePath, tableName);
     pointInTimeQuery(spark, tablePath, tableName);
 
-    delete(spark, tablePath, tableName);
+    Dataset<Row> beforeDelete = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table");
+    Dataset<Row> deleteQueryIn = delete(spark, tablePath, tableName);

Review Comment:
   Rename this to `deletedDf`.



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertQueryDataIn.except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;
 
-    updateData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> updateQueryDataIn = updateData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table").except(insertQueryDataIn).except(updateQueryDataIn).count() == 0;
 
     incrementalQuery(spark, tablePath, tableName);
     pointInTimeQuery(spark, tablePath, tableName);
 
-    delete(spark, tablePath, tableName);
+    Dataset<Row> beforeDelete = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table");
+    Dataset<Row> deleteQueryIn = delete(spark, tablePath, tableName);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert beforeDelete.except(deleteQueryIn).except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;
 
-    insertOverwriteData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> beforeOverwrite = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table");
+    Dataset<Row> overwriteDataIn = insertOverwriteData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> afterOverwrite = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table");
+    Dataset<Row> overwriteIntersect = beforeOverwrite.intersect(afterOverwrite);
+    assert afterOverwrite.except(overwriteIntersect).except(overwriteDataIn).count() == 0;
     queryData(spark, jsc, tablePath, tableName, dataGen);
 
+    Dataset<Row> beforeDeleteByPartition = spark.sql(
+        "SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table WHERE partitionpath NOT IN ("
+            + String.join(", ", HoodieExampleDataGenerator.DEFAULT_PARTITION_PATHS) + ")");
     deleteByPartition(spark, tablePath, tableName);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table").except(beforeDeleteByPartition).count() == 0;

Review Comment:
   Let's ensure we don't delete all partition paths, just one or two.
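
A hedged sketch of how the request above could be approached, not the change that was merged: pick a small subset of the partition paths and build the `NOT IN` filter from that subset only. It also single-quotes each value, on the assumption that partition paths are plain strings that SQL would otherwise misparse. `PartitionFilterSketch` and its methods are hypothetical names, and the partition values in `main` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Builds a filter that excludes only a chosen subset of partitions. */
final class PartitionFilterSketch {

  /** Returns the first n partition paths, or all of them if fewer exist. */
  static List<String> firstPartitions(List<String> allPartitions, int n) {
    int end = Math.max(0, Math.min(n, allPartitions.size()));
    return allPartitions.subList(0, end);
  }

  /** Builds a SQL IN-list with each value single-quoted; embedded quotes are doubled. */
  static String quotedInList(List<String> values) {
    return values.stream()
        .map(v -> "'" + v.replace("'", "''") + "'")
        .collect(Collectors.joining(", ", "(", ")"));
  }

  /** Query for the rows expected to survive; assumes partitionsToDelete is non-empty. */
  static String survivingRowsQuery(String baseQuery, List<String> partitionsToDelete) {
    return baseQuery + " WHERE partitionpath NOT IN " + quotedInList(partitionsToDelete);
  }

  public static void main(String[] args) {
    // Illustrative values only; the real example reads them from the data generator.
    List<String> all = Arrays.asList("p1", "p2", "p3");
    List<String> toDelete = firstPartitions(all, 1);
    System.out.println(survivingRowsQuery("SELECT uuid FROM hudi_ro_table", toDelete));
  }
}
```

With a subset, the validation can also confirm that the remaining partitions still hold data, which a delete of every partition cannot show.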



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertQueryDataIn.except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;
 
-    updateData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> updateQueryDataIn = updateData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table").except(insertQueryDataIn).except(updateQueryDataIn).count() == 0;
 
     incrementalQuery(spark, tablePath, tableName);
     pointInTimeQuery(spark, tablePath, tableName);
 
-    delete(spark, tablePath, tableName);
+    Dataset<Row> beforeDelete = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table");

Review Comment:
   Rename this to `snapshotBeforeDeleteDf`.



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -171,16 +185,19 @@ public static void updateData(SparkSession spark, JavaSparkContext jsc, String t
         .option(TBL_NAME.key(), tableName)
         .mode(Append)
         .save(tablePath);
+    return df;
   }
 
   /**
    * Deleta data based in data information.
    */
-  public static void delete(SparkSession spark, String tablePath, String tableName) {
+  public static Dataset<Row> delete(SparkSession spark, String tablePath, String tableName) {
 
     Dataset<Row> roViewDF = spark.read().format("org.apache.hudi").load(tablePath + "/*/*/*/*");
     roViewDF.createOrReplaceTempView("hudi_ro_table");
-    Dataset<Row> df = spark.sql("select uuid, partitionpath, ts from  hudi_ro_table limit 2");
+    //Dataset<Row> df = spark.sql("select uuid, partitionpath, ts from  hudi_ro_table limit 2");
+    Dataset<Row> ret = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table limit 2");

Review Comment:
   Minor: rename `ret` to `toBeDeletedDf`.



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -171,16 +185,19 @@ public static void updateData(SparkSession spark, JavaSparkContext jsc, String t
         .option(TBL_NAME.key(), tableName)
         .mode(Append)
         .save(tablePath);
+    return df;
   }
 
   /**
    * Deleta data based in data information.
    */
-  public static void delete(SparkSession spark, String tablePath, String tableName) {
+  public static Dataset<Row> delete(SparkSession spark, String tablePath, String tableName) {
 
     Dataset<Row> roViewDF = spark.read().format("org.apache.hudi").load(tablePath + "/*/*/*/*");
     roViewDF.createOrReplaceTempView("hudi_ro_table");
-    Dataset<Row> df = spark.sql("select uuid, partitionpath, ts from  hudi_ro_table limit 2");
+    //Dataset<Row> df = spark.sql("select uuid, partitionpath, ts from  hudi_ro_table limit 2");

Review Comment:
   Remove the commented-out code.



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertQueryDataIn.except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;
 
-    updateData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> updateQueryDataIn = updateData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table").except(insertQueryDataIn).except(updateQueryDataIn).count() == 0;
 
     incrementalQuery(spark, tablePath, tableName);
     pointInTimeQuery(spark, tablePath, tableName);
 
-    delete(spark, tablePath, tableName);
+    Dataset<Row> beforeDelete = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table");

Review Comment:
   If this query is repeated, declare a constant and reuse it:
   `SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table` 
   



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertQueryDataIn.except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;
 
-    updateData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> updateQueryDataIn = updateData(spark, jsc, tablePath, tableName, dataGen);

Review Comment:
   Rename this to `updateDf`.



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertQueryDataIn.except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;
 
-    updateData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> updateQueryDataIn = updateData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table").except(insertQueryDataIn).except(updateQueryDataIn).count() == 0;

Review Comment:
   This assertion might need some fixes. Consider all error paths.
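
One error path the assertion above would miss is an update that is never applied: the table then still equals the inserted rows, so subtracting the inserts leaves nothing and the check passes. The sketch below models this with plain Java sets standing in for Datasets (`Dataset.except` and `Dataset.intersect` have set semantics). It compares that check with the stricter pair used in the other revision quoted in this thread. The class, method names, and row strings are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Plain-Java model of the update validations, using sets in place of Datasets. */
final class UpdateValidationSketch {

  /** Set difference, analogous to Dataset.except. */
  static <T> Set<T> except(Set<T> left, Set<T> right) {
    Set<T> result = new HashSet<>(left);
    result.removeAll(right);
    return result;
  }

  /** Set intersection, analogous to Dataset.intersect. */
  static <T> Set<T> intersect(Set<T> left, Set<T> right) {
    Set<T> result = new HashSet<>(left);
    result.retainAll(right);
    return result;
  }

  /** Weaker check: nothing in the table lies outside inserts plus updates. */
  static boolean weakCheck(Set<String> snapshot, Set<String> inserted, Set<String> updated) {
    return except(except(snapshot, inserted), updated).isEmpty();
  }

  /** Stricter check: every updated row is visible and nothing unexpected appeared. */
  static boolean strictCheck(Set<String> before, Set<String> after, Set<String> updated) {
    boolean allUpdatesVisible = intersect(after, updated).size() == updated.size();
    boolean nothingUnexpected = except(except(after, updated), before).isEmpty();
    return allUpdatesVisible && nothingUnexpected;
  }

  static Set<String> setOf(String... rows) {
    return new HashSet<>(Arrays.asList(rows));
  }

  public static void main(String[] args) {
    Set<String> before = setOf("a:1", "b:1");   // table after the insert
    Set<String> updated = setOf("a:2");         // rows written by the update
    Set<String> lostUpdate = setOf("a:1", "b:1"); // table if the update was dropped

    System.out.println("weak check on lost update:   " + weakCheck(lostUpdate, before, updated));
    System.out.println("strict check on lost update: " + strictCheck(before, lostUpdate, updated));
  }
}
```

The size comparison assumes the update batch has no duplicate rows; with real Datasets, `count()` on the batch would include duplicates while `intersect` would not.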



##########
hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/quickstart/HoodieSparkQuickstart.java:
##########
@@ -65,30 +66,42 @@ public static void main(String[] args) {
   public static void runQuickstart(JavaSparkContext jsc, SparkSession spark, String tableName, String tablePath) {
     final HoodieExampleDataGenerator<HoodieAvroPayload> dataGen = new HoodieExampleDataGenerator<>();
 
-    insertData(spark, jsc, tablePath, tableName, dataGen);
+    Dataset<Row> insertQueryDataIn = insertData(spark, jsc, tablePath, tableName, dataGen);
     queryData(spark, jsc, tablePath, tableName, dataGen);
+    assert insertQueryDataIn.except(spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table")).count() == 0;

Review Comment:
   You can probably split this into two lines:
   Dataset<Row> hudiDf = spark.sql("SELECT begin_lat, begin_lon, driver, end_lat, end_lon, fare, partitionpath, rider, ts, uuid FROM hudi_ro_table");
   assert insertDf.except(hudiDf).count() == 0;
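
A side note on the `assert` in the suggestion above: Java evaluates `assert` statements only when the JVM is started with assertions enabled (`-ea`), so a validation written this way is skipped in a default run. The sketch below shows an explicit check that always runs. It is an illustration under that assumption, not what the PR does, and `ExplicitCheckSketch`, `validate`, and `validateNoMissingRows` are hypothetical names.

```java
/** Validation helper that fails regardless of whether the JVM runs with -ea. */
final class ExplicitCheckSketch {

  /** Throws IllegalStateException with the given message if the condition is false. */
  static void validate(boolean condition, String message) {
    if (!condition) {
      throw new IllegalStateException(message);
    }
  }

  /** Validates that a set-difference row count (for example from except(...).count()) is zero. */
  static void validateNoMissingRows(long missingRowCount) {
    validate(missingRowCount == 0, "Expected no missing rows but found " + missingRowCount);
  }

  public static void main(String[] args) {
    validateNoMissingRows(0); // passes silently
    try {
      validateNoMissingRows(2);
    } catch (IllegalStateException e) {
      System.out.println("validation failed: " + e.getMessage());
    }
  }
}
```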





[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1238759525

   ## CI report:
   
   * 61214015c3aed029c00882f121e6ec0333767e7f UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1239984954

   ## CI report:
   
   * 61214015c3aed029c00882f121e6ec0333767e7f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11199) 
   * 3b37307093cf2c6eb20a4e5f738f8bac38f1dba7 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6615:
URL: https://github.com/apache/hudi/pull/6615#issuecomment-1248803426

   ## CI report:
   
   * d675d338c90b09abbdbcc84003873cf05c40f871 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11395) 
   


[GitHub] [hudi] nsivabalan merged pull request #6615: [HUDI-4758] Add validations to java spark examples

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #6615:
URL: https://github.com/apache/hudi/pull/6615

