Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/18 03:59:57 UTC

[GitHub] [hudi] huberylee opened a new pull request, #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

huberylee opened a new pull request, #5620:
URL: https://github.com/apache/hudi/pull/5620

   ## What is the purpose of the pull request
   
   Unify the output type of the clustering/compaction-related procedures so that they return more useful information.
   
   ## Brief change log
   
     - Modify the output type of `RunCompactionProcedure`, `ShowCompactionProcedure`, `RunClusteringProcedure`, `ShowClusteringProcedure`, `CompactionHoodiePathCommand`, `CompactionHoodieTableCommand`, `CompactionShowHoodiePathCommand` and `CompactionShowHoodieTableCommand`
     - Update `TestClusteringProcedure`, `TestCompactionProcedure` and `TestCompactionTable` accordingly
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as `TestClusteringProcedure`, `TestCompactionProcedure` and `TestCompactionTable`.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1129550844

   ## CI report:
   
   * 1378f6ce813b4fea31f1408c615e2449198ac970 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] XuQianJin-Stars merged pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars merged PR #5620:
URL: https://github.com/apache/hudi/pull/5620



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1130285688

   ## CI report:
   
   * 21b8e9510a4feed0c19f8ebdac906a23fad8b202 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8747) 



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1129552502

   ## CI report:
   
   * 1378f6ce813b4fea31f1408c615e2449198ac970 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8732) 



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1129659284

   ## CI report:
   
   * 1378f6ce813b4fea31f1408c615e2449198ac970 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8732) 



[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875842085


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala:
##########
@@ -325,7 +335,7 @@ class TestClusteringProcedure extends HoodieSparkSqlTestBase {
           assertResult(3)(clusteringPlan.get().getInputGroups.size())
 
           // No pending clustering instant
-          checkAnswer(s"call show_clustering(table => '$tableName')")()
+          spark.sql(s"call show_clustering(table => '$tableName')").show()

Review Comment:
   Why change `checkAnswer` to `spark.sql`?
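Context for the question above: `Dataset.show()` only prints rows and returns `Unit`, so a test line that ends in `.show()` asserts nothing, whereas comparing collected rows against expected ones can actually fail. The sketch below illustrates that difference in plain Scala without Spark. `ProcedureRow` and `ProcedureResultCheck` are hypothetical names, and the three fields are assumed from the `getString(0)`, `getInt(1)`, `getString(2)` accessors used in this PR's tests, not taken from the real procedure schema.

```scala
// Hypothetical row shape: the real procedure output schema is not shown in this thread,
// so these three fields (instant time, a count, a state) are assumptions.
final case class ProcedureRow(instant: String, count: Int, state: String)

object ProcedureResultCheck {
  // Sort key covering every field, so the comparison ignores row order.
  private def key(r: ProcedureRow): (String, Int, String) = (r.instant, r.count, r.state)

  // Unlike printing with .show(), this returns a verdict that a test can assert on.
  def matches(actual: Seq[ProcedureRow], expected: Seq[ProcedureRow]): Boolean =
    actual.sortBy(key) == expected.sortBy(key)
}
```

An empty expectation, as in the original `checkAnswer(...)()` call for "no pending clustering instant", is just `matches(rows, Seq.empty)`, which only passes when no rows came back.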



[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875842983


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestCompactionProcedure.scala:
##########
@@ -48,22 +48,48 @@ class TestCompactionProcedure extends HoodieSparkSqlTestBase {
       spark.sql(s"insert into $tableName values(4, 'a4', 10, 1000)")
       spark.sql(s"update $tableName set price = 11 where id = 1")
 
-      spark.sql(s"call run_compaction(op => 'schedule', table => '$tableName')")
+      // Schedule the first compaction
+      val firstResult = spark.sql(s"call run_compaction(op => 'schedule', table => '$tableName')")
+        .collect()
+        .map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
       spark.sql(s"update $tableName set price = 12 where id = 2")
-      spark.sql(s"call run_compaction('schedule', '$tableName')")
-      val compactionRows = spark.sql(s"call show_compaction(table => '$tableName', limit => 10)").collect()
-      val timestamps = compactionRows.map(_.getString(0))
+
+      // Schedule the second compaction
+      val secondResult = spark.sql(s"call run_compaction('schedule', '$tableName')")
+        .collect()
+        .map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
+      assertResult(1)(firstResult.length)
+      assertResult(1)(secondResult.length)
+      val showCompactionSql: String = s"call show_compaction(table => '$tableName', limit => 10)"
+      checkAnswer(showCompactionSql)(
+        firstResult(0),
+        secondResult(0)
+      )
+
+      val compactionRows = spark.sql(showCompactionSql).collect()
+      val timestamps = compactionRows.map(_.getString(0)).sorted
       assertResult(2)(timestamps.length)
 
-      spark.sql(s"call run_compaction(op => 'run', table => '$tableName', timestamp => ${timestamps(1)})")
+      // Execute the second scheduled compaction instant actually
+      spark.sql(s"call run_compaction(op => 'run', table => '$tableName', timestamp => ${timestamps(1)})").show()

Review Comment:
   Ditto
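The revised test sorts the collected timestamps before picking `timestamps(1)`, so the instant it runs no longer depends on the row order returned by `show_compaction`. This works because Hudi instant times are fixed-width digit strings, so lexicographic order matches chronological order within a table. A minimal sketch, where `InstantOrder` is a hypothetical helper and the instant values are made up for illustration:

```scala
object InstantOrder {
  // Assumes every instant uses the same fixed-width format (e.g. yyyyMMddHHmmssSSS),
  // in which case plain string sorting is also chronological sorting.
  def ascending(instants: Seq[String]): Seq[String] = instants.sorted

  // The most recently scheduled instant, i.e. what timestamps(1) selects
  // when exactly two instants are pending.
  def latest(instants: Seq[String]): Option[String] = ascending(instants).lastOption
}
```

For example, `InstantOrder.latest(Seq("20220518040102345", "20220518035957123"))` returns `Some("20220518040102345")` regardless of the input order.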



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1130087894

   ## CI report:
   
   * 1378f6ce813b4fea31f1408c615e2449198ac970 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8732) 
   * 21b8e9510a4feed0c19f8ebdac906a23fad8b202 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8747) 



[GitHub] [hudi] hudi-bot commented on pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5620:
URL: https://github.com/apache/hudi/pull/5620#issuecomment-1130083583

   ## CI report:
   
   * 1378f6ce813b4fea31f1408c615e2449198ac970 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8732) 
   * 21b8e9510a4feed0c19f8ebdac906a23fad8b202 UNKNOWN



[GitHub] [hudi] huberylee commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
huberylee commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875907370


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala:
##########
@@ -325,7 +335,7 @@ class TestClusteringProcedure extends HoodieSparkSqlTestBase {
           assertResult(3)(clusteringPlan.get().getInputGroups.size())
 
           // No pending clustering instant
-          checkAnswer(s"call show_clustering(table => '$tableName')")()
+          spark.sql(s"call show_clustering(table => '$tableName')").show()

Review Comment:
   I will change this test case to check more of the returned result info.



[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #5620: [HUDI-4116] Unify clustering/compaction related procedures' output type

Posted by GitBox <gi...@apache.org>.
XuQianJin-Stars commented on code in PR #5620:
URL: https://github.com/apache/hudi/pull/5620#discussion_r875842634


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestCompactionProcedure.scala:
##########
@@ -98,25 +124,37 @@ class TestCompactionProcedure extends HoodieSparkSqlTestBase {
       spark.sql(s"insert into $tableName values(3, 'a3', 10, 1000)")
       spark.sql(s"update $tableName set price = 11 where id = 1")
 
-      spark.sql(s"call run_compaction(op => 'run', path => '${tmp.getCanonicalPath}')")
+      spark.sql(s"call run_compaction(op => 'run', path => '${tmp.getCanonicalPath}')").show()
       checkAnswer(s"select id, name, price, ts from $tableName order by id")(
         Seq(1, "a1", 11.0, 1000),
         Seq(2, "a2", 10.0, 1000),
         Seq(3, "a3", 10.0, 1000)
       )
       assertResult(0)(spark.sql(s"call show_compaction(path => '${tmp.getCanonicalPath}')").collect().length)
-      // schedule compaction first
+
       spark.sql(s"update $tableName set price = 12 where id = 1")
-      spark.sql(s"call run_compaction(op=> 'schedule', path => '${tmp.getCanonicalPath}')")
 
-      // schedule compaction second
+      // Schedule the first compaction
+      val firstResult = spark.sql(s"call run_compaction(op=> 'schedule', path => '${tmp.getCanonicalPath}')")
+        .collect()
+        .map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
       spark.sql(s"update $tableName set price = 12 where id = 2")
-      spark.sql(s"call run_compaction(op => 'schedule', path => '${tmp.getCanonicalPath}')")
 
-      // show compaction
-      assertResult(2)(spark.sql(s"call show_compaction(path => '${tmp.getCanonicalPath}')").collect().length)
-      // run compaction for all the scheduled compaction
-      spark.sql(s"call run_compaction(op => 'run', path => '${tmp.getCanonicalPath}')")
+      // Schedule the second compaction
+      val secondResult = spark.sql(s"call run_compaction(op => 'schedule', path => '${tmp.getCanonicalPath}')")
+        .collect()
+        .map(row => Seq(row.getString(0), row.getInt(1), row.getString(2)))
+
+      assertResult(1)(firstResult.length)
+      assertResult(1)(secondResult.length)
+      checkAnswer(s"call show_compaction(path => '${tmp.getCanonicalPath}')")(
+        firstResult(0),
+        secondResult(0)
+      )
+
+      // Run compaction for all the scheduled compaction
+      spark.sql(s"call run_compaction(op => 'run', path => '${tmp.getCanonicalPath}')").show()

Review Comment:
   Ditto