You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "slfan1989 (via GitHub)" <gi...@apache.org> on 2023/04/02 13:42:49 UTC

[GitHub] [hudi] slfan1989 opened a new pull request, #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

slfan1989 opened a new pull request, #8360:
URL: https://github.com/apache/hudi/pull/8360

   ### Change Logs
   
   JIRA: [HUDI-5389](https://issues.apache.org/jira/browse/HUDI-5389). Remove Hudi Cli Duplicates Code.
   
   In the process of reading the code, I found some duplicate code, I think this part of the duplicate code can be removed directly.
   
   | cli  | hudi-spark |
   | ------------- | ------------- |
   | org.apache.hudi.cli.DedupeSparkJob | org.apache.spark.sql.hudi.DedupeSparkJob  |
   | org.apache.hudi.cli.DeDupeType  | org.apache.spark.sql.hudi.DeDupeType |
   | org.apache.hudi.cli.SparkHelpers  | org.apache.spark.sql.hudi.SparkHelpers |
   
   The code on the left side of the table can be directly replaced by the code on the right side of the table, because their contents are exactly the same.
   
   ### Impact
   
   none.
   
   ### Risk level (write none, low medium or high below)
   
   none.
   
   ### Documentation Update
   
   none.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8360:
URL: https://github.com/apache/hudi/pull/8360#issuecomment-1493339899

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8360:
URL: https://github.com/apache/hudi/pull/8360#issuecomment-1493387054

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16068",
       "triggerID" : "8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16068) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8360:
URL: https://github.com/apache/hudi/pull/8360#discussion_r1155424649


##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -18,7 +18,7 @@
 
 package org.apache.hudi.cli.commands;
 
-import org.apache.hudi.cli.DeDupeType;
+import org.apache.spark.sql.hudi.DeDupeType;
 import org.apache.hudi.cli.HoodieCLI;

Review Comment:
   The hudi-cli-bundle does not really include the hudi-spark-xxx jars, do we need to package them all together?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] bvaradar merged pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar merged PR #8360:
URL: https://github.com/apache/hudi/pull/8360


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] slfan1989 commented on a diff in pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on code in PR #8360:
URL: https://github.com/apache/hudi/pull/8360#discussion_r1158523764


##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -18,7 +18,7 @@
 
 package org.apache.hudi.cli.commands;
 
-import org.apache.hudi.cli.DeDupeType;
+import org.apache.spark.sql.hudi.DeDupeType;
 import org.apache.hudi.cli.HoodieCLI;

Review Comment:
   Thank you for helping to review the code! I found some duplicate code, I want to improve this part of the code. Do we need to keep `org.apache.hudi.cli.DedupeSparkJob` ? and replace `org.apache.spark.sql.hudi.DedupeSparkJob`, I want to hear your opinion.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] bvaradar commented on pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on PR #8360:
URL: https://github.com/apache/hudi/pull/8360#issuecomment-1498521560

   This is a duplicate of https://github.com/apache/hudi/pull/7457 which was closed accidentally


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8360:
URL: https://github.com/apache/hudi/pull/8360#issuecomment-1493342704

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16068",
       "triggerID" : "8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8c4cefd2bb26011d6669c7fbf6f0c85c2ea374df Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16068) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] slfan1989 commented on a diff in pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "slfan1989 (via GitHub)" <gi...@apache.org>.
slfan1989 commented on code in PR #8360:
URL: https://github.com/apache/hudi/pull/8360#discussion_r1158523764


##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -18,7 +18,7 @@
 
 package org.apache.hudi.cli.commands;
 
-import org.apache.hudi.cli.DeDupeType;
+import org.apache.spark.sql.hudi.DeDupeType;
 import org.apache.hudi.cli.HoodieCLI;

Review Comment:
   Thank you for helping to review the code, ! found some duplicate code, I want to improve this part of the code. Do we need to keep `org.apache.hudi.cli.DedupeSparkJob` ? and replace `org.apache.spark.sql.hudi.DedupeSparkJob`, I want to hear your opinion.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8360:
URL: https://github.com/apache/hudi/pull/8360#discussion_r1159254853


##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -18,7 +18,7 @@
 
 package org.apache.hudi.cli.commands;
 
-import org.apache.hudi.cli.DeDupeType;
+import org.apache.spark.sql.hudi.DeDupeType;
 import org.apache.hudi.cli.HoodieCLI;

Review Comment:
   Need to ping the author why there are duplciates here, cc @nsivabalan do you have any impression why these classes are duplicated here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] bvaradar commented on a diff in pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8360:
URL: https://github.com/apache/hudi/pull/8360#discussion_r1159306419


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DedupeSparkJob.scala:
##########
@@ -74,7 +74,7 @@ class DedupeSparkJob(basePath: String,
     val metadata = HoodieTableMetaClient.builder().setConf(fs.getConf).setBasePath(basePath).build()
 
     val allFiles = fs.listStatus(new org.apache.hadoop.fs.Path(s"$basePath/$duplicatedPartitionPath"))
-    val fsView = new HoodieTableFileSystemView(metadata, metadata.getActiveTimeline.getCommitTimeline.filterCompletedInstants(), allFiles)
+    val fsView = new HoodieTableFileSystemView(metadata, metadata.getActiveTimeline.getCommitsTimeline.filterCompletedInstants(), allFiles)

Review Comment:
   Thanks for fixing it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org