Posted to commits@hudi.apache.org by "yihua (via GitHub)" <gi...@apache.org> on 2023/02/13 05:51:37 UTC

[GitHub] [hudi] yihua opened a new pull request, #7927: [HUDI-5771] Improve deploy script of release artifacts

yihua opened a new pull request, #7927:
URL: https://github.com/apache/hudi/pull/7927

   ### Change Logs
   
   The current `scripts/release/deploy_staging_jars.sh` takes around 6 hours to upload all release artifacts to the Apache Nexus staging repository, which is too long.  Analysis of the upload sequence shows repeated uploads of the same module that can be avoided.
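
A rough sketch of how repeated uploads could be spotted. It assumes a hypothetical listing with one `<profile> <module>` pair per line; that layout, the function name `count_repeated_uploads`, and the sample names are illustrative assumptions and not the format of the attached logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<profile> <module>" pairs on stdin (an assumed
# layout, not the real log format) and prints "<count> <module>" for every
# module that appears more than once.
count_repeated_uploads() {
  awk '{ count[$2]++ }
       END { for (m in count) if (count[m] > 1) print count[m], m }' | sort -k2
}

# Made-up sample data: module-a is uploaded under two profiles.
printf '%s\n' \
  "profile-1 module-a" \
  "profile-2 module-a" \
  "profile-2 module-b" | count_repeated_uploads
# prints: 2 module-a
```

Any module reported here is a candidate for being built and uploaded under a single profile only.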
   
   After carefully reviewing the deploy script and logs, I made the following changes to cut the upload time by about 70%, without changing the set of jars that are uploaded:
   - For each profile (e.g., `-Dscala-2.12 -Dspark3.2`), run only one mvn build
   - Remove overlapping build targets among different profiles
     - For Spark 2.4, Scala 2.11: `hudi-spark-common_2.11`, `hudi-spark_2.11`, `hudi-spark2_2.11`, `hudi-utilities_2.11`, `hudi-cli-bundle_2.11`, `hudi-spark2.4-bundle_2.11`, `hudi-utilities-bundle_2.11`, `hudi-utilities-slim-bundle_2.11`
     - For Spark 2.4, Scala 2.12: `hudi-spark2.4-bundle_2.12`
     - For Spark 3.2, Scala 2.12: `hudi-spark3.2.x_2.12`, `hudi-spark3.2plus-common`, `hudi-spark3.2-bundle_2.12`
     - For Spark 3.3, Scala 2.12: `hudi-spark3.3.x_2.12`, `hudi-cli-bundle_2.12`, `hudi-spark3.3-bundle_2.12`
     - For Spark 3.1, Scala 2.12: all other modules and bundles (`hudi-cli-bundle_2.12` is not overridden)
   
   Legacy Spark bundles and Flink bundles are not changed.
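
The "one mvn build per profile" idea above can be sketched as a loop over the option array. This is a minimal sketch, not the merged script: the two array entries are illustrative, `deploy_profile` and `DRY_RUN` are hypothetical names, and `mvn clean deploy -DskipTests` stands in for whatever Maven invocation the real script uses.

```shell
#!/usr/bin/env bash
# Illustrative entries only; the merged script defines its own, longer list.
declare -a ALL_VERSION_OPTS=(
  "-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am"
  "-Dscala-2.12 -Dspark3.1"
)

# Hypothetical helper: run exactly one Maven build for one profile entry.
# DRY_RUN defaults to 1, so the command is only printed unless DRY_RUN=0 is set.
deploy_profile() {
  local opts="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "mvn clean deploy -DskipTests $opts"
  else
    # Intentional word splitting: each entry holds several space-separated flags.
    # shellcheck disable=SC2086
    mvn clean deploy -DskipTests $opts
  fi
}

# One invocation per profile; overlapping modules are avoided by keeping
# the -pl lists of the entries disjoint.
for opts in "${ALL_VERSION_OPTS[@]}"; do
  deploy_profile "$opts"
done
```

Running it as written only prints the two commands, which makes it easy to review the module lists before starting a real upload.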
   
   Raw logs:
   - Summary of existing upload sequence: [deploy_sequence.txt](https://github.com/apache/hudi/files/10719044/deploy_sequence.txt)
   - Last modified times of uploaded artifacts for analyzing the relevant upload and profile: [staging_file_timestamp.txt](https://github.com/apache/hudi/files/10719051/staging_file_timestamp.txt)
   
   ### Impact
   
   Reduces the time to upload all release artifacts to the Apache Nexus staging repository by ~70%, from about 6 hours to under 2 hours.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on a diff in pull request #7927: [HUDI-5771] Improve deploy script of release artifacts

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7927:
URL: https://github.com/apache/hudi/pull/7927#discussion_r1104786887


##########
scripts/release/deploy_staging_jars.sh:
##########
@@ -36,38 +36,41 @@ if [ "$#" -gt "1" ]; then
   exit 1
 fi
 
-BUNDLE_MODULES=$(find -s packaging -name 'hudi-*-bundle' -type d)
-BUNDLE_MODULES_EXCLUDED="-${BUNDLE_MODULES//$'\n'/,-}"
-
 declare -a ALL_VERSION_OPTS=(
-# upload all module jars and bundle jars
-"-Dscala-2.11 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.3 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.2 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.1"  # this profile goes last in this section to ensure bundles use avro 1.8
-
-# spark bundles
-"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am"
+# Upload Spark specific modules and bundle jars
+# For Spark 2.4, Scala 2.11:
+# hudi-spark-common_2.11
+# hudi-spark_2.11
+# hudi-spark2_2.11
+# hudi-utilities_2.11
+# hudi-cli-bundle_2.11
+# hudi-spark2.4-bundle_2.11
+# hudi-utilities-bundle_2.11
+# hudi-utilities-slim-bundle_2.11
+"-Dscala-2.11 -Dspark2.4 -pl hudi-spark-datasource/hudi-spark-common,hudi-spark-datasource/hudi-spark2,hudi-spark-datasource/hudi-spark,hudi-utilities,packaging/hudi-spark-bundle,packaging/hudi-cli-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle -am"

Review Comment:
   It should be built and uploaded under the Spark 2, Scala 2.11 profile, right?





[GitHub] [hudi] yihua merged pull request #7927: [HUDI-5771] Improve deploy script of release artifacts

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua merged PR #7927:
URL: https://github.com/apache/hudi/pull/7927




[GitHub] [hudi] yihua commented on a diff in pull request #7927: [HUDI-5771] Improve deploy script of release artifacts

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #7927:
URL: https://github.com/apache/hudi/pull/7927#discussion_r1104034135


##########
scripts/release/deploy_staging_jars.sh:
##########
@@ -36,38 +36,41 @@ if [ "$#" -gt "1" ]; then
   exit 1
 fi
 
-BUNDLE_MODULES=$(find -s packaging -name 'hudi-*-bundle' -type d)
-BUNDLE_MODULES_EXCLUDED="-${BUNDLE_MODULES//$'\n'/,-}"
-
 declare -a ALL_VERSION_OPTS=(
-# upload all module jars and bundle jars
-"-Dscala-2.11 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.3 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.2 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.1"  # this profile goes last in this section to ensure bundles use avro 1.8
-
-# spark bundles
-"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am"
+# Upload Spark specific modules and bundle jars
+# For Spark 2.4, Scala 2.11:
+# hudi-spark-common_2.11
+# hudi-spark_2.11
+# hudi-spark2_2.11
+# hudi-utilities_2.11
+# hudi-cli-bundle_2.11
+# hudi-spark2.4-bundle_2.11
+# hudi-utilities-bundle_2.11
+# hudi-utilities-slim-bundle_2.11
+"-Dscala-2.11 -Dspark2.4 -pl hudi-spark-datasource/hudi-spark-common,hudi-spark-datasource/hudi-spark2,hudi-spark-datasource/hudi-spark,hudi-utilities,packaging/hudi-spark-bundle,packaging/hudi-cli-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle -am"

Review Comment:
   Yes, it is still uploaded.  If you check [staging_file_timestamp.txt](https://github.com/apache/hudi/files/10719051/staging_file_timestamp.txt), `hudi-spark2-common` is uploaded by the `-Dscala-2.12 -Dspark3.1` profile.  I am keeping it the same for now.





[GitHub] [hudi] xushiyan commented on a diff in pull request #7927: [HUDI-5771] Improve deploy script of release artifacts

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7927:
URL: https://github.com/apache/hudi/pull/7927#discussion_r1104030123


##########
scripts/release/deploy_staging_jars.sh:
##########
@@ -36,38 +36,41 @@ if [ "$#" -gt "1" ]; then
   exit 1
 fi
 
-BUNDLE_MODULES=$(find -s packaging -name 'hudi-*-bundle' -type d)
-BUNDLE_MODULES_EXCLUDED="-${BUNDLE_MODULES//$'\n'/,-}"
-
 declare -a ALL_VERSION_OPTS=(
-# upload all module jars and bundle jars
-"-Dscala-2.11 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.3 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.2 -pl $BUNDLE_MODULES_EXCLUDED"
-"-Dscala-2.12 -Dspark3.1"  # this profile goes last in this section to ensure bundles use avro 1.8
-
-# spark bundles
-"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am"
+# Upload Spark specific modules and bundle jars
+# For Spark 2.4, Scala 2.11:
+# hudi-spark-common_2.11
+# hudi-spark_2.11
+# hudi-spark2_2.11
+# hudi-utilities_2.11
+# hudi-cli-bundle_2.11
+# hudi-spark2.4-bundle_2.11
+# hudi-utilities-bundle_2.11
+# hudi-utilities-slim-bundle_2.11
+"-Dscala-2.11 -Dspark2.4 -pl hudi-spark-datasource/hudi-spark-common,hudi-spark-datasource/hudi-spark2,hudi-spark-datasource/hudi-spark,hudi-utilities,packaging/hudi-spark-bundle,packaging/hudi-cli-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle -am"

Review Comment:
   There is a `hudi-spark2-common` module, which is an empty placeholder. Although it won't affect anything, it should still be added to stay consistent with the existing modules.


