Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/28 18:39:49 UTC

[GitHub] [hudi] xushiyan commented on a diff in pull request #7071: [HUDI-4982] Upgrade Bundle Testing

xushiyan commented on code in PR #7071:
URL: https://github.com/apache/hudi/pull/7071#discussion_r1008346010


##########
packaging/bundle-validation/Dockerfile-base:
##########
@@ -47,3 +49,44 @@ RUN wget https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK
     && tar -xf $WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION.tgz -C $WORKDIR/ \
     && rm $WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION.tgz
 ENV SPARK_HOME=$WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION
+
+
+# Utilities bundles
+RUN wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_$SCALA_VERSION/0.11.0/hudi-utilities-bundle_$SCALA_VERSION-0.11.0.jar -P "$WORKDIR" 
+ENV UTILITIES_BUNDLE_0_11_0=$WORKDIR/hudi-utilities-bundle_$SCALA_VERSION-0.11.0.jar

Review Comment:
   We discussed that the upgrade/downgrade test scenario should only run on release branches. For that reason, we should parameterize the versions to pull based on the release branch name: for example, if the branch name is `release-0.13.0`, download `0.11.{latest}` and `0.10.{latest}` for testing.
   
   Another design principle we should follow here: build the infra with Docker so that it is generic and reusable, and mount the testing artifacts via volumes so they are easy to switch and flexible across test cases. Accordingly, we should download the jars in the GH Actions job itself and mount them into the container. Because this is only for release branch tests, the downloads won't "waste" much. To optimize the downloads further, we could explore whether GitHub offers a way to keep the jars in its own artifact storage so that downloads are cached.
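
A rough sketch of how the version parameterization and jar download could look if done in the job script rather than in the Dockerfile. The function names, the "step back N minor versions" convention (inferred only from the single `release-0.13.0` example above), and the paths in the trailing comment are assumptions, not something the PR defines. Resolving `{latest}` to a concrete patch version is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the offset convention are assumptions.

# Print the older minor version lines to test against for a release branch.
# 1st arg: branch name, e.g. release-0.13.0
# 2nd arg and beyond: how many minor versions to step back (assumed convention)
previous_minor_lines () {
    local branch=$1
    shift
    if [[ ! $branch =~ ^release-([0-9]+)\.([0-9]+)\.[0-9]+ ]]; then
        echo "not a release branch: $branch" >&2
        return 1
    fi
    local major=${BASH_REMATCH[1]}
    # Force base 10 so a minor like "08" is not parsed as octal.
    local minor=$((10#${BASH_REMATCH[2]}))
    local offset
    for offset in "$@"; do
        if (( minor - offset >= 0 )); then
            echo "$major.$((minor - offset))"
        fi
    done
}

# Build the Maven Central URL of a utilities bundle jar, following the
# URL layout used by the wget line in Dockerfile-base.
# 1st arg: Scala version, 2nd arg: full Hudi version (e.g. 0.11.0)
utilities_bundle_url () {
    local scala_version=$1 hudi_version=$2
    local artifact=hudi-utilities-bundle_$scala_version
    echo "https://repo1.maven.org/maven2/org/apache/hudi/$artifact/$hudi_version/$artifact-$hudi_version.jar"
}

# Possible use in the job (directory, mount point and image name are hypothetical):
#   wget "$(utilities_bundle_url "$SCALA_VERSION" 0.11.0)" -P "$JARS_DIR"
#   docker run -v "$JARS_DIR:/opt/bundle-validation/jars" <image> ...
```

With this split, the image carries only Spark and the other infra, and switching the versions under test means changing what the job downloads into the mounted directory.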



##########
packaging/bundle-validation/validate.sh:
##########
@@ -111,6 +114,73 @@ test_utilities_bundle () {
     echo "::warning::validate.sh done validating deltastreamer in spark shell"
 }
 
+##
+# Function to test the utilities bundle and utilities slim bundle + spark bundle.
+# It runs deltastreamer and then verifies that deltastreamer worked correctly.
+#
+# 1st arg: main jar to run with spark-submit, usually it's the utilities(-slim) bundle
+# 2nd arg and beyond: any additional jars to pass to --jars option
+#
+# env vars (defined in container):
+#   SPARK_HOME: path to the spark directory
+##
+test_utilities_bundle () {
+    OUTPUT_DIR=/tmp/hudi-utilities-test/
+    rm -r $OUTPUT_DIR
+    EXPECTED_SIZE=580
+    test_utilities_bundle_helper $1 "${@:2}"
+    exit $?
+}
+
+
+##
+# Function to test the upgrading the utilities bundle and 
+# utilities slim bundle + spark bundle.
+# It runs deltastreamer and then verifies that deltastreamer worked correctly on
+# half the data. Then, using an upgraded hudi, runs deltastreamer and verifies 
+# that deltastreamer worked correctly on the rest of the data
+#
+#
+# env vars (defined in container):
+#   SPARK_HOME: path to the spark directory
+#   FIRST_MAIN_ARG: what you would put as the first arg to test_utilities_bundle
+#       and that is used for running deltastreamer on the first batch of data
+#   FIRST_ADDITIONAL_ARG: what you would put as extra args to test_utilities_bundle
+#       and that is used for running deltastreamer on the first batch of data
+#   SECOND_MAIN_ARG: what you would put as the first arg to test_utilities_bundle
+#       and that is used for running deltastreamer on the second batch of data
+#   SECOND_ADDITIONAL_ARG: what you would put as extra args to test_utilities_bundle
+#       and that is used for running deltastreamer on the second batch of data
+##
+test_upgrade_bundle () {

Review Comment:
   We want to run the upgrade/downgrade test only on release branches, so we should separate out the testing function and add a separate GH Actions job conditioned on the branch pattern.
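
The branch condition itself would live in the workflow definition. As a complement, the script could also guard the upgrade test so it is skipped if invoked on a non-release branch. A minimal sketch, assuming the branch name is passed in by the caller (in GitHub Actions it is available as `GITHUB_REF_NAME` on push events); `is_release_branch` and `select_bundle_tests` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch only: is_release_branch and select_bundle_tests are hypothetical helpers.

# Succeeds when the branch name follows the release-* pattern.
is_release_branch () {
    case $1 in
        release-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the names of the test functions that should run for a branch,
# one per line. The regular bundle test always runs; the upgrade test
# is added only for release branches.
select_bundle_tests () {
    local branch=$1
    echo test_utilities_bundle
    if is_release_branch "$branch"; then
        echo test_upgrade_bundle
    fi
}
```

Keeping the selection in one small function leaves `test_utilities_bundle` and `test_upgrade_bundle` independent, so each can be called from its own job.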


