Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/26 20:35:07 UTC

[GitHub] [hudi] jonvex opened a new pull request, #7071: [HUDI-4982] Upgrade Bundle Testing

jonvex opened a new pull request, #7071:
URL: https://github.com/apache/hudi/pull/7071

   ### Change Logs
   
   The bundle validation tests now also cover upgrading from older bundle versions.
   
   - The base Dockerfile is updated to download the older bundle versions.
   - `test_utilities_bundle` now delegates to a helper. `test_utilities_bundle` deletes the output folder at the start, but an upgrade test runs the testing twice and must not delete the output folder before the second run.
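
   A minimal sketch of this helper split, under stated assumptions: `test_utilities_bundle_helper` is a hypothetical stub here (the real one runs DeltaStreamer and validates the output), and `test_upgrade_bundle` reads the `FIRST_*`/`SECOND_*` variables documented in `validate.sh`. The upgrade path clears the output folder once, then runs both bundles against the same folder.

   ```shell
   # Hypothetical stub standing in for the real helper, which runs DeltaStreamer
   # and validates the output. It only appends the main jar to a log in OUTPUT_DIR.
   test_utilities_bundle_helper () {
       local main_jar=$1
       shift
       echo "would run deltastreamer with $main_jar (extra jars: $*)" >&2
       mkdir -p "$OUTPUT_DIR"
       echo "$main_jar" >> "$OUTPUT_DIR/runs.log"
   }

   # Single-version test: always start from a clean output folder.
   test_utilities_bundle () {
       OUTPUT_DIR=/tmp/hudi-utilities-test/
       rm -rf "$OUTPUT_DIR"   # -f: no error if the folder does not exist yet
       test_utilities_bundle_helper "$1" "${@:2}"
   }

   # Upgrade test: clean once, then run the old and the new bundle against the
   # same output folder so the second run sees what the first one wrote.
   test_upgrade_bundle () {
       OUTPUT_DIR=/tmp/hudi-utilities-test/
       rm -rf "$OUTPUT_DIR"
       # *_ADDITIONAL_ARG are left unquoted on purpose so that a space-separated
       # list of extra jars splits into separate arguments (or none if empty).
       test_utilities_bundle_helper "$FIRST_MAIN_ARG" $FIRST_ADDITIONAL_ARG || return 1
       test_utilities_bundle_helper "$SECOND_MAIN_ARG" $SECOND_ADDITIONAL_ARG
   }
   ```

   This sketch uses `return` rather than `exit` so the functions can be called from a sourced shell; the PR's `test_utilities_bundle` calls `exit $?`, which ends the whole script.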
   
   ### Impact
   
   Better bundle testing: bundle upgrades are now covered.
   
   ### Risk level (write none, low, medium, or high below)
   
   low
   
   ### Documentation Update
   
   no documentation needed
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on a diff in pull request #7071: [HUDI-4982] Upgrade Bundle Testing

xushiyan commented on code in PR #7071:
URL: https://github.com/apache/hudi/pull/7071#discussion_r1008346010


##########
packaging/bundle-validation/Dockerfile-base:
##########
@@ -47,3 +49,44 @@ RUN wget https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK
     && tar -xf $WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION.tgz -C $WORKDIR/ \
     && rm $WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION.tgz
 ENV SPARK_HOME=$WORKDIR/spark-$SPARK_VERSION-bin-hadoop$SPARK_HADOOP_VERSION
+
+
+# Utilities bundles
+RUN wget https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_$SCALA_VERSION/0.11.0/hudi-utilities-bundle_$SCALA_VERSION-0.11.0.jar -P "$WORKDIR" 
+ENV UTILITIES_BUNDLE_0_11_0=$WORKDIR/hudi-utilities-bundle_$SCALA_VERSION-0.11.0.jar

Review Comment:
   As we discussed, the upgrade/downgrade test scenario should only run on release branches. For this reason, we should parameterize the versions to pull based on the release branch name: for example, if the branch name is `release-0.13.0`, download `0.11.{latest}` and `0.10.{latest}` for testing.
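
   One way to derive the versions from the branch name, as a hedged sketch: the function name is hypothetical, and the rule "minor lines N-2 and N-3" is only extrapolated from the single `release-0.13.0` example above. Resolving `{latest}` to a concrete patch release is left out.

   ```shell
   # Hypothetical helper: print the older Hudi minor lines to test upgrades from,
   # one per line. Assumption: minor N maps to N-2 and N-3, as in the example.
   older_minor_lines () {
       local branch=$1
       local re='^release-([0-9]+)\.([0-9]+)\.([0-9]+)$'
       if [[ ! $branch =~ $re ]]; then
           echo "not a release branch: $branch" >&2
           return 1
       fi
       # 10# forces base 10 so a leading zero is not read as octal
       local major=${BASH_REMATCH[1]} minor=$((10#${BASH_REMATCH[2]}))
       local offset
       for offset in 2 3; do
           if (( minor - offset >= 0 )); then
               echo "$major.$((minor - offset))"
           fi
       done
   }

   older_minor_lines release-0.13.0   # prints 0.11 and 0.10, one per line
   ```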
   
   Another design principle we should follow here: we build the infra with Docker, which is generic and reusable, and mount testing artifacts via volumes, so they are easy to switch and flexible across test cases. Following this, we should download the jars in the GH Actions job itself and mount them into the container. Because this only runs for release-branch tests, the downloading won't "waste" much. To optimize jar downloads further, we could even explore whether GitHub lets us store the jars in its own artifact storage, which would probably allow cached downloads.
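
   A hedged sketch of the "download on the runner, mount into the container" design. The URL follows the `wget` line in `Dockerfile-base`; the function names and the `/opt/bundle-jars` mount point are assumptions, not part of the PR.

   ```shell
   # Maven Central URL of a utilities bundle, following the URL pattern
   # used by the wget line in Dockerfile-base.
   utilities_bundle_url () {
       local scala_version=$1 hudi_version=$2
       local name="hudi-utilities-bundle_${scala_version}"
       echo "https://repo1.maven.org/maven2/org/apache/hudi/${name}/${hudi_version}/${name}-${hudi_version}.jar"
   }

   # Download one bundle per requested Hudi version into a host directory
   # (runs on the GH Actions runner, not inside the image).
   download_bundles () {
       local jars_dir=$1 scala_version=$2
       shift 2
       mkdir -p "$jars_dir"
       local v
       for v in "$@"; do
           wget -q "$(utilities_bundle_url "$scala_version" "$v")" -P "$jars_dir" || return 1
       done
   }

   # Run the unchanged image with the jars mounted read-only; any arguments
   # after the image name are passed through as the container command.
   run_validation () {
       local jars_dir=$1 image=$2
       shift 2
       docker run --rm -v "$jars_dir:/opt/bundle-jars:ro" "$image" "$@"
   }
   ```

   For example, `download_bundles "$RUNNER_TEMP/bundle-jars" 2.12 0.11.0` followed by `run_validation "$RUNNER_TEMP/bundle-jars" <image>`. `RUNNER_TEMP` is set by GitHub Actions; the bind-mount source must be an absolute path.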



##########
packaging/bundle-validation/validate.sh:
##########
@@ -111,6 +114,73 @@ test_utilities_bundle () {
     echo "::warning::validate.sh done validating deltastreamer in spark shell"
 }
 
+##
+# Function to test the utilities bundle and utilities slim bundle + spark bundle.
+# It runs deltastreamer and then verifies that deltastreamer worked correctly.
+#
+# 1st arg: main jar to run with spark-submit, usually it's the utilities(-slim) bundle
+# 2nd arg and beyond: any additional jars to pass to --jars option
+#
+# env vars (defined in container):
+#   SPARK_HOME: path to the spark directory
+##
+test_utilities_bundle () {
+    OUTPUT_DIR=/tmp/hudi-utilities-test/
+    rm -r $OUTPUT_DIR
+    EXPECTED_SIZE=580
+    test_utilities_bundle_helper $1 "${@:2}"
+    exit $?
+}
+
+
+##
+# Function to test upgrading the utilities bundle and
+# utilities slim bundle + spark bundle.
+# It runs deltastreamer and then verifies that deltastreamer worked correctly on
+# half the data. Then, using an upgraded hudi, runs deltastreamer and verifies 
+# that deltastreamer worked correctly on the rest of the data
+#
+#
+# env vars (defined in container):
+#   SPARK_HOME: path to the spark directory
+#   FIRST_MAIN_ARG: what you would put as the first arg to test_utilities_bundle
+#       and that is used for running deltastreamer on the first batch of data
+#   FIRST_ADDITIONAL_ARG: what you would put as extra args to test_utilities_bundle
+#       and that is used for running deltastreamer on the first batch of data
+#   SECOND_MAIN_ARG: what you would put as the first arg to test_utilities_bundle
+#       and that is used for running deltastreamer on the second batch of data
+#   SECOND_ADDITIONAL_ARG: what you would put as extra args to test_utilities_bundle
+#       and that is used for running deltastreamer on the second batch of data
+##
+test_upgrade_bundle () {

Review Comment:
   We want to run the upgrade/downgrade test only on release branches, so we should separate out the testing function and add a separate GH Actions job conditioned on the branch name pattern.
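
   A hedged sketch of the branch-pattern check. In practice this condition would likely sit in the `if:` of a separate GitHub Actions job; the shell version only shows the same test. `GITHUB_REF_NAME` is a default GitHub Actions variable (for `pull_request` events it holds the merge ref, so `GITHUB_BASE_REF` may be the better input there). The function name and the `release-*` pattern, taken from the `release-0.13.0` example in this thread, are assumptions.

   ```shell
   # Hypothetical guard: succeeds only for branch names like release-0.13.0.
   # Uses the first argument if given, else GITHUB_REF_NAME (empty if unset).
   is_release_branch () {
       local branch=${1:-${GITHUB_REF_NAME:-}}
       [[ $branch == release-* ]]
   }

   if is_release_branch; then
       echo "release branch: running upgrade/downgrade bundle tests"
   else
       echo "not a release branch: skipping upgrade/downgrade bundle tests"
   fi
   ```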





[GitHub] [hudi] jonvex commented on a diff in pull request #7071: [HUDI-4982] [WIP] Upgrade Bundle Testing

jonvex commented on code in PR #7071:
URL: https://github.com/apache/hudi/pull/7071#discussion_r1012031298


##########
packaging/bundle-validation/validate.sh:
##########
@@ -111,6 +114,73 @@ test_utilities_bundle () {
     echo "::warning::validate.sh done validating deltastreamer in spark shell"
 }
 
+test_upgrade_bundle () {

Review Comment:
   I have made it run only on the release branch, but for now the code also runs on this branch so that the tests execute. After you approve what I have, I will change it so it runs only on the release branch, and then it can be merged.





[GitHub] [hudi] hudi-bot commented on pull request #7071: [HUDI-4982] Upgrade Bundle Testing

hudi-bot commented on PR #7071:
URL: https://github.com/apache/hudi/pull/7071#issuecomment-1292899575

   ## CI report:
   
   * 46a96de62e5f6df954a024ecaf7fa7c30963a146 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12607) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7071: [HUDI-4982] Upgrade Bundle Testing

hudi-bot commented on PR #7071:
URL: https://github.com/apache/hudi/pull/7071#issuecomment-1292684630

   ## CI report:
   
   * 46a96de62e5f6df954a024ecaf7fa7c30963a146 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12607) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7071: [HUDI-4982] Upgrade Bundle Testing

hudi-bot commented on PR #7071:
URL: https://github.com/apache/hudi/pull/7071#issuecomment-1292679247

   ## CI report:
   
   * 46a96de62e5f6df954a024ecaf7fa7c30963a146 UNKNOWN
   

