Posted to commits@spark.apache.org by gu...@apache.org on 2021/04/05 12:17:18 UTC

[spark-website] branch asf-site updated: Document benchmark GitHub Actions workflow, and update contribution guide

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 6c4a011  Document benchmark GitHub Actions workflow, and update contribution guide
6c4a011 is described below

commit 6c4a011f7720db8b0d8392baf02ec6183a39896b
Author: HyukjinKwon <gu...@apache.org>
AuthorDate: Mon Apr 5 21:17:06 2021 +0900

    Document benchmark GitHub Actions workflow, and update contribution guide
    
    This PR documents the GitHub Actions workflows for both testing and benchmarking, and updates the contribution guide:
    
    <img width="860" alt="Screen Shot 2021-04-05 at 4 08 36 PM" src="https://user-images.githubusercontent.com/6477701/113547676-5511fd80-9629-11eb-9522-2132e256ab81.png">
    <img width="858" alt="Screen Shot 2021-04-05 at 4 08 23 PM" src="https://user-images.githubusercontent.com/6477701/113547669-53e0d080-9629-11eb-871c-d93bcf1a31e1.png">
    
    Author: HyukjinKwon <gu...@apache.org>
    
    Closes #330 from HyukjinKwon/document-ga-benchmark.
---
 contributing.md                                    |   7 ++-
 developer-tools.md                                 |  44 +++++++++++++------
 images/running-benchamrks-using-github-actions.png | Bin 0 -> 425594 bytes
 site/contributing.html                             |   7 ++-
 site/developer-tools.html                          |  48 +++++++++++++++------
 .../running-benchamrks-using-github-actions.png    | Bin 0 -> 425594 bytes
 6 files changed, 80 insertions(+), 26 deletions(-)

diff --git a/contributing.md b/contributing.md
index 43854b6..43d18d7 100644
--- a/contributing.md
+++ b/contributing.md
@@ -331,8 +331,13 @@ and add them as needed.
             test_that("SPARK-12345: a short description of the test", {
               ...
             ```
+1. Consider whether benchmark results should be added or updated as part of the change, and add them as needed by following
+<a href="https://spark.apache.org/developer-tools.html#github-workflow-benchmarks">Running benchmarks in your forked repository</a>
+to generate benchmark results.
 1. Run all tests with `./dev/run-tests` to verify that the code still compiles, passes tests, and 
-passes style checks. If style checks fail, review the Code Style Guide below.
+passes style checks. Alternatively, you can run the tests via the GitHub Actions workflow by following
+<a href="https://spark.apache.org/developer-tools.html#github-workflow-tests">Running tests in your forked repository</a>.
+If style checks fail, review the Code Style Guide below.
 1. <a href="https://help.github.com/articles/using-pull-requests/">Open a pull request</a> against 
 the `master` branch of `apache/spark`. (Only in special cases would the PR be opened against other branches.)
      1. The PR title should be of the form `[SPARK-xxxx][COMPONENT] Title`, where `SPARK-xxxx` is 
diff --git a/developer-tools.md b/developer-tools.md
index ab536bf..4ed8455 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -216,25 +216,45 @@ In case of a failure the POD logs (driver and executors) can be found at the end
 
 Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality.  If you are having trouble getting tests to pass on Jenkins, but locally things work, don't hesitate to file a Jira issue.
 
-<h3>Running tests in your forked repository using GitHub Actions</h3>
+<h3>Testing with GitHub Actions workflow</h3>
 
-GitHub Actions is a functionality within GitHub that enables continuous integration and a wide range of automation.
-We already have started using some action scripts and one of them is to run tests for [pull requests](https://spark.apache.org/contributing.html).
-If you are planning to create a new pull request, it is important to check if tests can pass on your branch before creating a pull request.
-This is because our GitHub Acrions script automatically runs tests for your pull request/following commits and
-this can burden our limited resources of GitHub Actions.
+Apache Spark leverages GitHub Actions, which enables continuous integration and a wide range of automation. The Apache Spark repository provides several GitHub Actions workflows for developers to run before creating a pull request.
 
-Our script enables you to run tests for a branch in your forked repository.
-Let's say that you have a branch named "your_branch" for a pull request.
+<a name="github-workflow-tests"></a>
+<h4>Running tests in your forked repository</h4>
+
+Before creating a pull request in Apache Spark, it is important to check whether tests pass on your branch, because our GitHub Actions workflows automatically run tests for your pull request and its following commits, and every run burdens the limited GitHub Actions resources of the Apache Spark repository.
+
+The Apache Spark repository has a workflow that enables you to run the same tests for a branch in your own forked repository, which does not burden the resources of the Apache Spark repository.
+
+For example, suppose that you have a branch named "your_branch" for a pull request.
 To run tests on "your_branch" and check test results:
 
-- Clicks a "Actions" tab in your forked repository.
-- Selects a "Build and test" workflow in a "All workflows" list.
-- Pushes a "Run workflow" button and enters "your_branch" in a "Target branch to run" field.
-- When a "Build and test" workflow finished, clicks a "Report test results" workflow to check test results.
+- Click the "Actions" tab in your forked repository.
+- Select the "Build and test" workflow in the "All workflows" list.
+- Click the "Run workflow" button and enter "your_branch" in the "Target branch to run" field.
+- Once the "Build and test" workflow is finished, click the "Report test results" workflow to check test results.
 
 <img src="/images/running-tests-using-github-actions.png" style="width: 100%; max-width: 800px;" />
 
+<a name="github-workflow-benchmarks"></a>
+<h4>Running benchmarks in your forked repository</h4>
+
+The Apache Spark repository provides an easy way to run benchmarks in GitHub Actions. When you update the benchmark results in a pull request, it is recommended to use GitHub Actions to run and generate the benchmark results so that they are produced in an environment that is as consistent as possible.
+
+- Click the "Actions" tab in your forked repository.
+- Select the "Run benchmarks" workflow in the "All workflows" list.
+- Click the "Run workflow" button and enter the fields appropriately as below:
+  - **Benchmark class**: the benchmark class which you wish to run. It allows a glob pattern. For example, `org.apache.spark.sql.*`.
+  - **JDK version**: Java version you want to run the benchmark with. For example, `11`.
+  - **Failfast**: indicates whether to stop the benchmark and workflow when a failure occurs. When `true`, it fails right away. When `false`, it runs all benchmarks regardless of failures.
+  - **Number of job splits**: it splits the benchmark jobs into the specified number, and runs them in parallel. It is particularly useful to work around the time limits of workflow and jobs in GitHub Actions.
+- Once the "Run benchmarks" workflow is finished, click the workflow and download the benchmark results under "Artifacts".
+- Go to the root directory of your Apache Spark repository, and unzip/untar the downloaded files, which updates the benchmark results by placing the files in the appropriate locations.
+
+<img src="/images/running-benchamrks-using-github-actions.png" style="width: 100%; max-width: 800px;" />
+
+
 <h3>ScalaTest Issues</h3>
 
 If the following error occurs when running ScalaTest
diff --git a/images/running-benchamrks-using-github-actions.png b/images/running-benchamrks-using-github-actions.png
new file mode 100644
index 0000000..e1f01f4
Binary files /dev/null and b/images/running-benchamrks-using-github-actions.png differ
diff --git a/site/contributing.html b/site/contributing.html
index c0609f3..4bc7ca3 100644
--- a/site/contributing.html
+++ b/site/contributing.html
@@ -571,8 +571,13 @@ public void testCase() {
       </li>
     </ol>
   </li>
+  <li>Consider whether benchmark results should be added or updated as part of the change, and add them as needed by following
+<a href="https://spark.apache.org/developer-tools.html#github-workflow-benchmarks">Running benchmarks in your forked repository</a>
+to generate benchmark results.</li>
   <li>Run all tests with <code class="language-plaintext highlighter-rouge">./dev/run-tests</code> to verify that the code still compiles, passes tests, and 
-passes style checks. If style checks fail, review the Code Style Guide below.</li>
+passes style checks. Alternatively, you can run the tests via the GitHub Actions workflow by following
+<a href="https://spark.apache.org/developer-tools.html#github-workflow-tests">Running tests in your forked repository</a>.
+If style checks fail, review the Code Style Guide below.</li>
   <li><a href="https://help.github.com/articles/using-pull-requests/">Open a pull request</a> against 
 the <code class="language-plaintext highlighter-rouge">master</code> branch of <code class="language-plaintext highlighter-rouge">apache/spark</code>. (Only in special cases would the PR be opened against other branches.)
     <ol>
diff --git a/site/developer-tools.html b/site/developer-tools.html
index 10b8257..6e73c89 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -394,27 +394,51 @@ minikube stop
 
 <p>Kubernetes, and more importantly, minikube have rapid release cycles, and point releases have been found to be buggy and/or break older and existing functionality.  If you are having trouble getting tests to pass on Jenkins, but locally things work, don&#8217;t hesitate to file a Jira issue.</p>
 
-<h3>Running tests in your forked repository using GitHub Actions</h3>
+<h3>Testing with GitHub Actions workflow</h3>
 
-<p>GitHub Actions is a functionality within GitHub that enables continuous integration and a wide range of automation.
-We already have started using some action scripts and one of them is to run tests for <a href="https://spark.apache.org/contributing.html">pull requests</a>.
-If you are planning to create a new pull request, it is important to check if tests can pass on your branch before creating a pull request.
-This is because our GitHub Acrions script automatically runs tests for your pull request/following commits and
-this can burden our limited resources of GitHub Actions.</p>
+<p>Apache Spark leverages GitHub Actions, which enables continuous integration and a wide range of automation. The Apache Spark repository provides several GitHub Actions workflows for developers to run before creating a pull request.</p>
 
-<p>Our script enables you to run tests for a branch in your forked repository.
-Let&#8217;s say that you have a branch named &#8220;your_branch&#8221; for a pull request.
+<p><a name="github-workflow-tests"></a></p>
+<h4>Running tests in your forked repository</h4>
+
+<p>Before creating a pull request in Apache Spark, it is important to check whether tests pass on your branch, because our GitHub Actions workflows automatically run tests for your pull request and its following commits, and every run burdens the limited GitHub Actions resources of the Apache Spark repository.</p>
+
+<p>The Apache Spark repository has a workflow that enables you to run the same tests for a branch in your own forked repository, which does not burden the resources of the Apache Spark repository.</p>
+
+<p>For example, suppose that you have a branch named &#8220;your_branch&#8221; for a pull request.
 To run tests on &#8220;your_branch&#8221; and check test results:</p>
 
 <ul>
-  <li>Clicks a &#8220;Actions&#8221; tab in your forked repository.</li>
-  <li>Selects a &#8220;Build and test&#8221; workflow in a &#8220;All workflows&#8221; list.</li>
-  <li>Pushes a &#8220;Run workflow&#8221; button and enters &#8220;your_branch&#8221; in a &#8220;Target branch to run&#8221; field.</li>
-  <li>When a &#8220;Build and test&#8221; workflow finished, clicks a &#8220;Report test results&#8221; workflow to check test results.</li>
+  <li>Click the &#8220;Actions&#8221; tab in your forked repository.</li>
+  <li>Select the &#8220;Build and test&#8221; workflow in the &#8220;All workflows&#8221; list.</li>
+  <li>Click the &#8220;Run workflow&#8221; button and enter &#8220;your_branch&#8221; in the &#8220;Target branch to run&#8221; field.</li>
+  <li>Once the &#8220;Build and test&#8221; workflow is finished, click the &#8220;Report test results&#8221; workflow to check test results.</li>
 </ul>
 
 <p><img src="/images/running-tests-using-github-actions.png" style="width: 100%; max-width: 800px;" /></p>
 
+<p><a name="github-workflow-benchmarks"></a></p>
+<h4>Running benchmarks in your forked repository</h4>
+
+<p>The Apache Spark repository provides an easy way to run benchmarks in GitHub Actions. When you update the benchmark results in a pull request, it is recommended to use GitHub Actions to run and generate the benchmark results so that they are produced in an environment that is as consistent as possible.</p>
+
+<ul>
+  <li>Click the &#8220;Actions&#8221; tab in your forked repository.</li>
+  <li>Select the &#8220;Run benchmarks&#8221; workflow in the &#8220;All workflows&#8221; list.</li>
+  <li>Click the &#8220;Run workflow&#8221; button and enter the fields appropriately as below:
+    <ul>
+      <li><strong>Benchmark class</strong>: the benchmark class which you wish to run. It allows a glob pattern. For example, <code class="language-plaintext highlighter-rouge">org.apache.spark.sql.*</code>.</li>
+      <li><strong>JDK version</strong>: Java version you want to run the benchmark with. For example, <code class="language-plaintext highlighter-rouge">11</code>.</li>
+      <li><strong>Failfast</strong>: indicates whether to stop the benchmark and workflow when a failure occurs. When <code class="language-plaintext highlighter-rouge">true</code>, it fails right away. When <code class="language-plaintext highlighter-rouge">false</code>, it runs all benchmarks regardless of failures.</li>
+      <li><strong>Number of job splits</strong>: it splits the benchmark jobs into the specified number, and runs them in parallel. It is particularly useful to work around the time limits of workflow and jobs in GitHub Actions.</li>
+    </ul>
+  </li>
+  <li>Once the &#8220;Run benchmarks&#8221; workflow is finished, click the workflow and download the benchmark results under &#8220;Artifacts&#8221;.</li>
+  <li>Go to the root directory of your Apache Spark repository, and unzip/untar the downloaded files, which updates the benchmark results by placing the files in the appropriate locations.</li>
+</ul>
+
+<p><img src="/images/running-benchamrks-using-github-actions.png" style="width: 100%; max-width: 800px;" /></p>
+
 <h3>ScalaTest Issues</h3>
 
 <p>If the following error occurs when running ScalaTest</p>
diff --git a/site/images/running-benchamrks-using-github-actions.png b/site/images/running-benchamrks-using-github-actions.png
new file mode 100644
index 0000000..e1f01f4
Binary files /dev/null and b/site/images/running-benchamrks-using-github-actions.png differ
