Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/05/26 20:44:42 UTC

[GitHub] [tvm] areusch commented on a diff in pull request #11403: [skip ci][ci][docs] Add CI infra docs

areusch commented on code in PR #11403:
URL: https://github.com/apache/tvm/pull/11403#discussion_r883067342


##########
jenkins/README.md:
##########
@@ -26,3 +137,90 @@ pip install -r jenkins/requirements.txt
 python jenkins/generate.py

Review Comment:
   While we're here, maybe we should change these instructions to use a venv:
   ```
   python3 -m venv _venv
   _venv/bin/pip3 install -r jenkins/requirements.txt
   _venv/bin/python3 jenkins/generate.py
   ```
   
   We could also consider adding this to the `Makefile`.



##########
jenkins/README.md:
##########
@@ -15,8 +15,119 @@
 <!--- specific language governing permissions and limitations -->
 <!--- under the License. -->
 
+# TVM CI
+
+TVM runs CI jobs on every commit to an open pull request and to branches in the apache/tvm repo (such as `main`). These jobs are essential to keeping the TVM project in a healthy state and preventing breakages. Jenkins does most of the work in running the TVM tests, though some smaller jobs are also run on GitHub Actions.
+
+## GitHub Actions
+
+GitHub Actions is used to run Windows jobs, MacOS jobs, and various on-GitHub automations. These are defined in [`.github/workflows`](../.github/workflows/). These automations include bots to:
+* [cc people based on subscribed teams/topics](https://github.com/apache/tvm/issues/10317)
+* [allow non-committers to merge approved / CI passing PRs](https://discuss.tvm.apache.org/t/rfc-allow-merging-via-pr-comments/12220)
+* [add cc-ed people as reviewers on GitHub](https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095)
+* [ping languishing PRs after no activity for a week (currently opt-in only)](https://github.com/apache/tvm/issues/9983)
+* [push a `last-successful` branch to GitHub with the last `main` commit that passed CI](https://github.com/apache/tvm/tree/last-successful)
+
+https://github.com/apache/tvm/actions has the logs for each of these workflows. Note that when debugging these workflows changes from PRs from forked repositories won't be relfected in the PR. These should be tested in the forked repository first and linked in the PR body.
+
+
+## Keeping CI Green
+
+Developers rely on the TVM CI to get signal on their PRs before merging.
+Occasionally breakages slip through and break `main`, which in turn causes
+the same error to show up on an PR that is based on the broken commit(s). Broken
+commits can be identified [through GitHub](https://github.com/apache/tvm/commits/main>)
+via the commit status icon or via [Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main>).
+In these situations it is possible to either revert the offending commit or
+submit a forward fix to address the issue. It is up to the committer and commit
+author which option to choose, keeping in mind that a broken CI affects all TVM
+developers and should be fixed as soon as possible.
+
+Some tests are also flaky and fail for reasons unrelated to the PR. The [CI monitoring rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook) watches for these failures and disables tests as necessary. It is the responsibility of those who wrote the test to ultimately fix and re-enable the test.
+
+
+## Dealing with Flakiness
+
+If you notice a failure on your PR that seems unrelated to your change, you should
+search [recent GitHub issues related to flaky tests](https://github.com/apache/tvm/issues?q=is%3Aissue+%5BCI+Problem%5D+Flaky+>) and
+[file a new issue](https://github.com/apache/tvm/issues/new?assignees=&labels=&template=ci-problem.md&title=%5BCI+Problem%5D+>)
+if you don't see any reports of the failure. If a certain test or class of tests affects
+several PRs or commits on `main` with flaky failures, the test should be disabled via
+[pytest's @xfail decorator](https://docs.pytest.org/en/6.2.x/skipping.html#xfail-mark-test-functions-as-expected-to-fail) with [`strict=True`](https://docs.pytest.org/en/6.2.x/skipping.html#strict-parameter) and the relevant issue linked in the

Review Comment:
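   Should this be `strict=True` or `strict=False`? With `strict=True`, pytest reports an unexpected pass as a failure, so a flaky test that passes intermittently would still fail CI.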
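
To make the question concrete, here is a minimal sketch of how a flaky test might be marked, assuming the goal is that intermittent passes should not fail CI. The test name, the simulated flakiness, and the issue reference in `reason` are hypothetical placeholders, not taken from the TVM test suite.

```python
import random

import pytest


# Hypothetical flaky test. With strict=False, a failure is reported as XFAIL
# and an unexpected pass as XPASS; neither outcome fails the run.
# With strict=True, an unexpected pass would be reported as a failure instead.
@pytest.mark.xfail(strict=False, reason="Flaky, see the linked [CI Problem] issue (placeholder)")
def test_sometimes_fails():
    # Simulated nondeterminism standing in for a real flaky condition.
    assert random.random() < 0.5
```

Whichever value is chosen, the `reason` string is a natural place to link the tracking issue so the test can be re-enabled later.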



##########
jenkins/README.md:
##########
@@ -15,8 +15,119 @@
 <!--- specific language governing permissions and limitations -->
 <!--- under the License. -->
 
+# TVM CI
+
+TVM runs CI jobs on every commit to an open pull request and to branches in the apache/tvm repo (such as `main`). These jobs are essential to keeping the TVM project in a healthy state and preventing breakages. Jenkins does most of the work in running the TVM tests, though some smaller jobs are also run on GitHub Actions.
+
+## GitHub Actions
+
+GitHub Actions is used to run Windows jobs, MacOS jobs, and various on-GitHub automations. These are defined in [`.github/workflows`](../.github/workflows/). These automations include bots to:
+* [cc people based on subscribed teams/topics](https://github.com/apache/tvm/issues/10317)
+* [allow non-committers to merge approved / CI passing PRs](https://discuss.tvm.apache.org/t/rfc-allow-merging-via-pr-comments/12220)
+* [add cc-ed people as reviewers on GitHub](https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095)
+* [ping languishing PRs after no activity for a week (currently opt-in only)](https://github.com/apache/tvm/issues/9983)
+* [push a `last-successful` branch to GitHub with the last `main` commit that passed CI](https://github.com/apache/tvm/tree/last-successful)
+
+https://github.com/apache/tvm/actions has the logs for each of these workflows. Note that when debugging these workflows changes from PRs from forked repositories won't be relfected in the PR. These should be tested in the forked repository first and linked in the PR body.

Review Comment:
   ```suggestion
   https://github.com/apache/tvm/actions has the logs for each of these workflows. Note that when debugging these workflows changes from PRs from forked repositories won't be reflected in the PR. These should be tested in the forked repository first and linked in the PR body.
   ```



##########
jenkins/README.md:
##########
@@ -26,3 +137,90 @@ pip install -r jenkins/requirements.txt
 python jenkins/generate.py
 ```
 
+# Infrastructure
+
+Jenkins runs in AWS on an EC2 instance fronted by an ELB which makes it available at https://ci.tlcpack.ai. These definitions are declared via Terraform in the [tlc-pack/ci-terraform](https://github.com/tlc-pack/ci-terraform) repository. The Terraform code references custom AMIs built in [tlc-pack/ci-packer](https://github.com/tlc-pack/ci-packer). [tlc-pack/ci](https://github.com/tlc-pack/ci) contains Ansible scripts to deploy the Jenkins head node and set it up to interact with AWS.
+
+The Jenkins head node has a number of autoscaling groups with labels that are used to run jobs (e.g. `CPU`, `GPU` or `ARM`) via the [EC2 Fleet](https://plugins.jenkins.io/ec2-fleet/) plugin.
+
+## Deploying
+
+Deploying Jenkins can disrupt developers so it must be done with care. Jobs that are in-flight will be cancelled and must be manually restarted. Follow the instructions [here](https://github.com/tlc-pack/ci/issues/10) to run a deploy.
+
+## Monitoring
+
+Dashboards of CI data can be found:
+* within Jenkins at https://ci.tlcpack.ai/monitoring (HTTP / JVM stats)
+* at https://monitoring.tlcpack.ai (job status, worker status)
+
+## CI Diagram
+
+This details the individual parts that interact in TVM's CI.

Review Comment:
   Should we link to further CI ops docs here?


