Posted to commits@arrow.apache.org by al...@apache.org on 2023/05/16 21:35:54 UTC

[arrow-datafusion] branch main updated: Minor: Update the testing section of contributor guide (#6357)

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 33b15c1e8a Minor: Update the testing section of contributor guide (#6357)
33b15c1e8a is described below

commit 33b15c1e8a670bee7ceb11f5f02e445e0e16bff0
Author: Andrew Lamb <an...@nerdnetworks.org>
AuthorDate: Tue May 16 17:35:47 2023 -0400

    Minor: Update the testing section of contributor guide (#6357)
---
 docs/source/contributor-guide/index.md | 45 +++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/docs/source/contributor-guide/index.md b/docs/source/contributor-guide/index.md
index 7c19ff2e89..f8457b8854 100644
--- a/docs/source/contributor-guide/index.md
+++ b/docs/source/contributor-guide/index.md
@@ -33,7 +33,7 @@ list to help you get started.
 
 # Developer's guide
 
-## Pull Requests
+## Pull Request Overview
 
 We welcome pull requests (PRs) from anyone from the community.
 
@@ -115,42 +115,41 @@ or run them all at once:
 
 - [dev/rust_lint.sh](../../../dev/rust_lint.sh)
 
-### Test Organization
+## Testing
 
-Tests are very important to ensure that improvemens or fixes are not accidentally broken during subsequent refactorings.
+Tests are critical to ensure that DataFusion is working properly and
+is not accidentally broken during refactorings. All new features
+should have test coverage.
 
 DataFusion has several levels of tests in its [Test
 Pyramid](https://martinfowler.com/articles/practical-test-pyramid.html)
-and tries to follow rust standard [Testing Organization](https://doc.rust-lang.org/book/ch11-03-test-organization.html) in the The Book.
+and tries to follow the standard Rust [Testing Organization](https://doc.rust-lang.org/book/ch11-03-test-organization.html) described in The Book.
 
-This section highlights the most important test modules that exist
+### Unit tests
 
-#### Unit tests
+Tests for code in an individual module are defined in the same source file with a `test` module, following Rust convention.
 
-Tests for the code in an individual module are defined in the same source file with a `test` module, following Rust convention.
+### sqllogictests Tests
 
-#### Rust Integration Tests
+DataFusion's SQL implementation is tested using [sqllogictest](https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests) tests, which are run like any other Rust test using `cargo test --test sqllogictests`.
 
-There are several tests of the public interface of the DataFusion library in the [tests](https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests) directory.
-
-You can run these tests individually using a command such as
+`sqllogictests` tests may be less convenient for new contributors who are familiar with writing `.rs` tests, as they require learning another tool. However, `sqllogictest`-based tests are much easier to develop and maintain, as they 1) do not require a slow recompile/link cycle and 2) can be automatically updated via `cargo test --test sqllogictests -- --complete`.
 
-```shell
-cargo test -p datafusion --test sql_integration
-```
+Like similar systems such as [DuckDB](https://duckdb.org/dev/testing), DataFusion has chosen to trade off a slightly higher barrier to contribution for longer-term maintainability. While we are still in the process of [migrating some old sql_integration tests](https://github.com/apache/arrow-datafusion/issues/6195), all new tests should be written using sqllogictests if possible.
 
-One very important test is the [sql_integration](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sql_integration.rs) test which validates DataFusion's ability to run a large assortment of SQL queries against an assortment of data setups.
+### Rust Integration Tests
 
-#### sqllogictests Tests
+There are several tests of the public interface of the DataFusion library in the [tests](https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests) directory.
 
-The [sqllogictests](https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests) also validate DataFusion SQL against an assortment of data setups.
+You can run these tests individually using a normal `cargo` command such as
 
-Data Driven tests have many benefits including being easier to write and maintain. We are in the process of [migrating sql_integration tests](https://github.com/apache/arrow-datafusion/issues/4460) and encourage
-you to add new tests using sqllogictests if possible.
+```shell
+cargo test -p datafusion --test dataframe
+```
 
-### Benchmarks
+## Benchmarks
 
-#### Criterion Benchmarks
+### Criterion Benchmarks
 
 [Criterion](https://docs.rs/criterion/latest/criterion/index.html) is a statistics-driven micro-benchmarking framework used by DataFusion for evaluating the performance of specific code-paths. In particular, the criterion benchmarks help to both guide optimisation efforts, and prevent performance regressions within DataFusion.
 
@@ -164,7 +163,7 @@ A full list of benchmarks can be found [here](https://github.com/apache/arrow-da
 
 _[cargo-criterion](https://github.com/bheisler/cargo-criterion) may also be used for more advanced reporting._
 
-#### Parquet SQL Benchmarks
+### Parquet SQL Benchmarks
 
 The parquet SQL benchmarks can be run with
 
@@ -178,7 +177,7 @@ If the environment variable `PARQUET_FILE` is set, the benchmark will run querie
 
 The benchmark will automatically remove any generated parquet file on exit, however, if interrupted (e.g. by CTRL+C) it will not. This can be useful for analysing the particular file after the fact, or preserving it to use with `PARQUET_FILE` in subsequent runs.
 
-#### Upstream Benchmark Suites
+### Upstream Benchmark Suites
 
 Instructions and tooling for running upstream benchmark suites against DataFusion can be found in [benchmarks](https://github.com/apache/arrow-datafusion/tree/main/benchmarks).
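
For readers unfamiliar with the unit-test convention that the "Unit tests" section of the diff above refers to, here is a minimal sketch of that layout. It is not taken from the DataFusion code base: `normalize_ident` is a hypothetical helper invented for this illustration, and the module is named `tests` as in the Rust book (the guide text calls it a `test` module).

```rust
/// Hypothetical helper, NOT part of DataFusion; it exists only to give
/// the test module below something to exercise.
pub fn normalize_ident(s: &str) -> String {
    // Strip surrounding whitespace, then fold to lower case.
    s.trim().to_lowercase()
}

// The test module lives in the same source file as the code it covers
// and is compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn trims_and_lowercases() {
        assert_eq!(normalize_ident("  Foo "), "foo");
    }
}
```

Because the module is gated by `#[cfg(test)]`, it adds nothing to a normal build, and `use super::*` lets the tests reach private items in the enclosing module as well as public ones.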