Posted to commits@arrow.apache.org by ag...@apache.org on 2022/08/13 19:30:28 UTC

[arrow-datafusion] branch master updated: separate contributors guide (#3128)

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
     new 48f9b7ab5 separate contributors guide (#3128)
48f9b7ab5 is described below

commit 48f9b7ab57edce2548997e358eb589c5f23e1bad
Author: kmitchener <km...@gmail.com>
AuthorDate: Sat Aug 13 15:30:22 2022 -0400

    separate contributors guide (#3128)
---
 CONTRIBUTING.md                                    | 255 +--------------------
 README.md                                          |   8 +-
 .../communication.md                               |  12 -
 .../source/contributor-guide/index.md              |  42 ++--
 .../quarterly_roadmap.md                           |   2 +-
 .../roadmap.md                                     |  24 +-
 .../contributor-guide/specification/index.rst      |  25 ++
 .../specification/invariants.md                    |   2 +-
 .../specification/output-field-name-semantic.md    |   2 +-
 docs/source/index.rst                              |  33 +--
 10 files changed, 75 insertions(+), 330 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4c0379773..29a1a0692 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -17,257 +17,4 @@
   under the License.
 -->
 
-# Introduction
-
-We welcome and encourage contributions of all kinds, such as:
-
-1. Tickets with issue reports or feature requests
-2. Documentation improvements
-3. Code (PR or PR Review)
-
-In addition to submitting new PRs, we have a healthy tradition of community members helping review each other's PRs. Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.
-
-You can find a curated
-[good-first-issue](https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
-list to help you get started.
-
-# Developer's guide
-
-This section describes how you can get started developing DataFusion.
-
-### Windows setup
-
-```shell
-wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
-choco install -y git rustup.install visualcpp-build-tools
-git-bash.exe
-cargo build
-```
-
-### Bootstrap environment
-
-DataFusion is written in Rust and uses a standard Rust toolkit:
-
-- `cargo build`
-- `cargo fmt` to format the code
-- `cargo test` to test
-- etc.
-
-Testing setup:
-
-- `rustup update stable` (DataFusion uses the latest stable release of Rust)
-- `git submodule init`
-- `git submodule update`
-
-Formatting instructions:
-
-- [ci/scripts/rust_fmt.sh](ci/scripts/rust_fmt.sh)
-- [ci/scripts/rust_clippy.sh](ci/scripts/rust_clippy.sh)
-- [ci/scripts/rust_toml_fmt.sh](ci/scripts/rust_toml_fmt.sh)
-
-or run them all at once:
-
-- [dev/rust_lint.sh](dev/rust_lint.sh)
-
-## Test Organization
-
-DataFusion has several levels of tests in its [Test
-Pyramid](https://martinfowler.com/articles/practical-test-pyramid.html)
-and tries to follow [Testing Organization](https://doc.rust-lang.org/book/ch11-03-test-organization.html) in The Book.
-
-This section highlights the most important test modules that exist.
-
-### Unit tests
-
-Tests for the code in an individual module are defined in the same source file with a `test` module, following Rust convention.
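-
-As a hedged illustration (the function and module below are hypothetical, not taken from the codebase), such a module typically looks like:
-
-```rust
-fn double(x: i32) -> i32 {
-    x * 2
-}
-
-#[cfg(test)]
-mod test {
-    use super::*;
-
-    // Unit tests live next to the code they cover and may use private items.
-    #[test]
-    fn doubles_positive_numbers() {
-        assert_eq!(double(21), 42);
-    }
-}
-```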
-
-### Rust Integration Tests
-
-There are several tests of the public interface of the DataFusion library in the [tests](https://github.com/apache/arrow-datafusion/tree/master/datafusion/core/tests) directory.
-
-You can run these tests individually using a command such as
-
-```shell
-cargo test -p datafusion --tests sql_integration
-```
-
-One very important test is the [sql_integration](https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sql_integration.rs) test, which validates DataFusion's ability to run a large assortment of SQL queries against a variety of data setups.
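-
-For orientation, a minimal test of the public interface might look like the following sketch (hedged: it assumes the `SessionContext` API and the `assert_batches_eq!` helper; adapt it to whatever you are actually testing):
-
-```rust
-use datafusion::assert_batches_eq;
-use datafusion::error::Result;
-use datafusion::prelude::*;
-
-#[tokio::test]
-async fn select_one() -> Result<()> {
-    // Run a simple query and compare the collected batches
-    // against a pretty-printed expectation.
-    let ctx = SessionContext::new();
-    let results = ctx.sql("SELECT 1 AS one").await?.collect().await?;
-    let expected = vec![
-        "+-----+",
-        "| one |",
-        "+-----+",
-        "| 1   |",
-        "+-----+",
-    ];
-    assert_batches_eq!(expected, &results);
-    Ok(())
-}
-```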
-
-### SQL / Postgres Integration Tests
-
-The [integration-tests](https://github.com/apache/arrow-datafusion/blob/master/datafusion/integration-tests) directory contains a harness that runs certain queries against both Postgres and DataFusion and compares the results.
-
-#### Setup environment
-
-```shell
-export POSTGRES_DB=postgres
-export POSTGRES_USER=postgres
-export POSTGRES_HOST=localhost
-export POSTGRES_PORT=5432
-```
-
-#### Install dependencies
-
-```shell
-# Install dependencies
-python -m pip install --upgrade pip setuptools wheel
-python -m pip install -r integration-tests/requirements.txt
-
-# Run the tests with the environment variables set inline
-# (this requires the test table created below)
-POSTGRES_DB=postgres POSTGRES_USER=postgres POSTGRES_HOST=localhost POSTGRES_PORT=5432 python -m pytest -v integration-tests/test_psql_parity.py
-
-# Create the test table
-psql -d "$POSTGRES_DB" -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" -U "$POSTGRES_USER" -c 'CREATE TABLE IF NOT EXISTS test (
-  c1 character varying NOT NULL,
-  c2 integer NOT NULL,
-  c3 smallint NOT NULL,
-  c4 smallint NOT NULL,
-  c5 integer NOT NULL,
-  c6 bigint NOT NULL,
-  c7 smallint NOT NULL,
-  c8 integer NOT NULL,
-  c9 bigint NOT NULL,
-  c10 character varying NOT NULL,
-  c11 double precision NOT NULL,
-  c12 double precision NOT NULL,
-  c13 character varying NOT NULL
-);'
-
-psql -d "$POSTGRES_DB" -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" -U "$POSTGRES_USER" -c "\copy test FROM '$(pwd)/testing/data/csv/aggregate_test_100.csv' WITH (FORMAT csv, HEADER true);"
-```
-
-#### Invoke the test runner
-
-```shell
-python -m pytest -v integration-tests/test_psql_parity.py
-```
-
-## Benchmarks
-
-### Criterion Benchmarks
-
-[Criterion](https://docs.rs/criterion/latest/criterion/index.html) is a statistics-driven micro-benchmarking framework used by DataFusion to evaluate the performance of specific code paths. In particular, the Criterion benchmarks both guide optimisation efforts and prevent performance regressions within DataFusion.
-
-Criterion integrates with Cargo's built-in [benchmark support](https://doc.rust-lang.org/cargo/commands/cargo-bench.html) and a given benchmark can be run with
-
-```shell
-cargo bench --bench BENCHMARK_NAME
-```
-
-A full list of benchmarks can be found [here](./datafusion/benches).
-
-_[cargo-criterion](https://github.com/bheisler/cargo-criterion) may also be used for more advanced reporting._
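-
-For example (assuming `cargo-criterion` is installed; its filtering flags mirror `cargo bench`):
-
-```shell
-cargo install cargo-criterion
-cargo criterion --bench BENCHMARK_NAME
-```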
-
-#### Parquet SQL Benchmarks
-
-The parquet SQL benchmarks can be run with
-
-```shell
-cargo bench --bench parquet_query_sql
-```
-
-These benchmarks randomly generate a Parquet file and then benchmark queries sourced from [parquet_query_sql.sql](./datafusion/core/benches/parquet_query_sql.sql) against it. This can therefore be a quick way to add coverage of particular query and/or data paths.
-
-If the environment variable `PARQUET_FILE` is set, the benchmark will run queries against this file instead of a randomly generated one. This can be useful for performing multiple runs, potentially with different code, against the same source data, or for testing against a custom dataset.
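-
-For example (the file path here is illustrative):
-
-```shell
-PARQUET_FILE=/path/to/your.parquet cargo bench --bench parquet_query_sql
-```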
-
-The benchmark will automatically remove any generated Parquet file on exit; however, if interrupted (e.g. by CTRL+C), it will not. This can be useful for analysing the particular file after the fact, or for preserving it to use with `PARQUET_FILE` in subsequent runs.
-
-### Upstream Benchmark Suites
-
-Instructions and tooling for running upstream benchmark suites against DataFusion can be found in [benchmarks](./benchmarks).
-
-These are valuable for comparative evaluation against alternative Arrow implementations and query engines.
-
-## How to add a new scalar function
-
-Below is a checklist of what you need to do to add a new scalar function to DataFusion:
-
-- Add the actual implementation of the function (see the sketch after this list):
-  - [here](datafusion/physical-expr/src/string_expressions.rs) for string functions
-  - [here](datafusion/physical-expr/src/math_expressions.rs) for math functions
-  - [here](datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/physical-expr/src) for other functions
-- In [core/src/physical_plan](datafusion/core/src/physical_plan/functions.rs), add:
-  - a new variant to `BuiltinScalarFunction`
-  - a new entry to `FromStr` with the name of the function as called by SQL
-  - a new line in `return_type` with the expected return type of the function, given an incoming type
-  - a new line in `signature` with the signature of the function (number and types of its arguments)
-  - a new line in `create_physical_expr`/`create_physical_fun` mapping the built-in to the implementation
-  - tests for the function.
-- In [core/tests/sql](datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
-- In [expr/src/expr_fn.rs](datafusion/expr/src/expr_fn.rs), add:
-  - a new entry of the `unary_scalar_expr!` macro for the new function.
-- In [core/src/logical_plan/mod](datafusion/core/src/logical_plan/mod.rs), add:
-  - a new entry in the `pub use expr::{}` set.
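-
-As a hedged sketch of the first step only: the implementation is typically a function from input Arrow arrays to an output array. The function below is hypothetical (names and exact signatures in the real modules may differ) and merely illustrates the shape:
-
-```rust
-use std::sync::Arc;
-
-use arrow::array::{ArrayRef, StringArray};
-use datafusion::error::{DataFusionError, Result};
-
-/// Hypothetical string function: uppercase every row, preserving nulls.
-pub fn shout(args: &[ArrayRef]) -> Result<ArrayRef> {
-    let input = args[0]
-        .as_any()
-        .downcast_ref::<StringArray>()
-        .ok_or_else(|| DataFusionError::Internal("expected a StringArray".to_string()))?;
-    let result: StringArray = input
-        .iter()
-        .map(|v| v.map(|s| s.to_uppercase()))
-        .collect();
-    Ok(Arc::new(result))
-}
-```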
-
-## How to add a new aggregate function
-
-Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
-
-- Add the actual implementation of an `Accumulator` and `AggregateExpr` (see the sketch after this list):
-  - [here](datafusion/physical-expr/src/string_expressions.rs) for string functions
-  - [here](datafusion/physical-expr/src/math_expressions.rs) for math functions
-  - [here](datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/physical-expr/src) for other functions
-- In [datafusion/expr/src](datafusion/expr/src/aggregate_function.rs), add:
-  - a new variant to `AggregateFunction`
-  - a new entry to `FromStr` with the name of the function as called by SQL
-  - a new line in `return_type` with the expected return type of the function, given an incoming type
-  - a new line in `signature` with the signature of the function (number and types of its arguments)
-  - a new line in `create_aggregate_expr` mapping the built-in to the implementation
-  - tests for the function.
-- In [tests/sql](datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
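-
-As a hedged sketch of the first step: an `Accumulator` maintains running state, merges partial states, and produces a final value. The example below is hypothetical (the import path and exact trait methods may differ between DataFusion versions) and counts non-null values:
-
-```rust
-use arrow::array::{ArrayRef, UInt64Array};
-use datafusion::error::Result;
-use datafusion::physical_plan::Accumulator;
-use datafusion::scalar::ScalarValue;
-
-/// Hypothetical accumulator that counts non-null input values.
-#[derive(Debug)]
-struct CountNonNull {
-    count: u64,
-}
-
-impl Accumulator for CountNonNull {
-    // Intermediate state shipped between partial aggregations.
-    fn state(&self) -> Result<Vec<ScalarValue>> {
-        Ok(vec![ScalarValue::UInt64(Some(self.count))])
-    }
-
-    // Fold one batch of input values into the running count.
-    fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
-        let array = &values[0];
-        self.count += (array.len() - array.null_count()) as u64;
-        Ok(())
-    }
-
-    // Merge partial counts produced by other accumulators.
-    fn merge_batch(&mut self, states: &[ArrayRef]) -> Result<()> {
-        if let Some(counts) = states[0].as_any().downcast_ref::<UInt64Array>() {
-            self.count += counts.iter().flatten().sum::<u64>();
-        }
-        Ok(())
-    }
-
-    // Produce the final aggregate value.
-    fn evaluate(&self) -> Result<ScalarValue> {
-        Ok(ScalarValue::UInt64(Some(self.count)))
-    }
-}
-```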
-
-## How to display plans graphically
-
-The query plans represented by `LogicalPlan` nodes can be graphically
-rendered using [Graphviz](http://www.graphviz.org/).
-
-To do so, save the output of the `display_graphviz` function to a file:
-
-```rust
-use std::fs::File;
-use std::io::Write;
-
-// Create plan somehow...
-let mut output = File::create("/tmp/plan.dot")?;
-write!(output, "{}", plan.display_graphviz())?;
-```
-
-Then, use the `dot` command line tool to render it into a file that
-can be displayed. For example, the following command creates a
-`/tmp/plan.pdf` file:
-
-```bash
-dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
-```
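-
-Other Graphviz output formats work the same way; for example, to produce an SVG:
-
-```bash
-dot -Tsvg < /tmp/plan.dot > /tmp/plan.svg
-```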
-
-## Specification
-
-We formalize DataFusion semantics and behaviors through specification
-documents. These specifications serve as references to help resolve
-ambiguities during development or code reviews.
-
-You are also welcome to propose changes to existing specifications or create
-new specifications as you see fit.
-
-Here is the list of currently active specifications:
-
-- [Output field name semantic](https://arrow.apache.org/datafusion/specification/output-field-name-semantic.html)
-- [Invariants](https://arrow.apache.org/datafusion/specification/invariants.html)
-
-All specifications are stored in the `docs/source/specification` folder.
-
-## How to format `.md` documents
-
-We are using `prettier` to format `.md` files.
-
-You can either use `npm i -g prettier` to install it globally or use `npx` to run it as a standalone binary. Using `npx` requires a working Node.js environment. Upgrading to the latest prettier is recommended (by adding `--upgrade` to the `npm` command).
-
-```bash
-$ prettier --version
-2.3.0
-```
-
-After you've confirmed your prettier version, you can format all the `.md` files:
-
-```bash
-prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
-```
+See the Contributor Guide: https://arrow.apache.org/datafusion/ or the source under `docs/source/contributor-guide`
diff --git a/README.md b/README.md
index af6174760..2b6820dde 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,7 @@ Please see [example usage](https://arrow.apache.org/datafusion/user-guide/exampl
 
 ## Roadmap
 
-Please see [Roadmap](docs/source/specification/roadmap.md) for information of where the project is headed.
+Please see [Roadmap](docs/source/contributor-guide/roadmap.md) for information on where the project is headed.
 
 ## Architecture Overview
 
@@ -109,10 +109,10 @@ There is no formal document describing DataFusion's architecture yet, but the fo
 - (March 2021): The DataFusion architecture is described in _Query Engine Design and the Rust-Based DataFusion in Apache Arrow_: [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU) (DataFusion content starts [~ 15 minutes in](https://www.youtube.com/watch?v=K6eCAVEk4kU&t=875s)) and [slides](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)
 - (February 2021): How DataFusion is used within the Ballista Project is described in _Ballista: Distributed Compute with Rust and Apache Arrow_: [recording](https://www.youtube.com/watch?v=ZZHQaOap9pQ)
 
-## User's guide
+## User Guide
 
 Please see [User Guide](https://arrow.apache.org/datafusion/) for more information about DataFusion.
 
-## Contribution Guide
+## Contributor Guide
 
-Please see [Contribution Guide](CONTRIBUTING.md) for information about contributing to DataFusion.
+Please see [Contributor Guide](docs/source/contributor-guide/index.md) for information about contributing to DataFusion.
diff --git a/docs/source/developer-guide/community/communication.md b/docs/source/contributor-guide/communication.md
similarity index 80%
rename from docs/source/developer-guide/community/communication.md
rename to docs/source/contributor-guide/communication.md
index d100ad962..6c176610c 100644
--- a/docs/source/developer-guide/community/communication.md
+++ b/docs/source/contributor-guide/communication.md
@@ -69,15 +69,3 @@ The goals of these calls are:
 No decisions are made on the call and anything of substance will be discussed on this mailing list or in GitHub issues / Google Docs.
 
 We will send a summary of all sync ups to the dev@arrow.apache.org mailing list.
-
-## Contributing
-
-Our source code is hosted on
-[GitHub](https://github.com/apache/arrow-datafusion). More information on contributing is in
-the [Contribution Guide](https://github.com/apache/arrow-datafusion/blob/master/CONTRIBUTING.md),
-and we have curated a [good-first-issue](https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
-list to help you get started. You can find DataFusion's major designs in docs/source/specification.
-
-We use GitHub issues for maintaining a queue of development work and as the
-public record. We often use Google Docs, GitHub issues, and pull requests for
-quick and small design discussions. For major design change proposals, we encourage you to write an RFC.
diff --git a/CONTRIBUTING.md b/docs/source/contributor-guide/index.md
similarity index 83%
copy from CONTRIBUTING.md
copy to docs/source/contributor-guide/index.md
index 4c0379773..c3c175496 100644
--- a/CONTRIBUTING.md
+++ b/docs/source/contributor-guide/index.md
@@ -35,7 +35,7 @@ list to help you get started.
 
 This section describes how you can get started developing DataFusion.
 
-### Windows setup
+## Windows setup
 
 ```shell
 wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
@@ -44,7 +44,7 @@ git-bash.exe
 cargo build
 ```
 
-### Bootstrap environment
+## Bootstrap environment
 
 DataFusion is written in Rust and uses a standard Rust toolkit:
 
@@ -61,13 +61,13 @@ Testing setup:
 
 Formatting instructions:
 
-- [ci/scripts/rust_fmt.sh](ci/scripts/rust_fmt.sh)
-- [ci/scripts/rust_clippy.sh](ci/scripts/rust_clippy.sh)
-- [ci/scripts/rust_toml_fmt.sh](ci/scripts/rust_toml_fmt.sh)
+- [ci/scripts/rust_fmt.sh](../../../ci/scripts/rust_fmt.sh)
+- [ci/scripts/rust_clippy.sh](../../../ci/scripts/rust_clippy.sh)
+- [ci/scripts/rust_toml_fmt.sh](../../../ci/scripts/rust_toml_fmt.sh)
 
 or run them all at once:
 
-- [dev/rust_lint.sh](dev/rust_lint.sh)
+- [dev/rust_lint.sh](../../../dev/rust_lint.sh)
 
 ## Test Organization
 
@@ -166,7 +166,7 @@ The parquet SQL benchmarks can be run with
 cargo bench --bench parquet_query_sql
 ```
 
-These benchmarks randomly generate a Parquet file and then benchmark queries sourced from [parquet_query_sql.sql](./datafusion/core/benches/parquet_query_sql.sql) against it. This can therefore be a quick way to add coverage of particular query and/or data paths.
+These benchmarks randomly generate a Parquet file and then benchmark queries sourced from [parquet_query_sql.sql](../../../datafusion/core/benches/parquet_query_sql.sql) against it. This can therefore be a quick way to add coverage of particular query and/or data paths.
 
 If the environment variable `PARQUET_FILE` is set, the benchmark will run queries against this file instead of a randomly generated one. This can be useful for performing multiple runs, potentially with different code, against the same source data, or for testing against a custom dataset.
 
@@ -174,7 +174,7 @@ The benchmark will automatically remove any generated parquet file on exit, howe
 
 ### Upstream Benchmark Suites
 
-Instructions and tooling for running upstream benchmark suites against DataFusion can be found in [benchmarks](./benchmarks).
+Instructions and tooling for running upstream benchmark suites against DataFusion can be found in [benchmarks](../../../benchmarks).
 
 These are valuable for comparative evaluation against alternative Arrow implementations and query engines.
 
@@ -183,10 +183,10 @@ These are valuable for comparative evaluation against alternative Arrow implemen
 Below is a checklist of what you need to do to add a new scalar function to DataFusion:
 
 - Add the actual implementation of the function:
-  - [here](datafusion/physical-expr/src/string_expressions.rs) for string functions
-  - [here](datafusion/physical-expr/src/math_expressions.rs) for math functions
-  - [here](datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/physical-expr/src) for other functions
+  - [here](../../../datafusion/physical-expr/src/string_expressions.rs) for string functions
+  - [here](../../../datafusion/physical-expr/src/math_expressions.rs) for math functions
+  - [here](../../../datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
+  - create a new module [here](../../../datafusion/physical-expr/src) for other functions
 - In [core/src/physical_plan](datafusion/core/src/physical_plan/functions.rs), add:
   - a new variant to `BuiltinScalarFunction`
   - a new entry to `FromStr` with the name of the function as called by SQL
@@ -194,10 +194,10 @@ Below is a checklist of what you need to do to add a new scalar function to Data
   - a new line in `signature` with the signature of the function (number and types of its arguments)
   - a new line in `create_physical_expr`/`create_physical_fun` mapping the built-in to the implementation
   - tests for the function.
-- In [core/tests/sql](datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
-- In [expr/src/expr_fn.rs](datafusion/expr/src/expr_fn.rs), add:
+- In [core/tests/sql](../../../datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
+- In [expr/src/expr_fn.rs](../../../datafusion/expr/src/expr_fn.rs), add:
   - a new entry of the `unary_scalar_expr!` macro for the new function.
-- In [core/src/logical_plan/mod](datafusion/core/src/logical_plan/mod.rs), add:
+- In [core/src/logical_plan/mod](../../../datafusion/core/src/logical_plan/mod.rs), add:
   - a new entry in the `pub use expr::{}` set.
 
 ## How to add a new aggregate function
@@ -205,18 +205,18 @@ Below is a checklist of what you need to do to add a new scalar function to Data
 Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
 
 - Add the actual implementation of an `Accumulator` and `AggregateExpr`:
-  - [here](datafusion/physical-expr/src/string_expressions.rs) for string functions
-  - [here](datafusion/physical-expr/src/math_expressions.rs) for math functions
-  - [here](datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
-  - create a new module [here](datafusion/physical-expr/src) for other functions
-- In [datafusion/expr/src](datafusion/expr/src/aggregate_function.rs), add:
+  - [here](../../../datafusion/physical-expr/src/string_expressions.rs) for string functions
+  - [here](../../../datafusion/physical-expr/src/math_expressions.rs) for math functions
+  - [here](../../../datafusion/physical-expr/src/datetime_expressions.rs) for datetime functions
+  - create a new module [here](../../../datafusion/physical-expr/src) for other functions
+- In [datafusion/expr/src](../../../datafusion/expr/src/aggregate_function.rs), add:
   - a new variant to `AggregateFunction`
   - a new entry to `FromStr` with the name of the function as called by SQL
   - a new line in `return_type` with the expected return type of the function, given an incoming type
   - a new line in `signature` with the signature of the function (number and types of its arguments)
   - a new line in `create_aggregate_expr` mapping the built-in to the implementation
   - tests for the function.
-- In [tests/sql](datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
+- In [tests/sql](../../../datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
 
 ## How to display plans graphically
 
diff --git a/docs/source/specification/quarterly_roadmap.md b/docs/source/contributor-guide/quarterly_roadmap.md
similarity index 99%
rename from docs/source/specification/quarterly_roadmap.md
rename to docs/source/contributor-guide/quarterly_roadmap.md
index 94c7dd9e2..c593e859d 100644
--- a/docs/source/specification/quarterly_roadmap.md
+++ b/docs/source/contributor-guide/quarterly_roadmap.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# Roadmap
+# Quarterly Roadmap
 
 A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the project's contributors. This roadmap is not binding.
 
diff --git a/docs/source/specification/roadmap.md b/docs/source/contributor-guide/roadmap.md
similarity index 94%
rename from docs/source/specification/roadmap.md
rename to docs/source/contributor-guide/roadmap.md
index e1d8ae9c0..99b408a6a 100644
--- a/docs/source/specification/roadmap.md
+++ b/docs/source/contributor-guide/roadmap.md
@@ -34,7 +34,7 @@ suggest you start a conversation using a github issue or the
 dev@arrow.apache.org mailing list to make review efficient and avoid
 surprises.
 
-# DataFusion
+## DataFusion
 
 DataFusion's goal is to become the embedded query engine of choice
 for new analytic applications, by leveraging the unique features of
@@ -47,7 +47,7 @@ to provide:
 4. A Procedural API for programmatically creating and running execution plans
 5. High performance, data-race-free, ergonomic extensibility points at every layer
 
-## Additional SQL Language Features
+### Additional SQL Language Features
 
 - Decimal Support [#122](https://github.com/apache/arrow-datafusion/issues/122)
 - Complete support list on [status](https://github.com/apache/arrow-datafusion/blob/master/README.md#status)
@@ -56,32 +56,32 @@ to provide:
 - Support for nested structures (fields, lists, structs) [#119](https://github.com/apache/arrow-datafusion/issues/119)
 - Run all queries from the TPCH benchmark (see [milestone](https://github.com/apache/arrow-datafusion/milestone/2) for more details)
 
-## Query Optimizer
+### Query Optimizer
 
 - More sophisticated cost-based optimizer for join ordering
 - Implement advanced query optimization framework (Tokomak) #440
 - Finer optimizations for group by and aggregate functions
 
-## Datasources
+### Datasources
 
 - Better support for reading data from remote filesystems (e.g. S3) without caching it locally [#907](https://github.com/apache/arrow-datafusion/issues/907) [#1060](https://github.com/apache/arrow-datafusion/issues/1060)
 - Improve performance of file format datasources (parallelize file listings, async Arrow readers, file chunk prefetching capability...)
 
-## Runtime / Infrastructure
+### Runtime / Infrastructure
 
 - Migrate to some sort of arrow2 based implementation (see [milestone](https://github.com/apache/arrow-datafusion/milestone/3) for more details)
 - Add DataFusion to h2oai/db-benchmark [147](https://github.com/apache/arrow-datafusion/issues/147)
 - Improve build time [348](https://github.com/apache/arrow-datafusion/issues/348)
 
-## Resource Management
+### Resource Management
 
 - Finer-grained control and limits on runtime memory [#587](https://github.com/apache/arrow-datafusion/issues/587) and CPU usage [#54](https://github.com/apache/arrow-datafusion/issues/64)
 
-## Python Interface
+### Python Interface
 
 TBD
 
-## DataFusion CLI (`datafusion-cli`)
+### DataFusion CLI (`datafusion-cli`)
 
 Note: There are some additional thoughts on a datafusion-cli vision on [#1096](https://github.com/apache/arrow-datafusion/issues/1096#issuecomment-939418770).
 
@@ -91,7 +91,7 @@ Note: There are some additional thoughts on a datafusion-cli vision on [#1096](h
 - publishing to apt, brew, and possibly the NuGet registry so that people can use it more easily
 - adopt a shorter name, like dfcli?
 
-# Ballista
+## Ballista
 
 Ballista is a distributed compute platform based on Apache Arrow and DataFusion. It provides a query scheduler that
 breaks a physical plan into stages and tasks and then schedules tasks for execution across the available executors
@@ -101,16 +101,16 @@ Having Ballista as part of the DataFusion codebase helps ensure that DataFusion
 compute. For example, it helps ensure that physical query plans can be serialized to protobuf format and that they
 remain language-agnostic so that executors can be built in languages other than Rust.
 
-## Ballista Roadmap
+### Ballista Roadmap
 
-## Move query scheduler into DataFusion
+### Move query scheduler into DataFusion
 
 The Ballista scheduler has some advantages over DataFusion query execution because it doesn't try to eagerly execute
 the entire query at once but breaks it down into a directed acyclic graph (DAG) of stages and executes a
 configurable number of stages and tasks concurrently. It should be possible to push some of this logic down to
 DataFusion so that the same scheduler can be used to scale across cores in-process and across nodes in a cluster.
 
-## Implement execution-time cost-based optimizations based on statistics
+### Implement execution-time cost-based optimizations based on statistics
 
 After the execution of a query stage, accurate statistics are available for the resulting data. These statistics
 could be leveraged by the scheduler to optimize the query during execution. For example, when performing a hash join
diff --git a/docs/source/contributor-guide/specification/index.rst b/docs/source/contributor-guide/specification/index.rst
new file mode 100644
index 000000000..bcd5a895c
--- /dev/null
+++ b/docs/source/contributor-guide/specification/index.rst
@@ -0,0 +1,25 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Specifications
+==============
+
+.. toctree::
+   :maxdepth: 1
+
+   invariants
+   output-field-name-semantic
diff --git a/docs/source/specification/invariants.md b/docs/source/contributor-guide/specification/invariants.md
similarity index 99%
rename from docs/source/specification/invariants.md
rename to docs/source/contributor-guide/specification/invariants.md
index 430976306..c8de4e1d4 100644
--- a/docs/source/specification/invariants.md
+++ b/docs/source/contributor-guide/specification/invariants.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# DataFusion's Invariants
+# Invariants
 
 This document enumerates invariants of DataFusion's logical and physical planes
 (functions, and nodes). Some of these invariants are currently not enforced.
diff --git a/docs/source/specification/output-field-name-semantic.md b/docs/source/contributor-guide/specification/output-field-name-semantic.md
similarity index 99%
rename from docs/source/specification/output-field-name-semantic.md
rename to docs/source/contributor-guide/specification/output-field-name-semantic.md
index c86657344..fe378a52c 100644
--- a/docs/source/specification/output-field-name-semantic.md
+++ b/docs/source/contributor-guide/specification/output-field-name-semantic.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# DataFusion output field name semantic
+# Output field name semantics
 
 This specification documents how field names in output record batches should be
 generated based on given user queries. The field name rules apply to
diff --git a/docs/source/index.rst b/docs/source/index.rst
index e0b432985..0d6d33ef7 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -46,31 +46,16 @@ Table of Contents
    user-guide/configs
    user-guide/faq
 
-.. _toc.specs:
+.. _toc.contributor-guide:
 
 .. toctree::
-   :maxdepth: 1
-   :caption: Specification
-
-   specification/roadmap
-   specification/invariants
-   specification/output-field-name-semantic
-   specification/quarterly_roadmap
-
-.. _toc.readme:
-
-.. toctree::
-   :maxdepth: 1
-   :caption: README
-
-   DataFusion <https://github.com/apache/arrow-datafusion/blob/master/README.md>
-
-.. _toc.community:
-
-.. toctree::
-   :maxdepth: 1
-   :caption: Community
-
-   community/communication
+   :maxdepth: 2
+   :caption: Contributor Guide
+
+   contributor-guide/index
+   contributor-guide/communication
+   contributor-guide/roadmap
+   contributor-guide/quarterly_roadmap
+   contributor-guide/specification/index
    Issue tracker <https://github.com/apache/arrow-datafusion/issues>
    Code of conduct <https://github.com/apache/arrow-datafusion/blob/master/CODE_OF_CONDUCT.md>