Posted to commits@arrow.apache.org by al...@apache.org on 2023/04/04 14:35:46 UTC

[arrow-datafusion] branch main updated: Move content from README.md to docs site (#5824)

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new b87871fdd1 Move content from README.md to docs site (#5824)
b87871fdd1 is described below

commit b87871fdd1f4ce64201eb1f7c79a0547627f37e9
Author: Andrew Lamb <an...@nerdnetworks.org>
AuthorDate: Tue Apr 4 16:35:40 2023 +0200

    Move content from README.md to docs site (#5824)
    
    * Move content from README.md to docs site
    
    * RAT
---
 README.md                                     | 172 +-------------------------
 docs/source/contributor-guide/architecture.md |  26 ++++
 docs/source/contributor-guide/index.md        |  46 +++----
 docs/source/index.rst                         |  18 ++-
 docs/source/user-guide/comparison.md          |  52 ++++++++
 docs/source/user-guide/integration.md         |  35 ++++++
 docs/source/user-guide/introduction.md        |   2 +-
 docs/source/user-guide/users.md               |  67 ++++++++++
 8 files changed, 224 insertions(+), 194 deletions(-)
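
As a quick sanity check on the summary above, the per-file counts should add up to the combined number of inserted and deleted lines. The sketch below is not part of the commit. It is a minimal Python illustration, and the helper names `parse_diffstat` and `totals_match` are hypothetical. The counts are copied from the diffstat, with the +/- histogram bars omitted.

```python
import re

# Per-file counts copied from the diffstat above (histogram bars omitted).
DIFFSTAT = """\
 README.md                                     | 172
 docs/source/contributor-guide/architecture.md |  26
 docs/source/contributor-guide/index.md        |  46
 docs/source/index.rst                         |  18
 docs/source/user-guide/comparison.md          |  52
 docs/source/user-guide/integration.md         |  35
 docs/source/user-guide/introduction.md        |   2
 docs/source/user-guide/users.md               |  67
 8 files changed, 224 insertions(+), 194 deletions(-)
"""

# Matches "path | N" lines; the summary line has no "|" and is skipped.
FILE_LINE = re.compile(r"^\s*(\S+)\s*\|\s*(\d+)", re.MULTILINE)
# Matches the trailing "N files changed, ..." summary line.
SUMMARY = re.compile(
    r"(\d+) files? changed, (\d+) insertions?\(\+\), (\d+) deletions?\(-\)"
)


def parse_diffstat(text):
    """Return {path: changed_line_count} for each 'path | N' line."""
    return {path: int(count) for path, count in FILE_LINE.findall(text)}


def totals_match(text):
    """True if the file count and line totals agree with the summary line."""
    per_file = parse_diffstat(text)
    files, insertions, deletions = map(int, SUMMARY.search(text).groups())
    return (
        len(per_file) == files
        and sum(per_file.values()) == insertions + deletions
    )


print(totals_match(DIFFSTAT))
```

Each per-file number counts insertions and deletions together, which is why the check compares the sum against `insertions + deletions` rather than against either figure alone.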

diff --git a/README.md b/README.md
index 953f08bd45..c9ca835695 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,8 @@
 
 # DataFusion
 
+[![Coverage Status](https://codecov.io/gh/apache/arrow-datafusion/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow-datafusion?branch=master)
+
 <img src="docs/source/_static/images/DataFusion-Logo-Background-White.svg" width="256" alt="logo"/>
 
 DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in
@@ -27,176 +29,8 @@ in-memory format.
 
 DataFusion offers SQL and Dataframe APIs, excellent [performance](https://benchmark.clickhouse.com/), built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
 
-[![Coverage Status](https://codecov.io/gh/apache/arrow-datafusion/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow-datafusion?branch=master)
-
-## Features
-
-- Feature-rich [SQL support](https://arrow.apache.org/datafusion/user-guide/sql/index.html) and [DataFrame API](https://arrow.apache.org/datafusion/user-guide/dataframe.html)
-- Blazingly fast, vectorized, multi-threaded, streaming execution engine.
-- Native support for Parquet, CSV, JSON, and Avro file formats. Support
-  for custom file formats and non file datasources via the `TableProvider` trait.
-- Many extension points: user defined scalar/aggregate/window functions, DataSources, SQL,
-  other query languages, custom plan and execution nodes, optimizer passes, and more.
-- Streaming, asynchronous IO directly from popular object stores, including AWS S3,
-  Azure Blob Storage, and Google Cloud Storage. Other storage systems are supported via the
-  `ObjectStore` trait.
-- [Excellent Documentation](https://docs.rs/datafusion/latest) and a
-  [welcoming community](https://arrow.apache.org/datafusion/contributor-guide/communication.html).
-- A state of the art query optimizer with projection and filter pushdown, sort aware optimizations,
-  automatic join reordering, expression coercion, and more.
-- Permissive Apache 2.0 License, Apache Software Foundation governance
-- Written in [Rust](https://www.rust-lang.org/), a modern system language with development
-  productivity similar to Java or Golang, the performance of C++, and
-  [loved by programmers everywhere](https://insights.stackoverflow.com/survey/2021#technology-most-loved-dreaded-and-wanted).
-- Support for [Substrait](https://substrait.io/) for query plan serialization, making it easier to integrate DataFusion
-  with other projects, and to pass plans across language boundaries.
-
-## Use Cases
-
-DataFusion can be used without modification as an embedded SQL
-engine or can be customized and used as a foundation for
-building new systems. Here are some examples of systems built using DataFusion:
-
-- Specialized Analytical Database systems such as [CeresDB] and more general Apache Spark like system such a [Ballista].
-- New query language engines such as [prql-query] and accelerators such as [VegaFusion]
-- Research platform for new Database Systems, such as [Flock]
-- SQL support to another library, such as [dask sql]
-- Streaming data platforms such as [Synnada]
-- Tools for reading / sorting / transcoding Parquet, CSV, AVRO, and JSON files such as [qv]
-- A faster Spark runtime replacement [Blaze]
-
-By using DataFusion, the projects are freed to focus on their specific
-features, and avoid reimplementing general (but still necessary)
-features such as an expression representation, standard optimizations,
-execution plans, file format support, etc.
-
-## Why DataFusion?
-
-- _High Performance_: Leveraging Rust and Arrow's memory model, DataFusion is very fast.
-- _Easy to Connect_: Being part of the Apache Arrow ecosystem (Arrow, Parquet and Flight), DataFusion works well with the rest of the big data ecosystem
-- _Easy to Embed_: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific use case
-- _High Quality_: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be used as the foundation for production systems.
-
-## Comparisons with other projects
-
-When compared to similar systems, DataFusion typically is:
-
-1. Targeted at developers, rather than end users / data scientists.
-2. Designed to be embedded, rather than a complete file based SQL system.
-3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
-4. Implemented in `Rust`, rather than `C/C++`
-
-Here is a comparison with similar projects that may help understand
-when DataFusion might be be suitable and unsuitable for your needs:
-
-- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
-  Like DataFusion, it supports very fast execution, both from its custom file format
-  and directly from parquet files. Unlike DataFusion, it is written in C/C++ and it
-  is primarily used directly by users as a serverless database and query system rather
-  than as a library for building such database systems.
-
-- [Polars](http://pola.rs): Polars is one of the fastest DataFrame
-  libraries at the time of writing. Like DataFusion, it is also
-  written in Rust and uses the Apache Arrow memory model, but unlike
-  DataFusion it does not provide SQL nor as many extension points.
-
-- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)
-  is an execution engine. Like DataFusion, Velox aims to
-  provide a reusable foundation for building database-like systems. Unlike DataFusion,
-  it is written in C/C++ and does not include a SQL frontend or planning /optimization
-  framework.
-
-- [Databend](https://github.com/datafuselabs/databend) is a complete
-  database system. Like DataFusion it is also written in Rust and
-  utilizes the Apache Arrow memory model, but unlike DataFusion it
-  targets end-users rather than developers of other database systems.
-
-## DataFusion Community Extensions
-
-There are a number of community projects that extend DataFusion or
-provide integrations with other systems.
-
-### Language Bindings
-
-- [datafusion-c](https://github.com/datafusion-contrib/datafusion-c)
-- [datafusion-python](https://github.com/apache/arrow-datafusion-python)
-- [datafusion-ruby](https://github.com/datafusion-contrib/datafusion-ruby)
-- [datafusion-java](https://github.com/datafusion-contrib/datafusion-java)
-
-### Integrations
-
-- [datafusion-bigtable](https://github.com/datafusion-contrib/datafusion-bigtable)
-- [datafusion-catalogprovider-glue](https://github.com/datafusion-contrib/datafusion-catalogprovider-glue)
-
-## Known Uses
-
-Here are some of the projects known to use DataFusion:
-
-- [Ballista](https://github.com/apache/arrow-ballista) Distributed SQL Query Engine
-- [Blaze](https://github.com/blaze-init/blaze) Spark accelerator with DataFusion at its core
-- [CeresDB](https://github.com/CeresDB/ceresdb) Distributed Time-Series Database
-- [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
-- [CnosDB](https://github.com/cnosdb/cnosdb) Open Source Distributed Time Series Database
-- [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
-- [Dask SQL](https://github.com/dask-contrib/dask-sql) Distributed SQL query engine in Python
-- [datafusion-tui](https://github.com/datafusion-contrib/datafusion-tui) Text UI for DataFusion
-- [delta-rs](https://github.com/delta-io/delta-rs) Native Rust implementation of Delta Lake
-- [Flock](https://github.com/flock-lab/flock)
-- [GreptimeDB](https://github.com/GreptimeTeam/greptimedb) Open Source & Cloud Native Distributed Time Series Database
-- [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
-- [Kamu](https://github.com/kamu-data/kamu-cli/) Planet-scale streaming data pipeline
-- [Parseable](https://github.com/parseablehq/parseable) Log storage and observability platform
-- [qv](https://github.com/timvw/qv) Quickly view your data
-- [ROAPI](https://github.com/roapi/roapi)
-- [Seafowl](https://github.com/splitgraph/seafowl) CDN-friendly analytical database
-- [Synnada](https://synnada.ai/) Streaming-first framework for data products
-- [Tensorbase](https://github.com/tensorbase/tensorbase)
-- [VegaFusion](https://vegafusion.io/) Server-side acceleration for the [Vega](https://vega.github.io/) visualization grammar
-- [ZincObserve](https://github.com/zinclabs/zincobserve) Distributed cloud native observability platform
-
-[ballista]: https://github.com/apache/arrow-ballista
-[blaze]: https://github.com/blaze-init/blaze
-[ceresdb]: https://github.com/CeresDB/ceresdb
-[cloudfuse buzz]: https://github.com/cloudfuse-io/buzz-rust
-[cnosdb]: https://github.com/cnosdb/cnosdb
-[cube store]: https://github.com/cube-js/cube.js/tree/master/rust
-[dask sql]: https://github.com/dask-contrib/dask-sql
-[datafusion-tui]: https://github.com/datafusion-contrib/datafusion-tui
-[delta-rs]: https://github.com/delta-io/delta-rs
-[flock]: https://github.com/flock-lab/flock
-[kamu]: https://github.com/kamu-data/kamu-cli
-[greptime db]: https://github.com/GreptimeTeam/greptimedb
-[influxdb iox]: https://github.com/influxdata/influxdb_iox
-[parseable]: https://github.com/parseablehq/parseable
-[prql-query]: https://github.com/prql/prql-query
-[qv]: https://github.com/timvw/qv
-[roapi]: https://github.com/roapi/roapi
-[seafowl]: https://github.com/splitgraph/seafowl
-[synnada]: https://synnada.ai/
-[tensorbase]: https://github.com/tensorbase/tensorbase
-[vegafusion]: https://vegafusion.io/
-[zincobserve]: https://github.com/zinclabs/zincobserve "if you know of another project, please submit a PR to add a link!"
+See the Project Website at https://arrow.apache.org/datafusion/ for more details.
 
 ## Examples
 
 Please see the [example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html) in the user guide and the [datafusion-examples](https://github.com/apache/arrow-datafusion/tree/master/datafusion-examples) crate for more information on how to use DataFusion.
-
-## Roadmap
-
-Please see [Roadmap](docs/source/contributor-guide/roadmap.md) for information of where the project is headed.
-
-## Architecture Overview
-
-There is no formal document describing DataFusion's architecture yet, but the following presentations offer a good overview of its different components and how they interact together.
-
-- (July 2022): DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine: [recording](https://www.youtube.com/watch?v=Rii1VTn3seQ) and [slides](https://docs.google.com/presentation/d/1q1bPibvu64k2b7LPi7Yyb0k3gA1BiUYiUbEklqW1Ckc/view#slide=id.g11054eeab4c_0_1165)
-- (March 2021): The DataFusion architecture is described in _Query Engine Design and the Rust-Based DataFusion in Apache Arrow_: [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU) (DataFusion content starts [~ 15 minutes in](https://www.youtube.com/watch?v=K6eCAVEk4kU&t=875s)) and [slides](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)
-- (February 2021): How DataFusion is used within the Ballista Project is described in \*Ballista: Distributed Compute with Rust and Apache Arrow: [recording](https://www.youtube.com/watch?v=ZZHQaOap9pQ)
-
-## User Guide
-
-Please see [User Guide](https://arrow.apache.org/datafusion/) for more information about DataFusion.
-
-## Contributor Guide
-
-Please see [Contributor Guide](docs/source/contributor-guide/index.md) for information about contributing to DataFusion.
diff --git a/docs/source/contributor-guide/architecture.md b/docs/source/contributor-guide/architecture.md
new file mode 100644
index 0000000000..3150060ff3
--- /dev/null
+++ b/docs/source/contributor-guide/architecture.md
@@ -0,0 +1,26 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Architecture
+
+There is no formal document describing DataFusion's architecture yet, but the following presentations offer a good overview of its different components and how they interact together.
+
+- (July 2022): DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine: [recording](https://www.youtube.com/watch?v=Rii1VTn3seQ) and [slides](https://docs.google.com/presentation/d/1q1bPibvu64k2b7LPi7Yyb0k3gA1BiUYiUbEklqW1Ckc/view#slide=id.g11054eeab4c_0_1165)
+- (March 2021): The DataFusion architecture is described in _Query Engine Design and the Rust-Based DataFusion in Apache Arrow_: [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU) (DataFusion content starts [~ 15 minutes in](https://www.youtube.com/watch?v=K6eCAVEk4kU&t=875s)) and [slides](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)
+- (February 2021): How DataFusion is used within the Ballista Project is described in _Ballista: Distributed Compute with Rust and Apache Arrow_: [recording](https://www.youtube.com/watch?v=ZZHQaOap9pQ)
diff --git a/docs/source/contributor-guide/index.md b/docs/source/contributor-guide/index.md
index d7172329c2..df1709979b 100644
--- a/docs/source/contributor-guide/index.md
+++ b/docs/source/contributor-guide/index.md
@@ -31,7 +31,9 @@ You can find a curated
 [good-first-issue](https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
 list to help you get started.
 
-# Pull Requests
+# Developer's guide
+
+## Pull Requests
 
 We welcome pull requests (PRs) from anyone from the community.
 
@@ -39,8 +41,6 @@ DataFusion is a very active fast-moving project and we try to review and merge P
 
 Review bandwidth is currently our most limited resource, and we highly encourage reviews by the broader community. If you are waiting for your PR to be reviewed, consider helping review other PRs that are waiting. Such review both helps the reviewer to learn the codebase and become more expert, as well as helps identify issues in the PR (such as lack of test coverage), that can be addressed and make future reviews faster and more efficient.
 
-## Merging PRs
-
 Since we are a worldwide community, we have contributors in many timezones who review and comment. To ensure anyone who wishes has an opportunity to review a PR, our committers try to ensure that at least 24 hours passes between when a "major" PR is approved and when it is merged.
 
 A "major" PR means there is a substantial change in design or a change in the API. Committers apply their best judgment to determine what constitutes a substantial change. A "minor" PR might be merged without a 24 hour delay, again subject to the judgment of the committer. Examples of potential "minor" PRs are:
@@ -50,11 +50,11 @@ A "major" PR means there is a substantial change in design or a change in the AP
 3. Non-controversial build-related changes (clippy, version upgrades etc.)
 4. Smaller non-controversial feature additions
 
-# Developer's guide
+## Getting Started
 
 This section describes how you can get started at developing DataFusion.
 
-## Windows setup
+### Windows setup
 
 ```shell
 wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
@@ -63,7 +63,7 @@ git-bash.exe
 cargo build
 ```
 
-## Protoc Installation
+### Protoc Installation
 
 Compiling DataFusion from sources requires an installed version of the protobuf compiler, `protoc`.
 
@@ -85,7 +85,7 @@ libprotoc 3.12.4
 
 Alternatively a binary release can be downloaded from the [Release Page](https://github.com/protocolbuffers/protobuf/releases) or [built from source](https://github.com/protocolbuffers/protobuf/blob/main/src/README.md).
 
-## Bootstrap environment
+### Bootstrap environment
 
 DataFusion is written in Rust and it uses a standard rust toolkit:
 
@@ -110,7 +110,7 @@ or run them all at once:
 
 - [dev/rust_lint.sh](../../../dev/rust_lint.sh)
 
-## Test Organization
+### Test Organization
 
 DataFusion has several levels of tests in its [Test
 Pyramid](https://martinfowler.com/articles/practical-test-pyramid.html)
@@ -118,13 +118,13 @@ and tries to follow [Testing Organization](https://doc.rust-lang.org/book/ch11-0
 
 This section highlights the most important test modules that exist
 
-### Unit tests
+#### Unit tests
 
 Tests for the code in an individual module are defined in the same source file with a `test` module, following Rust convention
 
-### Rust Integration Tests
+#### Rust Integration Tests
 
-There are several tests of the public interface of the DataFusion library in the [tests](../../../datafusion/core/tests) directory.
+There are several tests of the public interface of the DataFusion library in the [tests](https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests) directory.
 
 You can run these tests individually using a command such as
 
@@ -132,18 +132,18 @@ You can run these tests individually using a command such as
 cargo test -p datafusion --tests sql_integration
 ```
 
-One very important test is the [sql_integration](../../../datafusion/core/tests/sql_integration.rs) test which validates DataFusion's ability to run a large assortment of SQL queries against an assortment of data setups.
+One very important test is the [sql_integration](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sql_integration.rs) test which validates DataFusion's ability to run a large assortment of SQL queries against an assortment of data setups.
 
-### sqllogictests Tests
+#### sqllogictests Tests
 
-The [sqllogictests](../../../datafusion/core/tests/sqllogictests) also validate DataFusion SQL against an assortment of data setups.
+The [sqllogictests](https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests) also validate DataFusion SQL against an assortment of data setups.
 
 Data Driven tests have many benefits including being easier to write and maintain. We are in the process of [migrating sql_integration tests](https://github.com/apache/arrow-datafusion/issues/4460) and encourage
 you to add new tests using sqllogictests if possible.
 
-## Benchmarks
+### Benchmarks
 
-### Criterion Benchmarks
+#### Criterion Benchmarks
 
 [Criterion](https://docs.rs/criterion/latest/criterion/index.html) is a statistics-driven micro-benchmarking framework used by DataFusion for evaluating the performance of specific code-paths. In particular, the criterion benchmarks help to both guide optimisation efforts, and prevent performance regressions within DataFusion.
 
@@ -153,7 +153,7 @@ Criterion integrates with Cargo's built-in [benchmark support](https://doc.rust-
 cargo bench --bench BENCHMARK_NAME
 ```
 
-A full list of benchmarks can be found [here](../../../datafusion/core/benches).
+A full list of benchmarks can be found [here](https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/benches).
 
 _[cargo-criterion](https://github.com/bheisler/cargo-criterion) may also be used for more advanced reporting._
 
@@ -171,13 +171,15 @@ If the environment variable `PARQUET_FILE` is set, the benchmark will run querie
 
 The benchmark will automatically remove any generated parquet file on exit, however, if interrupted (e.g. by CTRL+C) it will not. This can be useful for analysing the particular file after the fact, or preserving it to use with `PARQUET_FILE` in subsequent runs.
 
-### Upstream Benchmark Suites
+#### Upstream Benchmark Suites
 
-Instructions and tooling for running upstream benchmark suites against DataFusion can be found in [benchmarks](../../../benchmarks).
+Instructions and tooling for running upstream benchmark suites against DataFusion can be found in [benchmarks](https://github.com/apache/arrow-datafusion/tree/main/benchmarks).
 
 These are valuable for comparative evaluation against alternative Arrow implementations and query engines.
 
-## How to add a new scalar function
+## HOWTOs
+
+### How to add a new scalar function
 
 Below is a checklist of what you need to do to add a new scalar function to DataFusion:
 
@@ -197,7 +199,7 @@ Below is a checklist of what you need to do to add a new scalar function to Data
 - In [expr/src/expr_fn.rs](../../../datafusion/expr/src/expr_fn.rs), add:
   - a new entry of the `unary_scalar_expr!` macro for the new function.
 
-## How to add a new aggregate function
+### How to add a new aggregate function
 
 Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
 
@@ -215,7 +217,7 @@ Below is a checklist of what you need to do to add a new aggregate function to D
   - tests to the function.
 - In [tests/sql](../../../datafusion/core/tests/sql), add a new test where the function is called through SQL against well known data and returns the expected result.
 
-## How to display plans graphically
+### How to display plans graphically
 
 The query plans represented by `LogicalPlan` nodes can be graphically
 rendered using [Graphviz](https://www.graphviz.org/).
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 57290d5a26..09071a7511 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -15,12 +15,21 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
+.. image:: _static/images/DataFusion-Logo-Background-White.png
+  :alt: DataFusion Logo
+
 =======================
 Apache Arrow DataFusion
 =======================
 
-Table of Contents
-=================
+DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in
+`Rust <http://rustlang.org>`_, using the `Apache Arrow <https://arrow.apache.org>`_
+in-memory format.
+
+DataFusion offers SQL and Dataframe APIs, excellent
+`performance <https://benchmark.clickhouse.com>`_, built-in support for
+CSV, Parquet, JSON, and Avro, extensive customization, and a great
+community.
 
 .. _toc.guide:
 
@@ -30,6 +39,9 @@ Table of Contents
 
    user-guide/introduction
    user-guide/example-usage
+   user-guide/users
+   user-guide/comparison
+   user-guide/integration
    user-guide/library
    user-guide/cli
    user-guide/dataframe
@@ -47,8 +59,10 @@ Table of Contents
 
    contributor-guide/index
    contributor-guide/communication
+   contributor-guide/architecture
    contributor-guide/roadmap
    contributor-guide/quarterly_roadmap
    contributor-guide/specification/index
+   Github <https://github.com/apache/arrow-datafusion>
    Issue tracker <https://github.com/apache/arrow-datafusion/issues>
    Code of conduct <https://github.com/apache/arrow-datafusion/blob/main/CODE_OF_CONDUCT.md>
diff --git a/docs/source/user-guide/comparison.md b/docs/source/user-guide/comparison.md
new file mode 100644
index 0000000000..2cb13f326a
--- /dev/null
+++ b/docs/source/user-guide/comparison.md
@@ -0,0 +1,52 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Comparisons to Other Projects
+
+When compared to similar systems, DataFusion typically is:
+
+1. Targeted at developers, rather than end users / data scientists.
+2. Designed to be embedded, rather than a complete file based SQL system.
+3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
+4. Implemented in `Rust`, rather than `C/C++`
+
+Here is a comparison with similar projects that may help understand
+when DataFusion might be suitable or unsuitable for your needs:
+
+- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
+  Like DataFusion, it supports very fast execution, both from its custom file format
+  and directly from parquet files. Unlike DataFusion, it is written in C/C++ and it
+  is primarily used directly by users as a serverless database and query system rather
+  than as a library for building such database systems.
+
+- [Polars](http://pola.rs): Polars is one of the fastest DataFrame
+  libraries at the time of writing. Like DataFusion, it is also
+  written in Rust and uses the Apache Arrow memory model, but unlike
+  DataFusion it does not provide SQL nor as many extension points.
+
+- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)
+  is an execution engine. Like DataFusion, Velox aims to
+  provide a reusable foundation for building database-like systems. Unlike DataFusion,
+  it is written in C/C++ and does not include a SQL frontend or planning /optimization
+  framework.
+
+- [Databend](https://github.com/datafuselabs/databend) is a complete
+  database system. Like DataFusion it is also written in Rust and
+  utilizes the Apache Arrow memory model, but unlike DataFusion it
+  targets end-users rather than developers of other database systems.
diff --git a/docs/source/user-guide/integration.md b/docs/source/user-guide/integration.md
new file mode 100644
index 0000000000..bffa6b1893
--- /dev/null
+++ b/docs/source/user-guide/integration.md
@@ -0,0 +1,35 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Integrations and Extensions
+
+There are a number of community projects that extend DataFusion or
+provide integrations with other systems.
+
+## Language Bindings
+
+- [datafusion-c](https://github.com/datafusion-contrib/datafusion-c)
+- [datafusion-python](https://github.com/apache/arrow-datafusion-python)
+- [datafusion-ruby](https://github.com/datafusion-contrib/datafusion-ruby)
+- [datafusion-java](https://github.com/datafusion-contrib/datafusion-java)
+
+## Integrations
+
+- [datafusion-bigtable](https://github.com/datafusion-contrib/datafusion-bigtable)
+- [datafusion-catalogprovider-glue](https://github.com/datafusion-contrib/datafusion-catalogprovider-glue)
diff --git a/docs/source/user-guide/introduction.md b/docs/source/user-guide/introduction.md
index 55fc59b320..f906eac78c 100644
--- a/docs/source/user-guide/introduction.md
+++ b/docs/source/user-guide/introduction.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-# Introduction
+# Features, and Usecases
 
 DataFusion is a very fast, extensible query engine for building
 high-quality data-centric systems in [Rust](http://rustlang.org),
diff --git a/docs/source/user-guide/users.md b/docs/source/user-guide/users.md
new file mode 100644
index 0000000000..0d259c8de3
--- /dev/null
+++ b/docs/source/user-guide/users.md
@@ -0,0 +1,67 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Known Users
+
+Here are some of the projects known to use DataFusion:
+
+- [Ballista](https://github.com/apache/arrow-ballista) Distributed SQL Query Engine
+- [Blaze](https://github.com/blaze-init/blaze) Spark accelerator with DataFusion at its core
+- [CeresDB](https://github.com/CeresDB/ceresdb) Distributed Time-Series Database
+- [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
+- [CnosDB](https://github.com/cnosdb/cnosdb) Open Source Distributed Time Series Database
+- [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
+- [Dask SQL](https://github.com/dask-contrib/dask-sql) Distributed SQL query engine in Python
+- [datafusion-tui](https://github.com/datafusion-contrib/datafusion-tui) Text UI for DataFusion
+- [delta-rs](https://github.com/delta-io/delta-rs) Native Rust implementation of Delta Lake
+- [Flock](https://github.com/flock-lab/flock)
+- [GreptimeDB](https://github.com/GreptimeTeam/greptimedb) Open Source & Cloud Native Distributed Time Series Database
+- [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
+- [Kamu](https://github.com/kamu-data/kamu-cli/) Planet-scale streaming data pipeline
+- [Parseable](https://github.com/parseablehq/parseable) Log storage and observability platform
+- [qv](https://github.com/timvw/qv) Quickly view your data
+- [ROAPI](https://github.com/roapi/roapi)
+- [Seafowl](https://github.com/splitgraph/seafowl) CDN-friendly analytical database
+- [Synnada](https://synnada.ai/) Streaming-first framework for data products
+- [Tensorbase](https://github.com/tensorbase/tensorbase)
+- [VegaFusion](https://vegafusion.io/) Server-side acceleration for the [Vega](https://vega.github.io/) visualization grammar
+- [ZincObserve](https://github.com/zinclabs/zincobserve) Distributed cloud native observability platform
+
+[ballista]: https://github.com/apache/arrow-ballista
+[blaze]: https://github.com/blaze-init/blaze
+[ceresdb]: https://github.com/CeresDB/ceresdb
+[cloudfuse buzz]: https://github.com/cloudfuse-io/buzz-rust
+[cnosdb]: https://github.com/cnosdb/cnosdb
+[cube store]: https://github.com/cube-js/cube.js/tree/master/rust
+[dask sql]: https://github.com/dask-contrib/dask-sql
+[datafusion-tui]: https://github.com/datafusion-contrib/datafusion-tui
+[delta-rs]: https://github.com/delta-io/delta-rs
+[flock]: https://github.com/flock-lab/flock
+[kamu]: https://github.com/kamu-data/kamu-cli
+[greptime db]: https://github.com/GreptimeTeam/greptimedb
+[influxdb iox]: https://github.com/influxdata/influxdb_iox
+[parseable]: https://github.com/parseablehq/parseable
+[prql-query]: https://github.com/prql/prql-query
+[qv]: https://github.com/timvw/qv
+[roapi]: https://github.com/roapi/roapi
+[seafowl]: https://github.com/splitgraph/seafowl
+[synnada]: https://synnada.ai/
+[tensorbase]: https://github.com/tensorbase/tensorbase
+[vegafusion]: https://vegafusion.io/
+[zincobserve]: https://github.com/zinclabs/zincobserve "if you know of another project, please submit a PR to add a link!"