Posted to commits@arrow.apache.org by ag...@apache.org on 2021/04/21 13:42:43 UTC
[arrow-datafusion] branch master updated: Create starting point for
combined user guide for DataFusion and Ballista (#20)
This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push:
new abe84cf Create starting point for combined user guide for DataFusion and Ballista (#20)
abe84cf is described below
commit abe84cfbfb6cc3e80ed314bc343ef78eae15ed9b
Author: Andy Grove <an...@users.noreply.github.com>
AuthorDate: Wed Apr 21 07:42:37 2021 -0600
Create starting point for combined user guide for DataFusion and Ballista (#20)
---
ballista/docs/user-guide/.gitignore | 2 -
docs/user-guide/.gitignore | 1 +
{ballista/docs => docs}/user-guide/README.md | 14 ++--
{ballista/docs => docs}/user-guide/book.toml | 4 +-
{ballista/docs => docs}/user-guide/src/SUMMARY.md | 19 +++---
.../user-guide/src/distributed/client-python.md | 5 +-
.../user-guide/src/distributed}/client-rust.md | 0
.../user-guide/src/distributed}/clients.md | 0
.../user-guide/src/distributed}/configuration.md | 0
.../user-guide/src/distributed}/deployment.md | 0
.../user-guide/src/distributed}/docker-compose.md | 0
.../user-guide/src/distributed}/introduction.md | 0
.../user-guide/src/distributed}/kubernetes.md | 3 +-
.../user-guide/src/distributed}/standalone.md | 0
docs/user-guide/src/example-usage.md | 76 +++++++++++++++++++++
{ballista/docs => docs}/user-guide/src/faq.md | 0
.../user-guide/src/img/ballista-architecture.png | Bin
docs/user-guide/src/introduction.md | 44 ++++++++++++
.../user-guide/src/library.md | 12 ++--
19 files changed, 148 insertions(+), 32 deletions(-)
diff --git a/ballista/docs/user-guide/.gitignore b/ballista/docs/user-guide/.gitignore
deleted file mode 100644
index e662f99..0000000
--- a/ballista/docs/user-guide/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-ballista-book.tgz
-book
\ No newline at end of file
diff --git a/docs/user-guide/.gitignore b/docs/user-guide/.gitignore
new file mode 100644
index 0000000..e9c0728
--- /dev/null
+++ b/docs/user-guide/.gitignore
@@ -0,0 +1 @@
+book
\ No newline at end of file
diff --git a/ballista/docs/user-guide/README.md b/docs/user-guide/README.md
similarity index 77%
rename from ballista/docs/user-guide/README.md
rename to docs/user-guide/README.md
index 9ee3e90..0b9278c 100644
--- a/ballista/docs/user-guide/README.md
+++ b/docs/user-guide/README.md
@@ -16,21 +16,15 @@
specific language governing permissions and limitations
under the License.
-->
-# Ballista User Guide Source
+# DataFusion User Guide Source
-This directory contains the sources for the user guide that is published at https://ballistacompute.org/docs/.
+This directory contains the sources for the DataFusion user guide.
## Generate HTML
+To generate the user guide in HTML format, run the following commands:
+
```bash
cargo install mdbook
mdbook build
-```
-
-## Deploy User Guide to Web Site
-
-Requires ssh certificate to be available.
-
-```bash
-./deploy.sh
```
\ No newline at end of file
diff --git a/ballista/docs/user-guide/book.toml b/docs/user-guide/book.toml
similarity index 93%
rename from ballista/docs/user-guide/book.toml
rename to docs/user-guide/book.toml
index cf1653d..efb9212 100644
--- a/ballista/docs/user-guide/book.toml
+++ b/docs/user-guide/book.toml
@@ -16,8 +16,8 @@
# under the License.
[book]
-authors = ["Andy Grove"]
+authors = ["Apache Arrow"]
language = "en"
multilingual = false
src = "src"
-title = "Ballista User Guide"
+title = "DataFusion User Guide"
diff --git a/ballista/docs/user-guide/src/SUMMARY.md b/docs/user-guide/src/SUMMARY.md
similarity index 60%
rename from ballista/docs/user-guide/src/SUMMARY.md
rename to docs/user-guide/src/SUMMARY.md
index c8fc2c8..e2ddcb0 100644
--- a/ballista/docs/user-guide/src/SUMMARY.md
+++ b/docs/user-guide/src/SUMMARY.md
@@ -19,12 +19,15 @@
# Summary
- [Introduction](introduction.md)
-- [Create a Ballista Cluster](deployment.md)
- - [Docker](standalone.md)
- - [Docker Compose](docker-compose.md)
- - [Kubernetes](kubernetes.md)
- - [Ballista Configuration](configuration.md)
-- [Clients](clients.md)
- - [Rust](client-rust.md)
- - [Python](client-python.md)
+- [Example Usage](example-usage.md)
+- [Use as a Library](library.md)
+- [Distributed](distributed/introduction.md)
+ - [Create a Ballista Cluster](distributed/deployment.md)
+ - [Docker](distributed/standalone.md)
+ - [Docker Compose](distributed/docker-compose.md)
+ - [Kubernetes](distributed/kubernetes.md)
+ - [Ballista Configuration](distributed/configuration.md)
+ - [Clients](distributed/clients.md)
+ - [Rust](distributed/client-rust.md)
+ - [Python](distributed/client-python.md)
- [Frequently Asked Questions](faq.md)
\ No newline at end of file
diff --git a/ballista/docs/user-guide/src/clients.md b/docs/user-guide/src/distributed/client-python.md
similarity index 92%
copy from ballista/docs/user-guide/src/clients.md
copy to docs/user-guide/src/distributed/client-python.md
index 1e223dd..7525c60 100644
--- a/ballista/docs/user-guide/src/clients.md
+++ b/docs/user-guide/src/distributed/client-python.md
@@ -16,7 +16,6 @@
specific language governing permissions and limitations
under the License.
-->
-## Clients
+# Python
-- [Rust](client-rust.md)
-- [Python](client-python.md)
+Coming soon.
\ No newline at end of file
diff --git a/ballista/docs/user-guide/src/client-rust.md b/docs/user-guide/src/distributed/client-rust.md
similarity index 100%
rename from ballista/docs/user-guide/src/client-rust.md
rename to docs/user-guide/src/distributed/client-rust.md
diff --git a/ballista/docs/user-guide/src/clients.md b/docs/user-guide/src/distributed/clients.md
similarity index 100%
rename from ballista/docs/user-guide/src/clients.md
rename to docs/user-guide/src/distributed/clients.md
diff --git a/ballista/docs/user-guide/src/configuration.md b/docs/user-guide/src/distributed/configuration.md
similarity index 100%
rename from ballista/docs/user-guide/src/configuration.md
rename to docs/user-guide/src/distributed/configuration.md
diff --git a/ballista/docs/user-guide/src/deployment.md b/docs/user-guide/src/distributed/deployment.md
similarity index 100%
copy from ballista/docs/user-guide/src/deployment.md
copy to docs/user-guide/src/distributed/deployment.md
diff --git a/ballista/docs/user-guide/src/docker-compose.md b/docs/user-guide/src/distributed/docker-compose.md
similarity index 100%
rename from ballista/docs/user-guide/src/docker-compose.md
rename to docs/user-guide/src/distributed/docker-compose.md
diff --git a/ballista/docs/user-guide/src/introduction.md b/docs/user-guide/src/distributed/introduction.md
similarity index 100%
rename from ballista/docs/user-guide/src/introduction.md
rename to docs/user-guide/src/distributed/introduction.md
diff --git a/ballista/docs/user-guide/src/kubernetes.md b/docs/user-guide/src/distributed/kubernetes.md
similarity index 97%
rename from ballista/docs/user-guide/src/kubernetes.md
rename to docs/user-guide/src/distributed/kubernetes.md
index 8cd8bee..027a44d 100644
--- a/ballista/docs/user-guide/src/kubernetes.md
+++ b/docs/user-guide/src/distributed/kubernetes.md
@@ -33,8 +33,7 @@ The k8s deployment consists of:
Ballista is at an early stage of development and therefore has some significant limitations:
- There is no support for shared object stores such as S3. All data must exist locally on each node in the
- cluster, including where any client process runs (until
- [#473](https://github.com/ballista-compute/ballista/issues/473) is resolved).
+ cluster, including where any client process runs.
- Only a single scheduler instance is currently supported unless the scheduler is configured to use `etcd` as a
backing store.
diff --git a/ballista/docs/user-guide/src/standalone.md b/docs/user-guide/src/distributed/standalone.md
similarity index 100%
rename from ballista/docs/user-guide/src/standalone.md
rename to docs/user-guide/src/distributed/standalone.md
diff --git a/docs/user-guide/src/example-usage.md b/docs/user-guide/src/example-usage.md
new file mode 100644
index 0000000..ff23c96
--- /dev/null
+++ b/docs/user-guide/src/example-usage.md
@@ -0,0 +1,76 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+# Example Usage
+
+Run a SQL query against data stored in a CSV:
+
+```rust
+use datafusion::prelude::*;
+use arrow::util::pretty::print_batches;
+use arrow::record_batch::RecordBatch;
+
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+ // register the table
+ let mut ctx = ExecutionContext::new();
+ ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;
+
+ // create a plan to run a SQL query
+ let df = ctx.sql("SELECT a, MIN(b) FROM example GROUP BY a LIMIT 100")?;
+
+ // execute and print results
+ let results: Vec<RecordBatch> = df.collect().await?;
+ print_batches(&results)?;
+ Ok(())
+}
+```
+
+Use the DataFrame API to process data stored in a CSV:
+
+```rust
+use datafusion::prelude::*;
+use arrow::util::pretty::print_batches;
+use arrow::record_batch::RecordBatch;
+
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+ // create the dataframe
+ let mut ctx = ExecutionContext::new();
+ let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+
+ let df = df.filter(col("a").lt_eq(col("b")))?
+ .aggregate(vec![col("a")], vec![min(col("b"))])?
+ .limit(100)?;
+
+ // execute and print results
+ let results: Vec<RecordBatch> = df.collect().await?;
+ print_batches(&results)?;
+ Ok(())
+}
+```
+
+Both of these examples produce the following output:
+
+```text
++---+--------+
+| a | MIN(b) |
++---+--------+
+| 1 | 2 |
++---+--------+
+```
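
To make the semantics of the DataFrame pipeline in example-usage.md concrete (filter `a <= b`, group by `a`, take `MIN(b)`, limit the number of rows), here is a minimal std-only sketch. It is not DataFusion code: the helper name `min_b_by_a`, the `(i64, i64)` row representation, and the key-ordered output are assumptions made purely for illustration. SQL itself does not guarantee any row order without `ORDER BY`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not part of DataFusion): applies
/// filter(a <= b) -> GROUP BY a -> MIN(b) -> LIMIT `limit`
/// to in-memory (a, b) rows.
fn min_b_by_a(rows: &[(i64, i64)], limit: usize) -> Vec<(i64, i64)> {
    let mut groups: BTreeMap<i64, i64> = BTreeMap::new();
    for &(a, b) in rows {
        // Filter step: keep only rows where a <= b.
        if a > b {
            continue;
        }
        // Aggregate step: remember the smallest b seen for each a.
        groups
            .entry(a)
            .and_modify(|current| *current = (*current).min(b))
            .or_insert(b);
    }
    // Limit step: keep at most `limit` groups (BTreeMap iterates in key order).
    groups.into_iter().take(limit).collect()
}
```

Calling `min_b_by_a(&[(1, 2)], 100)` returns `[(1, 2)]`, which corresponds to the single-row table shown above.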
diff --git a/ballista/docs/user-guide/src/faq.md b/docs/user-guide/src/faq.md
similarity index 100%
rename from ballista/docs/user-guide/src/faq.md
rename to docs/user-guide/src/faq.md
diff --git a/ballista/docs/user-guide/src/img/ballista-architecture.png b/docs/user-guide/src/img/ballista-architecture.png
similarity index 100%
rename from ballista/docs/user-guide/src/img/ballista-architecture.png
rename to docs/user-guide/src/img/ballista-architecture.png
diff --git a/docs/user-guide/src/introduction.md b/docs/user-guide/src/introduction.md
new file mode 100644
index 0000000..c67fb90
--- /dev/null
+++ b/docs/user-guide/src/introduction.md
@@ -0,0 +1,44 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# DataFusion
+
+DataFusion is an extensible query execution framework, written in
+Rust, that uses [Apache Arrow](https://arrow.apache.org) as its
+in-memory format.
+
+DataFusion provides both a SQL and a DataFrame API for building
+logical query plans, as well as a query optimizer and an execution
+engine capable of parallel execution against partitioned data sources
+(CSV and Parquet) using threads.
+
+## Use Cases
+
+DataFusion is used to create modern, fast and efficient data
+pipelines, ETL processes, and database systems, which need the
+performance of Rust and Apache Arrow and want to provide their users
+the convenience of an SQL interface or a DataFrame API.
+
+## Why DataFusion?
+
+* *High Performance*: Leveraging Rust and Arrow's memory model, DataFusion achieves very high performance.
+* *Easy to Connect*: Being part of the Apache Arrow ecosystem (Arrow, Parquet and Flight), DataFusion works well with the rest of the big data ecosystem.
+* *Easy to Embed*: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific use case.
+* *High Quality*: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be used as the foundation for production systems.
+
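
The new introduction.md mentions parallel execution against partitioned data sources using threads. The general idea can be sketched with the standard library alone: compute a partial result per partition on its own thread, then merge the partial results. This is only an illustration of the pattern and makes no claim about how DataFusion's execution engine is implemented. The function name `parallel_min` and the `Vec<Vec<i64>>` partition layout are assumptions.

```rust
use std::thread;

/// Hypothetical helper (not DataFusion's engine): computes the minimum of
/// each partition on a separate thread, then merges the partial minima.
fn parallel_min(partitions: Vec<Vec<i64>>) -> Option<i64> {
    // Spawn one thread per partition; each returns its local minimum,
    // or None if the partition is empty.
    let handles: Vec<thread::JoinHandle<Option<i64>>> = partitions
        .into_iter()
        .map(|partition| thread::spawn(move || partition.into_iter().min()))
        .collect();

    // Merge step: drop empty partitions and take the minimum of the rest.
    handles
        .into_iter()
        .filter_map(|handle| handle.join().expect("partition thread panicked"))
        .min()
}
```

Because the minimum of partial minima equals the overall minimum, the merge step needs only the per-partition results, not the rows themselves.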
diff --git a/ballista/docs/user-guide/src/deployment.md b/docs/user-guide/src/library.md
similarity index 73%
rename from ballista/docs/user-guide/src/deployment.md
rename to docs/user-guide/src/library.md
index 2432f2b..12879b1 100644
--- a/ballista/docs/user-guide/src/deployment.md
+++ b/docs/user-guide/src/library.md
@@ -16,11 +16,13 @@
specific language governing permissions and limitations
under the License.
-->
-# Deployment
+# Using DataFusion as a library
-Ballista is packaged as Docker images. Refer to the following guides to create a Ballista cluster:
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
-- [Create a cluster using Docker](standalone.md)
-- [Create a cluster using Docker Compose](docker-compose.md)
-- [Create a cluster using Kubernetes](kubernetes.md)
+To get started, add the following to your `Cargo.toml` file:
+```toml
+[dependencies]
+datafusion = "4.0.0-SNAPSHOT"
+```